Release date 2026/04/07
Deprecation
- Removed the parse-request shared filter and deprecated the llm_format configuration field.
Feature
- Added a policy-based rate limiting mode with multi-dimensional match conditions and request counting strategy.
- Added prompt token estimation for prompt_tokens, total_tokens, and cost strategies in policies mode, enabling early request rejection before the upstream responds.
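
As a rough illustration of how prompt token estimation can allow early rejection, the sketch below estimates the prompt size from the request body and compares it with the remaining budget before any upstream call is made. It is not the plugin's implementation: the function names, the message shape, and the four-characters-per-token heuristic are all assumptions made for this example.

```python
import math

# Assumed heuristic for illustration only; the plugin's real estimator is not documented here.
CHARS_PER_TOKEN = 4


def estimate_prompt_tokens(messages):
    """Roughly estimate prompt tokens from the text length of chat messages."""
    total_chars = sum(len(m.get("content", "")) for m in messages)
    return math.ceil(total_chars / CHARS_PER_TOKEN)


def should_reject_early(messages, used_tokens, token_limit):
    """Return True when the estimated prompt would push usage past the limit."""
    return used_tokens + estimate_prompt_tokens(messages) > token_limit
```

With a check of this kind, a request whose estimated prompt already exceeds the remaining budget can be refused immediately instead of being counted only after the upstream reports actual usage.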
Bugfix
- Fixed an issue where the plugin did not validate whether the configured dictionary_name exists before use, which could lead to runtime errors.
- Extended deferred rate limiting callback to support model-partitioned policies and cost strategy.