Made by: Kong Inc.
Supported Gateway Topologies: hybrid, db-less, traditional
Supported Konnect Deployments: hybrid, cloud-gateways, serverless
Compatible Protocols: grpc, grpcs, http, https
Minimum Version: Kong Gateway 3.6
Tags: #ai

Changelog

Kong Gateway 3.10.x

  • Changed the serialized log key of AI metrics from ai.ai-proxy to ai.proxy to avoid conflicts with metrics generated by plugins other than AI Proxy and AI Proxy Advanced. If you use logging plugins (for example, File Log or HTTP Log), you will need to update your metrics pipeline configuration to reflect this change (see the sketch after this list).
  • Deprecated config.model.options.upstream_path in favor of config.model.options.upstream_url.
  • Deprecated preserve mode in config.route_type. Use config.llm_format instead. The preserve mode setting will be removed in a future release.
  • Fixed a bug in the Azure provider where model.options.upstream_path overrides would always return a 404 response.
  • Fixed a bug where Azure streaming responses would be missing individual tokens.
  • Fixed a bug where response streaming in Gemini and Bedrock providers was returning whole chat responses in one chunk.
  • Fixed a bug with the Gemini provider, where multimodal requests (in OpenAI format) would not transform properly.
  • Fixed an issue where Gemini streaming responses were getting truncated and/or missing tokens.
  • Fixed an incorrect error thrown when trying to log streaming responses.
  • Fixed preserve mode.
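
If a downstream metrics pipeline still reads the old ai.ai-proxy key, a small normalization shim can bridge the rename during an upgrade, as referenced in the first item above. This is a minimal sketch in Python; only the key rename itself comes from this changelog, and the surrounding log structure is an assumption.

    # Minimal sketch: accept AI metrics entries that may use either the
    # pre-3.10 key ("ai-proxy") or the current key ("proxy") under "ai".
    # The overall log layout is an assumption; only the rename is documented.
    import json

    OLD_KEY = "ai-proxy"   # serialized log key before Kong Gateway 3.10
    NEW_KEY = "proxy"      # serialized log key from 3.10 onward

    def normalize_ai_metrics(raw_log_line: str) -> dict:
        entry = json.loads(raw_log_line)
        ai = entry.get("ai", {})
        if OLD_KEY in ai and NEW_KEY not in ai:
            ai[NEW_KEY] = ai.pop(OLD_KEY)
        return entry

    # A pre-3.10 entry is rewritten to the new key:
    legacy = '{"ai": {"ai-proxy": {"usage": {"total_tokens": 42}}}}'
    print(normalize_ai_metrics(legacy))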

Kong Gateway 3.9.x

  • Disabled the HTTP/2 ALPN handshake for connections on routes configured with AI Proxy. #13735
  • Fixed an issue where tools (function) calls to Anthropic would return empty results (an example tools call is sketched after this list). #13760
  • Fixed an issue where tools (function) calls to Bedrock would return empty results. #13760
  • Fixed an issue where Bedrock Guardrail config was ignored. #13760
  • Fixed an issue where tools (function) calls to Cohere would return empty results. #13760
  • Fixed an issue where the Gemini provider would return an error if content safety failed in AI Proxy. #13760
  • Fixed an issue where tools (function) calls to Gemini (or via Vertex) would return empty results. #13760
  • Fixed an issue where multi-modal requests were blocked on the Azure AI provider. #13702
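
For context on the tools fixes above: a tools (function) call is an OpenAI-style chat request that declares callable functions, which AI Proxy translates to the configured provider's format. Below is a minimal sketch using the Python openai SDK; the gateway URL and route are assumptions, and get_weather is a hypothetical function for illustration.

    # Minimal sketch: a tools (function) call sent through an AI Proxy route.
    # Assumptions: Kong listens on localhost:8000, and a route with AI Proxy
    # attached matches the /chat/completions path the SDK posts to; api_key
    # is a placeholder when the plugin injects the upstream credential.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000", api_key="unused")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",  # AI Proxy may pin the model in plugin config instead
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=tools,
    )
    # Before these 3.9.x fixes, the translated tool calls could come back
    # empty for Anthropic, Bedrock, Cohere, and Gemini (or via Vertex).
    print(resp.choices[0].message.tool_calls)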

Kong Gateway 3.8.x

  • Added the allow_override option to allow overriding the upstream model auth parameter or header from the caller’s request (see the sketch after this list). #13158
  • Replaced the deep-copy library; the request_table object is now copied with cycle_aware_deep_copy. #13582
  • The Mistral provider can now use mistral.ai-managed services by omitting the upstream_url. #13481
  • Added the new response header X-Kong-LLM-Model, which displays the name of the language model used in the AI Proxy plugin. #13472
  • Latency data is now pushed to logs and metrics. #13428
  • Kong AI Gateway now supports all AWS Bedrock Converse API models. #12948
  • Kong AI Gateway now supports the Google Gemini chat (generateContent) interface. #12948
  • Fixed various issues #13000:
    • Fixed an issue where certain Azure models would return partial tokens/words when in response-streaming mode.
    • Fixed an issue where Cohere and Anthropic providers didn’t read the model parameter properly from the caller’s request body.
    • Fixed an issue where using OpenAI Function inference requests would log a request error, and then hang until timeout.
    • Fixed an issue where AI Proxy would still allow callers to specify their own model, ignoring the plugin-configured model name.
    • Fixed an issue where AI Proxy would not give the plugin’s configured model tuning options precedence over those in the user’s LLM request.
    • Fixed an issue where setting the OpenAI SDK model parameter to null caused analytics to not be written to the logging plugin(s).
  • Fixed an issue where the response was gzipped even if the client didn’t accept the encoding. #13155
  • Fixed an issue where the object constructor would set data on the class instead of the instance. #13028
  • Added a configuration validation to prevent log_statistics from being enabled on providers that don’t support statistics. Accordingly, the default of log_statistics has changed from true to false, and a database migration has been added to disable log_statistics if it was already enabled for unsupported providers. #12860
  • Fixed an issue where the plugin couldn’t be applied per consumer or per service. #13209
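
From the caller’s side, the allow_override option and the X-Kong-LLM-Model header added in this release look roughly as follows. This is a sketch under assumptions: the route URL is illustrative, and allow_override is taken to sit under the plugin’s auth config so that the caller’s own credential is forwarded instead of the plugin-configured one.

    # Minimal sketch: overriding the upstream model credential per request
    # and reading back the model that served it. Route URL is an assumption.
    import requests

    resp = requests.post(
        "http://localhost:8000/chat",
        headers={"Authorization": "Bearer CALLER_PROVIDER_KEY"},  # override
        json={"messages": [{"role": "user", "content": "Hello"}]},
    )

    # New in 3.8: the response names the language model that was used.
    print(resp.headers.get("X-Kong-LLM-Model"))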

Kong Gateway 3.7.x

  • To support Anthropic’s new Messages API, the upstream path of the Anthropic provider for the llm/v1/chat route type has changed from /v1/complete to /v1/messages.
  • Added support for streaming event-by-event responses back to the client on supported providers. #12792
  • AI Proxy now reads most prompt tuning parameters from the client, while the plugin config parameters under model_options are now just defaults. This fixes support for using the respective provider’s native SDK. #12903
  • You can now use an existing OpenAI-compatible SDK (for example, the Python openai library) to call different models, providers, and configurations through Kong AI Gateway. AI Proxy now has a preserve option for route_type, where requests and responses are passed directly to the upstream LLM. This enables compatibility with any models and SDKs used to call the AI services (see the sketch after this list). #12903
  • Enterprise feature: You can now use Azure’s native authentication mechanism to secure your cloud-hosted models.
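
As the SDK item above describes, the stock openai Python client can point at a Kong route; with route_type set to preserve, request and response bodies pass through to the upstream LLM unchanged. A minimal sketch, assuming Kong listens on localhost:8000 with a route that AI Proxy is attached to:

    # Minimal sketch: the unmodified openai SDK calling a Kong route.
    # The URL is an assumption: the SDK posts to {base_url}/chat/completions,
    # which a Kong route with AI Proxy is assumed to match. The api_key is a
    # placeholder when the plugin injects the upstream credential.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000", api_key="unused")

    resp = client.chat.completions.create(
        model="gpt-4o",  # with preserve, forwarded to the upstream as-is
        messages=[{"role": "user", "content": "Summarize this changelog."}],
    )
    print(resp.choices[0].message.content)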

Kong Gateway 3.6.x

  • Introduced the AI Proxy plugin, which can mediate request and response formats, as well as authentication between users and upstream LLM services. This plugin supports a variety of Large Language Model (LLM) AI services (a minimal configuration sketch follows).
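
A minimal sketch of enabling the plugin on a route through the Admin API, assuming the Admin API on localhost:8001, an existing route named chat-route, and an OpenAI credential in the OPENAI_KEY environment variable; all values are illustrative, and the config keys follow the plugin’s documented schema.

    # Minimal sketch: attach AI Proxy to an existing route via the Admin API.
    # Assumptions: Admin API on localhost:8001 and a route named "chat-route".
    import os
    import requests

    plugin = {
        "name": "ai-proxy",
        "config": {
            "route_type": "llm/v1/chat",         # mediate chat-style requests
            "auth": {
                "header_name": "Authorization",  # credential injected upstream
                "header_value": f"Bearer {os.environ['OPENAI_KEY']}",
            },
            "model": {
                "provider": "openai",
                "name": "gpt-4o",
            },
        },
    }

    resp = requests.post(
        "http://localhost:8001/routes/chat-route/plugins", json=plugin
    )
    resp.raise_for_status()
    print(resp.json()["id"])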