The AI Proxy plugin lets you transform and proxy requests to a number of AI providers and models.
The plugin accepts requests in one of a few defined and standardized formats, translates them to the configured target format, and then transforms the response back into a standard format.
The following table describes which providers and requests the AI Proxy plugin supports:
Request types: Chat, Completion, Chat streaming, Completions streaming

Provider | Minimum Kong Gateway version
OpenAI (GPT-3.5, GPT-4, GPT-4o, and Multi-Modal) | 3.6
Cohere | 3.6
Azure | 3.6
Anthropic | 3.6
Mistral (mistral.ai, OpenAI, raw, and OLLAMA formats) | 3.6
Llama2 (supports Llama2 and Llama3 models and raw, OLLAMA, and OpenAI formats) |
The AI Proxy plugin will mediate the following for you:
- Request and response formats appropriate for the configured config.model.provider and config.route_type
- The following service request coordinates (unless the model is self-hosted):
  - Protocol
  - Host name
  - Port
  - Path
  - HTTP method
- Authentication on behalf of the Kong API consumer
- Decorating the request with parameters from the config.model.options block, appropriate for the chosen provider
- Recording of usage statistics of the configured LLM provider and model into your selected Kong log plugin output
- Optionally, recording all post-transformation request and response messages exchanged between users and the configured LLM
- Fulfillment of requests to self-hosted models, based on select supported format transformations
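The settings named in the list above can be sketched as a plugin configuration payload. This is a minimal illustration in Python, assuming an OpenAI chat setup. `build_ai_proxy_config` is a hypothetical helper, not part of Kong, and the route type value, `auth` field names, model name, and option values are assumptions to check against the plugin's configuration reference for your Kong Gateway version.

```python
import json


def build_ai_proxy_config(provider, route_type, model_name, options=None):
    """Return a dict shaped like an AI Proxy plugin definition.

    Illustrative only: field names and values are assumptions and should be
    verified against the plugin's configuration reference.
    """
    return {
        "name": "ai-proxy",
        "config": {
            # Selects the request and response format the plugin mediates.
            "route_type": route_type,
            # Credentials the plugin sends on behalf of the consumer (placeholder).
            "auth": {
                "header_name": "Authorization",
                "header_value": "Bearer <api_key>",
            },
            "model": {
                "provider": provider,
                "name": model_name,
                # Parameters the plugin decorates the upstream request with.
                "options": dict(options or {}),
            },
        },
    }


plugin = build_ai_proxy_config(
    "openai",
    "llm/v1/chat",
    "gpt-4o",
    {"max_tokens": 512, "temperature": 1.0},
)
print(json.dumps(plugin, indent=2))
```

The resulting JSON could be submitted to the Admin API or translated into declarative configuration. The helper only assembles the structure and does not validate it.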
Flattening all of the provider formats allows you to standardize the manipulation of the data before and after transmission. It also allows you to offer Kong consumers a choice of LLMs through consistent request and response formats, regardless of the backend provider or model.
This plugin currently only supports REST-based full text responses.
While only the Llama2 and Mistral models are classed as self-hosted, the target URL can be overridden for any of the supported providers.
For example, a self-hosted or otherwise OpenAI-compatible endpoint can be called by setting the config.model.options.upstream_url plugin option.
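As a rough sketch of that override, the following Python snippet sets config.model.options.upstream_url on an existing plugin definition. The helper name `with_upstream_url`, the model name, and the endpoint URL are hypothetical. Only the option path comes from the text above.

```python
import copy


def with_upstream_url(plugin_config, upstream_url):
    """Return a copy of plugin_config with config.model.options.upstream_url set."""
    updated = copy.deepcopy(plugin_config)
    model = updated.setdefault("config", {}).setdefault("model", {})
    options = model.setdefault("options", {})
    options["upstream_url"] = upstream_url
    return updated


# Hypothetical plugin definition pointing at an OpenAI-compatible model.
base = {
    "name": "ai-proxy",
    "config": {
        "route_type": "llm/v1/chat",
        "model": {"provider": "openai", "name": "my-local-model"},
    },
}

# Placeholder URL for a self-hosted, OpenAI-compatible endpoint.
overridden = with_upstream_url(
    base, "http://llm.internal.example:8000/v1/chat/completions"
)
print(overridden["config"]["model"]["options"]["upstream_url"])
```

Working on a deep copy leaves the original definition untouched, so the same base configuration can be reused for several target endpoints.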
v3.10+ If you are using each provider’s native SDK, Kong Gateway allows you to transparently proxy the request without any transformation and return the response unmodified. This can be done by setting config.llm_format to a value other than openai, such as gemini or bedrock.
In this mode, Kong Gateway will still provide useful analytics, logging, and cost calculation.
v3.10+ By default, Kong Gateway uses the OpenAI format, but you can customize this using config.llm_format. If llm_format is not set to openai, the plugin will not transform the request when sending it upstream and will leave it as-is.
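The rule described above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not the plugin's internal logic, and `transforms_request` is a hypothetical name.

```python
def transforms_request(llm_format="openai"):
    """Return True when the plugin would translate the request body.

    Models the documented rule: only the default "openai" format is
    translated to the provider's format; any other value is passed upstream
    as-is.
    """
    return llm_format == "openai"


for fmt in ("openai", "gemini", "bedrock"):
    if transforms_request(fmt):
        mode = "translate to the provider's format"
    else:
        mode = "pass through as-is"
    print(f"{fmt}: {mode}")
```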
The Kong AI Proxy accepts the following input formats, standardized across all providers. Set config.route_type to match the request and response format you need:
llm/v1/chat:
{"messages":[{"role":"system","content":"You are a scientist."},{"role":"user","content":"What is the theory of relativity?"}]}
v3.9+ With Amazon Bedrock, you can include your guardrail configuration in the request:
{"messages":[{"role":"system","content":"You are a scientist."},{"role":"user","content":"What is the theory of relativity?"}],"extra_body":{"guardrailConfig":{"guardrailIdentifier":"<guardrail_identifier>","guardrailVersion":"1","trace":"enabled"}}}
llm/v1/completions:
{"prompt":"You are a scientist. What is the theory of relativity?"}
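A client might assemble these request bodies as follows. This is a minimal Python sketch using only the standard library. The helper names and the route URL in the final comment are hypothetical, and the network call is left commented out because it needs a running gateway with the plugin enabled.

```python
import json
import urllib.request


def chat_body(system, user, guardrail_id=None, guardrail_version="1"):
    """Build a chat-format body, optionally with a Bedrock guardrail block."""
    body = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ]
    }
    if guardrail_id is not None:
        body["extra_body"] = {
            "guardrailConfig": {
                "guardrailIdentifier": guardrail_id,
                "guardrailVersion": guardrail_version,
                "trace": "enabled",
            }
        }
    return body


def completions_body(prompt):
    """Build a completions-format body."""
    return {"prompt": prompt}


def post_json(url, body, timeout=30):
    """POST body as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


chat = chat_body("You are a scientist.", "What is the theory of relativity?")
print(json.dumps(chat))

# Sending it through a route with the plugin enabled (hypothetical URL):
# reply = post_json("http://localhost:8000/chat", chat)
```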
In turn, responses are transformed to a standard format across all providers:
llm/v1/chat:
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"The theory of relativity is a...","role":"assistant"}}],"created":1707769597,"id":"chatcmpl-ID","model":"gpt-4-0613","object":"chat.completion","usage":{"completion_tokens":5,"prompt_tokens":26,"total_tokens":31}}
llm/v1/completions:
{"choices":[{"finish_reason":"stop","index":0,"text":"The theory of relativity is a..."}],"created":1707769597,"id":"cmpl-ID","model":"gpt-3.5-turbo-instruct","object":"text_completion","usage":{"completion_tokens":10,"prompt_tokens":7,"total_tokens":17}}
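Because both response shapes share the same envelope, a client can read them with one small helper. The sketch below parses the two sample responses shown above. The function names are hypothetical, and the code assumes the first entry in `choices` is the one of interest.

```python
import json


def extract_text(response):
    """Return the generated text from a chat or completions response."""
    choice = response["choices"][0]
    # Chat responses nest the text under "message"; completions use "text".
    if "message" in choice:
        return choice["message"]["content"]
    return choice["text"]


def total_tokens(response):
    """Return the reported total token count, or 0 if usage is missing."""
    return response.get("usage", {}).get("total_tokens", 0)


chat_response = json.loads(
    '{"choices":[{"finish_reason":"stop","index":0,"message":'
    '{"content":"The theory of relativity is a...","role":"assistant"}}],'
    '"created":1707769597,"id":"chatcmpl-ID","model":"gpt-4-0613",'
    '"object":"chat.completion","usage":{"completion_tokens":5,'
    '"prompt_tokens":26,"total_tokens":31}}'
)
completion_response = json.loads(
    '{"choices":[{"finish_reason":"stop","index":0,'
    '"text":"The theory of relativity is a..."}],"created":1707769597,'
    '"id":"cmpl-ID","model":"gpt-3.5-turbo-instruct",'
    '"object":"text_completion","usage":{"completion_tokens":10,'
    '"prompt_tokens":7,"total_tokens":17}}'
)

for response in (chat_response, completion_response):
    print(response["object"], extract_text(response), total_tokens(response))
```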
The request and response formats are loosely based on OpenAI.
See the sample OpenAPI specification for more detail on the supported formats.
For Kong AI Gateway deployments with Azure OpenAI, you can configure a header capture to insert the requested model name directly into the plugin configuration as a string substitution.