Related Documentation
Made by
Kong Inc.
Supported Gateway Topologies
hybrid db-less traditional
Supported Konnect Deployments
hybrid cloud-gateways serverless
Compatible Protocols
grpc grpcs http https
Minimum Version
Kong Gateway - 3.6
Tags
#ai

The AI Proxy plugin lets you transform and proxy requests to a number of AI providers and models.

The plugin accepts requests in one of a few defined and standardized formats, translates them to the configured target format, and then transforms the response back into a standard format.

The following table describes which providers and requests the AI Proxy plugin supports:

Provider Chat Completion Chat streaming Completions streaming Minimum Kong Gateway version
OpenAI (GPT-3.5, GPT-4, GPT-4o, and Multi-Modal) 3.6
Cohere 3.6
Azure 3.6
Anthropic 3.6
Mistral (mistral.ai, OpenAI, raw, and OLLAMA formats) 3.6
Llama2 (supports Llama2 and Llama3 models and raw, OLLAMA, and OpenAI formats) 3.6
Amazon Bedrock 3.8
Gemini 3.8
Hugging Face 3.9

How it works

The AI Proxy plugin will mediate the following for you:

  • Request and response formats appropriate for the configured config.model.provider and config.route_type
  • The following service request coordinates (unless the model is self-hosted):
    • Protocol
    • Host name
    • Port
    • Path
    • HTTP method
  • Authentication on behalf of the Kong API consumer
  • Decorating the request with parameters from the config.model.options block, appropriate for the chosen provider
  • Recording of usage statistics of the configured LLM provider and model into your selected Kong log plugin output
  • Optionally, additionally recording all post-transformation request and response messages from users, to and from the configured LLM
  • Fulfillment of requests to self-hosted models, based on select supported format transformations

Flattening all of the provider formats allows you to standardize the manipulation of the data before and after transmission. It also allows your to provide a choice of LLMs to the Kong consumers, using consistent request and response formats, regardless of the backend provider or model.

This plugin currently only supports REST-based full text responses.

Request and response formats

The plugin’s config.route_type should be set based on the target upstream endpoint and model, based on this capability matrix:

Provider name Provider path Kong route type Example model name
OpenAI /v1/chat/completions llm/v1/chat gpt-4
OpenAI /v1/completions llm/v1/completions gpt-3.5-turbo-instruct
Cohere /v1/chat llm/v1/chat command
Cohere /v1/generate llm/v1/completions command
Azure /openai/deployments/{deployment_name}/chat/completions llm/v1/chat gpt-4
Azure /openai/deployments/{deployment_name}/completions llm/v1/completions gpt-3.5-turbo-instruct
Anthropic /v1/complete in version 3.6, /v1/messages since version 3.7 llm/v1/chat claude-2.1
Anthropic /v1/complete llm/v1/completions claude-2.1
Mistral User-defined llm/v1/chat User-defined
Mistral User-defined llm/v1/completions User-defined
Llama2 User-defined llm/v1/chat User-defined
Llama2 User-defined llm/v1/completions User-defined
Amazon Bedrock Use the LLM chat upstream path llm/v1/chat Use the model name for the specific LLM provider
Gemini llm/v1/chat llm/v1/chat gemini-1.5-flash or gemini-1.5-pro
Hugging Face /models/{model_provider}/{model_name} llm/v1/chat Use the model name for the specific LLM provider
Hugging Face /models/{model_provider}/{model_name} llm/v1/completions Use the model name for the specific LLM provider

The following upstream URL patterns are used:

Provider URL
OpenAI https://api.openai.com:443/{route_type_path}
Cohere https://api.cohere.com:443/{route_type_path}
Azure https://{azure_instance}.openai.azure.com:443/openai/deployments/{deployment_name}/{route_type_path}
Anthropic https://api.anthropic.com:443/{route_type_path}
Mistral As defined in config.model.options.upstream_url
Llama2 As defined in config.model.options.upstream_url
Amazon Bedrock https://bedrock-runtime.{region}.amazonaws.com
Gemini https://generativelanguage.googleapis.com
Hugging Face https://api-inference.huggingface.co

While only the Llama2 and Mistral models are classed as self-hosted, the target URL can be overridden for any of the supported providers. For example, a self-hosted or otherwise OpenAI-compatible endpoint can be called by setting the same config.model.options.upstream_url plugin option.

v3.10+ If you are using each provider’s native SDK, Kong Gateway allows you to transparently proxy the request without any transformation and return the response unmodified. This can be done by setting config.llm_format to a value other than openai, such as gemini or bedrock.

In this mode, Kong Gateway will still provide useful analytics, logging, and cost calculation.

Input formats

Kong Gateway mediates the request and response format based on the selected config.model.provider and config.route_type.

v3.10+ By default, Kong Gateway uses the OpenAI format, but you can customize this using config.llm_format. If llm_format is not set to openai, the plugin will not transform the request when sending it upstream and will leave it as-is.

The Kong AI Proxy accepts the following inputs formats, standardized across all providers. The config.route_type must be configured respective to the required request and response format examples:

Response formats

Conversely, the response formats are also transformed to a standard format across all providers:

The request and response formats are loosely based on OpenAI. See the sample OpenAPI specification for more detail on the supported formats.

Templating v3.7+

The plugin allows you to substitute values in the config.model.name and any parameter under config.model.options with specific placeholders, similar to those in the Request Transformer Advanced plugin.

The following templated parameters are available:

  • $(headers.header_name): The value of a specific request header.
  • $(uri_captures.path_parameter_name): The value of a captured URI path parameter.
  • $(query_params.query_parameter_name): The value of a query string parameter.

You can combine these parameters with an OpenAI-compatible SDK in multiple ways using the AI Proxy plugin, depending on your specific use case:

Action

Description

Use chat route with dynamic model selection Configure a chat route that reads the target model from the request path instead of hardcoding it in the configuration.
Use the Azure deployment relevant to a specific model name Configure a header capture to insert the requested model name directly into the plugin configuration for Kong AI Gateway deployment with Azure OpenAI, as a string substitution.
Proxy multiple models deployed in the same Azure instance Configure one route to proxy multiple models deployed in the same Azure instance.
Use unsupported models with OpenAI-compatible SDKs Proxy models that are not officially supported, like Whisper-2, through an OpenAI-compatible interface using preserve routing.

This can be used to OpenAI-compatible SDK with this plugin in multiple ways, depending on the required use case.

Something wrong?

Help us make these docs great!

Kong Developer docs are open source. If you find these useful and want to make them better, contribute today!