AI Gateway providers

The core of AI Gateway is the ability to route AI requests to various providers exposed via a provider-agnostic API. This normalized API layer affords developers and organizations multiple benefits:

  • Client applications are shielded from provider-specific API details, promoting code reuse
  • AI provider credentials are managed centrally
  • Developers and organizations get a central point of governance and observability over AI data and usage
  • Requests can be routed dynamically, letting you optimize AI usage based on various metrics
  • AI services can be consumed by other Kong Gateway plugins to augment non-AI API traffic

Note that some providers may not be available depending on your Kong Gateway version, and some providers don’t support all route types. See the specific provider documentation for more details.
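Because every route speaks the same provider-agnostic API, client code does not change when the backing provider does. The sketch below builds an OpenAI-format chat payload for a gateway route; the URL, path, and model name are illustrative assumptions, not fixed values.

```python
# Sketch: a client builds one OpenAI-format request for the gateway,
# regardless of which provider the route is configured to use.
# GATEWAY_URL and the model name are hypothetical, for illustration only.
import json

GATEWAY_URL = "http://localhost:8000/chat"  # assumed AI Gateway route

def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Build a provider-agnostic chat payload in the OpenAI format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("What is the Boltzmann equation?")
print(json.dumps(payload))
```

Swapping the route's provider (say, from OpenAI to Bedrock) would leave this client code untouched; only the plugin configuration changes.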

Frequently Asked Questions

Can Kong AI Gateway authenticate to Azure AI using a Managed Identity?

Yes. If Kong Gateway is running on Azure, AI Proxy can detect the designated Managed Identity or User-Assigned Identity of that Azure Compute resource and use it for authentication. Enable the corresponding identity authentication parameters in your AI Proxy plugin configuration.

How do I configure Gemini model parameters such as temperature, top_p, and top_k?

You have several options, depending on the SDK and configuration:

  • Use the Gemini SDK:

    1. Set llm_format to gemini.
    2. Use the Gemini provider.
    3. Configure parameters like temperature, top_p, and top_k on the client side:
       import google.generativeai as genai

       # Client-side generation parameters for the Gemini SDK
       model = genai.GenerativeModel(
           'gemini-1.5-flash',
           generation_config=genai.types.GenerationConfig(
               temperature=0.7,
               top_p=0.9,
               top_k=40,
               max_output_tokens=1024
           )
       )

  • Use the OpenAI SDK with the Gemini provider:

    1. Set llm_format to openai.
    2. You can configure parameters in one of three ways:
      • Configure them in the plugin only.
      • Configure them in the client only.
      • Configure them in both; client-side values override the plugin config.

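The precedence rule for the last option can be sketched as a simple merge, where client-side values override plugin defaults. The function name is hypothetical; the parameter names mirror the Gemini tuning options above.

```python
# Sketch of the stated precedence: when a parameter is set in both the
# plugin config and the client request, the client-side value wins.

def effective_params(plugin_config: dict, client_params: dict) -> dict:
    """Merge plugin defaults with client overrides (client wins)."""
    merged = dict(plugin_config)
    merged.update(client_params)
    return merged

plugin_config = {"temperature": 0.2, "top_p": 0.9}
client_params = {"temperature": 0.7, "top_k": 40}
print(effective_params(plugin_config, client_params))
# temperature comes from the client; top_p from the plugin; top_k from the client
```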
How do I enable Gemini’s googleSearch tool?

Configure AI Proxy Advanced with the Gemini provider and declare the googleSearch tool in your requests. See Use Gemini’s googleSearch tool with AI Proxy Advanced.

How do I pass Gemini’s imageConfig parameters for image generation?

Pass imageConfig parameters via generationConfig in your image generation requests. See Use Gemini’s imageConfig with AI Proxy.

How do I enable Gemini’s thinkingConfig for reasoning traces?

Pass thinkingConfig parameters via extra_body in your requests to enable detailed reasoning traces. See Use Gemini’s thinkingConfig with AI Proxy Advanced.

How do I use AWS Bedrock cross-region inference?

For cross-region inference, prefix the model ID with a geographic identifier:

{geography-prefix}.{provider}.{model-name}...

For example: us.anthropic.claude-sonnet-4-5-20250929-v1:0

Prefix     Geography
us.        United States
eu.        European Union
apac.      Asia-Pacific
global.    All commercial regions

For a full list of supported cross-region inference profiles, see Supported Regions and models for inference profiles in the AWS documentation.
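Composing the profile ID from the pattern above is straightforward string assembly; the helper below is a hypothetical sketch using the geography prefixes from the table.

```python
# Sketch: build a Bedrock cross-region inference profile ID from the
# pattern {geography-prefix}.{provider}.{model-name} described above.
# The helper name is illustrative; the prefixes come from the table.

GEOGRAPHY_PREFIXES = {"us", "eu", "apac", "global"}

def cross_region_model_id(geography: str, provider_model_id: str) -> str:
    """Prefix a provider model ID with a geography for cross-region inference."""
    if geography not in GEOGRAPHY_PREFIXES:
        raise ValueError(f"unknown geography prefix: {geography}")
    return f"{geography}.{provider_model_id}"

print(cross_region_model_id("us", "anthropic.claude-sonnet-4-5-20250929-v1:0"))
```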

How do I pass provider-specific parameters that aren’t part of the OpenAI format?

Use the extra_body field when sending requests in OpenAI format:

    curl http://localhost:8000 \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "amazon.nova-reel-v1:0",
        "prompt": "A large red square that is rotating",
        "extra_body": {
          "fps": 24
        }
      }'
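The same request body can be built in any language; the point is that extra_body carries provider-specific fields (here, Bedrock’s fps) alongside the standard OpenAI-format fields without changing the rest of the request.

```python
# The same request body as the curl example above, built in Python.
# extra_body carries provider-specific parameters that have no
# equivalent in the OpenAI request format.
import json

body = {
    "model": "amazon.nova-reel-v1:0",
    "prompt": "A large red square that is rotating",
    "extra_body": {"fps": 24},
}
print(json.dumps(body))
```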

How do I use the AWS Bedrock rerank API?

Configure AI Proxy with the bedrock provider and set up AWS authentication using IAM credentials or assumed roles. See Use AWS Bedrock rerank API with AI Proxy.

How do I apply AWS Bedrock guardrails to requests?

Add a guardrailConfig object to your request body:

      {
          "messages": [
              {
                  "role": "system",
                  "content": "You are a scientist."
              },
              {
                  "role": "user",
                  "content": "What is the Boltzmann equation?"
              }
          ],
          "guardrailConfig": {
              "guardrailIdentifier": "$GUARDRAIL-IDENTIFIER",
              "guardrailVersion": "1",
              "trace": "enabled"
          }
      }

This feature requires Kong Gateway 3.9 or later. For more details, see Guardrails and content safety and the AWS Bedrock guardrails documentation.

How do I use the Cohere rerank API for document-grounded chat?

Configure AI Proxy with the Cohere provider and send queries with candidate documents. The model filters them for relevance and returns answers with citations. See Use Cohere rerank API for document-grounded chat.
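A rerank-style request pairs a query with candidate documents for the model to score by relevance. The field names below follow the general shape of Cohere’s rerank API; treat them as an illustrative sketch, not an authoritative schema.

```python
# Minimal sketch of a rerank-style request body: a query plus candidate
# documents to score by relevance. Field names are illustrative.
import json

rerank_request = {
    "query": "What is the capital of France?",
    "documents": [
        "Paris is the capital and largest city of France.",
        "Berlin is the capital of Germany.",
    ],
    "top_n": 1,
}
print(json.dumps(rerank_request))
```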
