Minimum Version: Kong Gateway 3.8

Tags: #ai

You can proxy requests to Gemini AI models through AI Gateway using the AI Proxy and AI Proxy Advanced plugins. This reference documents all supported AI capabilities, configuration requirements, and provider-specific details needed for proper integration.

Upstream paths

AI Gateway automatically routes requests to the appropriate Gemini API endpoints. The following table shows the upstream paths used for each capability.

| Capability        | Upstream path or API                           |
|-------------------|------------------------------------------------|
| Chat completions  | generateContent API                            |
| Embeddings        | batchEmbedContents API                         |
| Function calling  | generateContent API with function declarations |
| Files             | uploadFile and files APIs                      |
| Batches           | batches API                                    |
| Image generations | generateContent API                            |
| Image edits       | generateContent API                            |
| Video generations | predictLongRunning API                         |
| Realtime          | BidiGenerateContent API                        |
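
For example, a client can send an OpenAI-style chat request to a route configured with route_type llm/v1/chat, and AI Gateway translates it to the generateContent API upstream. A minimal sketch, assuming a Kong proxy at localhost:8000 and a route matching /chat (both placeholders):

    import requests

    # An OpenAI-style chat request sent to the Kong proxy; AI Gateway
    # rewrites and forwards it to Gemini's generateContent API.
    resp = requests.post(
        "http://localhost:8000/chat",  # placeholder route path
        json={
            "model": "gemini-2.5-flash",
            "messages": [{"role": "user", "content": "Hello!"}],
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])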

Supported capabilities

The following tables show the AI capabilities supported by the Gemini provider when used with the AI Proxy or AI Proxy Advanced plugin.

Set the plugin’s route_type based on the capability you want to use. See the tables below for supported route types.

Text generation

Support for Gemini basic text generation capabilities including chat, completions, and embeddings:

| Capability       | Route type        | Streaming | Model example      | Min version |
|------------------|-------------------|-----------|--------------------|-------------|
| Chat completions | llm/v1/chat       | ✅        | gemini-2.5-flash   | 3.8         |
| Embeddings       | llm/v1/embeddings | ❌        | text-embedding-004 | 3.11        |
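
As a sketch of the embeddings path, the OpenAI SDK can be pointed at the gateway, assuming a route that matches the SDK's /embeddings path (the base URL is a placeholder, and Kong injects the upstream credentials):

    from openai import OpenAI

    # The api_key is a dummy value; AI Gateway attaches the real
    # Gemini credentials configured in the plugin.
    client = OpenAI(base_url="http://localhost:8000", api_key="ignored")

    emb = client.embeddings.create(
        model="text-embedding-004",
        input="Kong AI Gateway",
    )
    print(len(emb.data[0].embedding))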

Advanced text generation

Support for Gemini function calling to allow Gemini models to use external tools and APIs:

| Capability       | Route type  | Model example    | Min version |
|------------------|-------------|------------------|-------------|
| Function calling | llm/v1/chat | gemini-2.5-flash | 3.8         |

Processing

Support for Gemini file and batch operations:

| Capability | Route type     | Model example | Min version |
|------------|----------------|---------------|-------------|
| Files ¹    | llm/v1/files   | n/a           | n/a         |
| Batches ²  | llm/v1/batches | n/a           | n/a         |

¹ Files processing for Gemini is supported only in the native format, via the SDK.

² Batches processing for Gemini is supported only in the native format, via the SDK.

Image

Support for Gemini image generation and editing capabilities:

| Capability  | Route type                  | Model example                             | Min version |
|-------------|-----------------------------|-------------------------------------------|-------------|
| Generations | image/v1/images/generations | gemini-2.5-flash-preview-image-generation | 3.11        |
| Edits       | image/v1/images/edits       | gemini-2.5-flash-preview-image-generation | 3.11        |

For requests with large payloads, consider increasing config.max_request_body_size to three times the raw binary size, which accounts for base64 encoding overhead plus headroom for JSON framing.
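
For example, the limit can be raised through the Kong Admin API. A minimal sketch, assuming an Admin API at localhost:8001 and an existing plugin instance (the ID is a placeholder):

    import requests

    # Allow image payloads up to ~8 MB of raw binary: 3x covers the
    # base64 expansion plus JSON framing.
    requests.patch(
        "http://localhost:8001/plugins/<plugin-id>",
        json={"config": {"max_request_body_size": 3 * 8 * 1024 * 1024}},
    )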

Supported image sizes and formats vary by model. Refer to your provider’s documentation for allowed dimensions and requirements.

Video

Support for Gemini video generation capabilities:

| Capability  | Route type                  | Model example        | Min version |
|-------------|-----------------------------|----------------------|-------------|
| Generations | video/v1/videos/generations | veo-3.1-generate-001 | 3.13        |

For requests with large payloads (video generation), consider increasing config.max_request_body_size to three times the raw binary size.

Realtime

Support for Gemini’s bidirectional streaming for realtime applications:

Realtime processing requires the AI Proxy Advanced plugin and uses the WebSocket protocol.

To use the realtime route, you must configure the ws and/or wss protocols on both the Service and the Route that the plugin is attached to (see the sketch after the table below).

| Capability | Route type           | Model example                         | Min version |
|------------|----------------------|---------------------------------------|-------------|
| Realtime ³ | realtime/v1/realtime | gemini-2.5-flash-preview-native-audio | 3.13        |

³ Realtime processing for Gemini is supported only in the native format, via the SDK.
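
A sketch of the protocol configuration mentioned above, using the Kong Admin API (the entity names and address are placeholders; a Service takes a single protocol, while a Route takes a list):

    import requests

    ADMIN = "http://localhost:8001"  # placeholder Admin API address

    # The Service speaks WebSocket to the upstream...
    requests.patch(f"{ADMIN}/services/my-realtime-service",
                   json={"protocol": "wss"})
    # ...and the Route accepts WebSocket connections from clients.
    requests.patch(f"{ADMIN}/routes/my-realtime-route",
                   json={"protocols": ["ws", "wss"]})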

Gemini base URL

The base URL is https://generativelanguage.googleapis.com/{route_type_path}, where {route_type_path} is determined by the capability.

AI Gateway uses this URL automatically. You only need to configure a URL if you’re using a self-hosted or Gemini-compatible endpoint, in which case set the upstream_url plugin option.
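
For example, to target a self-hosted, Gemini-compatible endpoint, a sketch of the override via the Kong Admin API (the service name, Admin API address, and upstream URL are placeholders):

    import requests

    requests.post(
        "http://localhost:8001/services/my-gemini-service/plugins",
        json={
            "name": "ai-proxy",
            "config": {
                "route_type": "llm/v1/chat",
                "model": {
                    "provider": "gemini",
                    "name": "gemini-2.5-flash",
                    # Override only for self-hosted or compatible endpoints.
                    "options": {"upstream_url": "https://gemini.internal.example.com"},
                },
            },
        },
    )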

Supported native LLM formats for Gemini

By default, the AI Proxy plugin uses OpenAI-compatible request formats. Set config.llm_format to a native format to use Gemini-specific APIs and features.

The following native Gemini APIs are supported:

LLM format: gemini

Supported APIs:

  • /v1beta/models/{model_name}:generateContent
  • /v1beta/models/{model_name}:streamGenerateContent
  • /v1beta/models/{model_name}:embedContent
  • /v1beta/models/{model_name}:batchEmbedContents
  • /v1beta/batches
  • /upload/v1beta/files
  • /v1beta/files
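
With config.llm_format set to gemini, clients send native Gemini payloads through the gateway. A minimal sketch, assuming a route that matches the native path (the proxy address and API key are placeholders):

    import requests

    resp = requests.post(
        "http://localhost:8000/v1beta/models/gemini-2.5-flash:generateContent",
        headers={"x-goog-api-key": "<GEMINI_API_KEY>"},
        json={"contents": [{"parts": [{"text": "Hello!"}]}]},
    )
    print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])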

Provider-specific limitations for native formats

  • Gemini only supports auth.allow_override = false

Configure Gemini with AI Proxy

To use Gemini with AI Gateway, configure the AI Proxy or AI Proxy Advanced plugin.

Here’s a minimal configuration for chat completions:
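
The following sketch registers AI Proxy through the Kong Admin API; the service name, Admin API address, and API key are placeholders:

    import requests

    requests.post(
        "http://localhost:8001/services/my-gemini-service/plugins",
        json={
            "name": "ai-proxy",
            "config": {
                "route_type": "llm/v1/chat",
                "auth": {
                    "param_name": "key",
                    "param_value": "<GEMINI_API_KEY>",
                    "param_location": "query",
                },
                "model": {"provider": "gemini", "name": "gemini-2.5-flash"},
            },
        },
    )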

For more configuration options and examples, see the AI Proxy and AI Proxy Advanced plugin documentation.

FAQs

How can I configure model parameters such as temperature for Gemini?

You have several options, depending on the SDK and configuration:

  • Use the Gemini SDK:

    1. Set llm_format to gemini.
    2. Use the Gemini provider.
    3. Configure parameters like temperature, top_p, and top_k on the client side:

       import google.generativeai as genai

       # Point the SDK at the Kong proxy rather than Google's endpoint
       # (the address below is a placeholder for your gateway).
       genai.configure(api_key="<GEMINI_API_KEY>", transport="rest",
                       client_options={"api_endpoint": "http://localhost:8000"})

       model = genai.GenerativeModel(
           'gemini-1.5-flash',
           generation_config=genai.types.GenerationConfig(
               temperature=0.7,
               top_p=0.9,
               top_k=40,
               max_output_tokens=1024
           )
       )
  • Use the OpenAI SDK with the Gemini provider:

    1. Set llm_format to openai.
    2. You can configure parameters in one of three ways:
      • Configure them in the plugin only.
      • Configure them in the client only.
      • Configure them in both; the client-side values override the plugin config (see the sketch after this list).
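
A sketch of the OpenAI SDK path (the base URL and dummy API key are placeholders; Kong injects the real upstream credentials):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000", api_key="ignored")

    resp = client.chat.completions.create(
        model="gemini-2.5-flash",
        temperature=0.7,  # set here, this overrides the plugin config
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)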

How do I use Gemini’s googleSearch tool?

Configure AI Proxy Advanced with the Gemini provider and declare the googleSearch tool in your requests. See Use Gemini’s googleSearch tool with AI Proxy Advanced.
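
A minimal sketch of such a request in the native Gemini format (the proxy address and key are placeholders; the google_search tool name follows the Gemini v1beta REST API):

    import requests

    resp = requests.post(
        "http://localhost:8000/v1beta/models/gemini-2.5-flash:generateContent",
        headers={"x-goog-api-key": "<GEMINI_API_KEY>"},
        json={
            "contents": [{"parts": [{"text": "Summarize today's AI news."}]}],
            "tools": [{"google_search": {}}],  # enable grounded search
        },
    )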

How do I control image generation parameters such as aspect ratio?

Pass imageConfig parameters via generationConfig in your image generation requests. See Use Gemini’s imageConfig with AI Proxy.
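
A minimal sketch in the native Gemini format (the proxy address and key are placeholders, and the aspectRatio value is illustrative):

    import requests

    resp = requests.post(
        "http://localhost:8000/v1beta/models/"
        "gemini-2.5-flash-preview-image-generation:generateContent",
        headers={"x-goog-api-key": "<GEMINI_API_KEY>"},
        json={
            "contents": [{"parts": [{"text": "A watercolor fox"}]}],
            "generationConfig": {
                "responseModalities": ["TEXT", "IMAGE"],
                "imageConfig": {"aspectRatio": "16:9"},
            },
        },
    )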

How do I enable Gemini’s thinking mode?

Pass thinkingConfig parameters via extra_body in your requests to enable detailed reasoning traces. See Use Gemini’s thinkingConfig with AI Proxy Advanced.
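
A minimal sketch using the OpenAI SDK’s extra_body passthrough (the base URL, key, and thinkingBudget value are placeholders):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000", api_key="ignored")

    resp = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": "Plan a 3-day trip to Kyoto."}],
        # Forwarded to Gemini's generationConfig.thinkingConfig upstream.
        extra_body={"generationConfig": {"thinkingConfig": {"thinkingBudget": 1024}}},
    )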
