Hugging Face provider

Uses: Kong Gateway AI Gateway Admin API deck KIC Konnect API Terraform

Upstream paths

AI Gateway automatically routes requests to the appropriate Hugging Face API endpoints. The following table shows the upstream paths used for each capability.

Capability	Upstream path or API
Chat completions	`/v1/chat/completions`
Embeddings	`/hf-inference/models/{model_name}/pipeline/feature-extraction`
Video generations	`/v1/videos`

Supported capabilities

The following tables show the AI capabilities supported by Hugging Face provider when used with the AI Proxy or the AI Proxy Advanced plugin.

Set the plugin’s route_type based on the capability you want to use. See the tables below for supported route types.

Text generation

Support for Hugging Face basic text generation capabilities including chat, completions, and embeddings:

Capability	Route type	Streaming	Model example	Min version
Chat completions	`llm/v1/chat`	Supported	Use the model name for the specific LLM provider	3.9
Embeddings	`llm/v1/embeddings`	Not supported	Use the embedding model name	3.11

Video

Support for Hugging Face video generation capabilities:

Capability	Route type	Model example	Min version
Generations	`video/v1/videos/generations`	Use the video generation model name	3.13

For requests with large payloads (video generation), consider increasing config.max_request_body_size to three times the raw binary size.

Hugging Face base URL

The base URL is https://api-inference.huggingface.co, where {route_type_path} is determined by the capability.

AI Gateway uses this URL automatically. You only need to configure a URL if you’re using a self-hosted or Hugging Face-compatible endpoint, in which case set the upstream_url plugin option.

Supported native LLM formats for Hugging Face

By default, the AI Proxy plugin uses OpenAI-compatible request formats. Set config.llm_format to a native format to use Hugging Face-specific APIs and features.

The following native Hugging Face APIs are supported:

LLM format	Supported APIs
`huggingface`	`/generate` `/generate_stream`

Configure Hugging Face with AI Proxy

To use Hugging Face with AI Gateway, configure the AI Proxy or AI Proxy Advanced.

Here’s a minimal configuration for chat completions:

kong.yaml

Copied!

_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_HUGGINGFACE_TOKEN" }}
      model:
        provider: huggingface
        name: Qwen/Qwen3-4B-Instruct-2507

curl -i -X POST http://localhost:8001/plugins/ \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    --data '
    {
      "name": "ai-proxy",
      "config": {
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$HUGGINGFACE_TOKEN'"
        },
        "model": {
          "provider": "huggingface",
          "name": "Qwen/Qwen3-4B-Instruct-2507"
        }
      }
    }
    '

Copied!

Make the following request:

curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer $KONNECT_TOKEN" \
    --data '
    {
      "name": "ai-proxy",
      "config": {
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$HUGGINGFACE_TOKEN'"
        },
        "model": {
          "provider": "huggingface",
          "name": "Qwen/Qwen3-4B-Instruct-2507"
        }
      }
    }
    '

Copied!

echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name:
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
config:
  route_type: llm/v1/chat
  auth:
    header_name: Authorization
    header_value: Bearer $HUGGINGFACE_TOKEN
  model:
    provider: huggingface
    name: Qwen/Qwen3-4B-Instruct-2507
plugin: ai-proxy
" | kubectl apply -f -

Copied!

resource "konnect_gateway_plugin_ai_proxy" "my_ai_proxy" {
  enabled = true

  config = {
    route_type = "llm/v1/chat"

    auth = {
      header_name = "Authorization"
      header_value = "Bearer var.huggingface_token"
    }

    model = {
      provider = "huggingface"
      name = "Qwen/Qwen3-4B-Instruct-2507"
    }
  }

  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}

Copied!

This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.

variable "huggingface_token" {
  type = string
}

Copied!

For more configuration options and examples, see:

AI Proxy examples

AI Proxy Advanced examples

Hugging Face provider

Upstream paths

Supported capabilities

Text generation

Video

Hugging Face base URL

Supported native LLM formats for Hugging Face

Configure Hugging Face with AI Proxy

Tutorials

Help us make these docs great!

Still need help?