Load balancing: Semantic (v3.8+)
Configure semantic load balancing with the AI Proxy Advanced plugin. To set up semantic routing, configure the following parameters:
- config.embeddings: defines the embeddings model used to compare each prompt with the target descriptions.
- config.vectordb: defines the vector database parameters. Only Redis is supported, so you need a Redis instance running in your environment.
- config.targets[].description: defines the description that prompts are matched against.
- config.balancer.algorithm: set to semantic.
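The matching step works roughly like this. The plugin embeds the prompt and each target description, then prefers the description that is closest to the prompt. The sketch below shows that comparison using cosine similarity, which is the distance_metric in this example, on tiny hand-made vectors. The vectors and the cosine_similarity and closest_target helpers are hypothetical. In the plugin itself, the embeddings come from the model configured in config.embeddings, and the search runs in Redis.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity in [-1, 1]; 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_target(prompt_vec: list[float], description_vecs: dict[str, list[float]]):
    """Return (description, similarity) for the description most similar to the prompt."""
    best = max(description_vecs, key=lambda d: cosine_similarity(prompt_vec, description_vecs[d]))
    return best, cosine_similarity(prompt_vec, description_vecs[best])


# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
descriptions = {
    "Specialist in code completions": [0.9, 0.1, 0.0],
    "Requests related to IT support": [0.1, 0.9, 0.1],
}
prompt = [0.8, 0.2, 0.1]  # pretend embedding of "Complete this Python function"

print(closest_target(prompt, descriptions))
```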
This configuration routes each incoming request to the most relevant OpenAI model, based on the content of the request:
- Requests related to code completions are routed to the gpt-35-turbo model.
- Requests about IT support are routed to the gpt-4o model.
- All other requests, which don't match either category, are handled by the gpt-4o-mini model, which serves as a catch-all for general queries.
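You can think of the catch-all as a fallback that applies when no description is similar enough to the prompt. The following is a hypothetical sketch, not the plugin's code. It assumes that threshold is a minimum cosine similarity and that the target whose description is CATCHALL receives every request that doesn't match. Check the plugin reference for how config.vectordb.threshold is actually interpreted. Vector stores such as Redis often report cosine results as a distance (1 - similarity) rather than as a similarity.

```python
import math

CATCHALL = "CATCHALL"


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pick_model(prompt_vec, targets, threshold=0.7):
    """Return the model of the best-matching target, or the CATCHALL model.

    targets: list of (description, description_vector, model_name) tuples.
    The CATCHALL target has no vector and is used only as a fallback.
    """
    fallback = None
    best_model, best_score = None, -1.0
    for description, vec, model in targets:
        if description == CATCHALL:
            fallback = model
            continue
        score = cosine_similarity(prompt_vec, vec)
        if score > best_score:
            best_model, best_score = model, score
    # Assumption: a best match below the threshold counts as "no match".
    if best_model is not None and best_score >= threshold:
        return best_model
    return fallback


# Hypothetical description embeddings, paired with the models from this example.
targets = [
    ("Specialist in code completions", [0.9, 0.1, 0.0], "gpt-35-turbo"),
    ("Requests related to IT support", [0.1, 0.9, 0.1], "gpt-4o"),
    (CATCHALL, None, "gpt-4o-mini"),
]

print(pick_model([0.8, 0.2, 0.1], targets))  # code-like prompt
print(pick_model([0.0, 0.1, 0.9], targets))  # unrelated prompt
```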
Prerequisites
- An OpenAI account
- A running Redis instance
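Environment variables
- OPENAI_API_KEY: The API key used to connect to OpenAI. The decK examples read the same key from DECK_OPENAI_API_KEY instead, because decK only substitutes environment variables whose names start with DECK_.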
Add this section to your declarative configuration file to enable the plugin globally:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
  config:
    embeddings:
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        name: text-embedding-3-small
        provider: openai
    vectordb:
      dimensions: 1024
      distance_metric: cosine
      strategy: redis
      threshold: 0.7
      redis:
        host: redis-stack-server
        port: 6379
    balancer:
      algorithm: semantic
    targets:
    - model:
        name: gpt-35-turbo
        provider: openai
        options:
          max_tokens: 826
          temperature: 0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Specialist in code completions
    - model:
        name: gpt-4o
        provider: openai
        options:
          max_tokens: 512
          temperature: 0.3
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Requests related to IT support
    - model:
        name: gpt-4o-mini
        provider: openai
        options:
          max_tokens: 256
          temperature: 1.0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: CATCHALL
To enable the plugin globally with the Kong Admin API, make the following request:
curl -i -X POST http://localhost:8001/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
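If you script the setup instead of using curl, you can build the same request with Python's standard library. The sketch below mirrors the JSON body above. The target and build_plugin_request helpers are hypothetical, and the code only constructs the request; it does not send it.

```python
import json
import urllib.request

ADMIN_API = "http://localhost:8001"  # Kong Admin API address used in the curl example


def target(model, max_tokens, temperature, description, api_key):
    """One entry of config.targets, mirroring the JSON body above (hypothetical helper)."""
    return {
        "model": {
            "name": model,
            "provider": "openai",
            "options": {"max_tokens": max_tokens, "temperature": temperature},
        },
        "route_type": "llm/v1/chat",
        "auth": {"header_name": "Authorization", "header_value": f"Bearer {api_key}"},
        "description": description,
    }


def build_plugin_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /plugins/ request for a global plugin."""
    auth = {"header_name": "Authorization", "header_value": f"Bearer {api_key}"}
    body = {
        "name": "ai-proxy-advanced",
        "config": {
            "embeddings": {
                "auth": auth,
                "model": {"name": "text-embedding-3-small", "provider": "openai"},
            },
            "vectordb": {
                "dimensions": 1024,
                "distance_metric": "cosine",
                "strategy": "redis",
                "threshold": 0.7,
                "redis": {"host": "redis-stack-server", "port": 6379},
            },
            "balancer": {"algorithm": "semantic"},
            "targets": [
                target("gpt-35-turbo", 826, 0, "Specialist in code completions", api_key),
                target("gpt-4o", 512, 0.3, "Requests related to IT support", api_key),
                target("gpt-4o-mini", 256, 1.0, "CATCHALL", api_key),
            ],
        },
    }
    return urllib.request.Request(
        f"{ADMIN_API}/plugins/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
```

With Kong running, pass the result to urllib.request.urlopen to send it, for example urllib.request.urlopen(build_plugin_request(os.environ["OPENAI_API_KEY"])). Building the targets with a helper keeps the three entries consistent. The curl body repeats the same auth block by hand for each target.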
To enable the plugin globally with the Konnect API, make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
To enable the plugin globally in Kubernetes, create a KongClusterPlugin resource:
echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: ai-proxy-advanced
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
config:
  embeddings:
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    model:
      name: text-embedding-3-small
      provider: openai
  vectordb:
    dimensions: 1024
    distance_metric: cosine
    strategy: redis
    threshold: 0.7
    redis:
      host: redis-stack-server
      port: 6379
  balancer:
    algorithm: semantic
  targets:
  - model:
      name: gpt-35-turbo
      provider: openai
      options:
        max_tokens: 826
        temperature: 0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Specialist in code completions
  - model:
      name: gpt-4o
      provider: openai
      options:
        max_tokens: 512
        temperature: 0.3
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Requests related to IT support
  - model:
      name: gpt-4o-mini
      provider: openai
      options:
        max_tokens: 256
        temperature: 1.0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: CATCHALL
plugin: ai-proxy-advanced
" | kubectl apply -f -
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
dimensions = 1024
distance_metric = "cosine"
strategy = "redis"
threshold = 0.7
redis = {
host = "redis-stack-server"
port = 6379
}
}
balancer = {
algorithm = "semantic"
}
targets = [
{
model = {
name = "gpt-35-turbo"
provider = "openai"
options = {
max_tokens = 826
temperature = 0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "Specialist in code completions"
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 512
temperature = 0.3
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "Requests related to IT support"
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 256
temperature = 1.0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "CATCHALL"
} ]
}
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}
This example requires the following variable to be added to your manifest. You can set its value at runtime with an environment variable of the form TF_VAR_name=value, for example TF_VAR_openai_api_key.
variable "openai_api_key" {
type = string
}
Add this section to your declarative configuration file to enable the plugin on a service:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
  service: serviceName|Id
  config:
    embeddings:
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        name: text-embedding-3-small
        provider: openai
    vectordb:
      dimensions: 1024
      distance_metric: cosine
      strategy: redis
      threshold: 0.7
      redis:
        host: redis-stack-server
        port: 6379
    balancer:
      algorithm: semantic
    targets:
    - model:
        name: gpt-35-turbo
        provider: openai
        options:
          max_tokens: 826
          temperature: 0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Specialist in code completions
    - model:
        name: gpt-4o
        provider: openai
        options:
          max_tokens: 512
          temperature: 0.3
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Requests related to IT support
    - model:
        name: gpt-4o-mini
        provider: openai
        options:
          max_tokens: 256
          temperature: 1.0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: CATCHALL
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
To enable the plugin on a service with the Kong Admin API, make the following request:
curl -i -X POST http://localhost:8001/services/{serviceName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
To enable the plugin on a service with the Konnect API, make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/services/{serviceId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- serviceId: The id of the service the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
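The Admin API and Konnect endpoints in this section differ only in their base URL and their scope prefix. The helpers below are a hypothetical sketch, not part of any Kong SDK, that build either URL. The Admin API examples accept a name or an ID for the scoped entity, while the Konnect examples expect an ID. The region values in the usage are illustrative only, so check the Konnect API reference for the valid ones.

```python
from typing import Optional

# Path segment for each scope, as used in the endpoints of this section.
SCOPE_PATHS = {"service": "services", "route": "routes", "consumer": "consumers"}


def plugin_path(scope: Optional[str], entity: Optional[str]) -> str:
    """Return 'plugins/' for a global plugin, or '<scope>/<entity>/plugins/' for a scoped one."""
    if scope is None:
        return "plugins/"
    if not entity:
        raise ValueError(f"an id or name is required for scope {scope!r}")
    return f"{SCOPE_PATHS[scope]}/{entity}/plugins/"


def admin_api_url(scope=None, name_or_id=None, admin_url="http://localhost:8001"):
    """Kong Admin API endpoint; scoped entities may be referenced by name or ID."""
    return f"{admin_url}/{plugin_path(scope, name_or_id)}"


def konnect_url(region, control_plane_id, scope=None, entity_id=None):
    """Konnect control plane endpoint; scoped entities are referenced by ID."""
    base = f"https://{region}.api.konghq.com/v2/control-planes/{control_plane_id}/core-entities"
    return f"{base}/{plugin_path(scope, entity_id)}"
```

For example, konnect_url("us", "<controlPlaneId>", "service", "<serviceId>") produces the service-scoped endpoint used in the Konnect request above.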
To configure the plugin in Kubernetes, create a KongPlugin resource:
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-proxy-advanced
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  embeddings:
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    model:
      name: text-embedding-3-small
      provider: openai
  vectordb:
    dimensions: 1024
    distance_metric: cosine
    strategy: redis
    threshold: 0.7
    redis:
      host: redis-stack-server
      port: 6379
  balancer:
    algorithm: semantic
  targets:
  - model:
      name: gpt-35-turbo
      provider: openai
      options:
        max_tokens: 826
        temperature: 0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Specialist in code completions
  - model:
      name: gpt-4o
      provider: openai
      options:
        max_tokens: 512
        temperature: 0.3
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Requests related to IT support
  - model:
      name: gpt-4o-mini
      provider: openai
      options:
        max_tokens: 256
        temperature: 1.0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: CATCHALL
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the service resource:
kubectl annotate -n kong service SERVICE_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
dimensions = 1024
distance_metric = "cosine"
strategy = "redis"
threshold = 0.7
redis = {
host = "redis-stack-server"
port = 6379
}
}
balancer = {
algorithm = "semantic"
}
targets = [
{
model = {
name = "gpt-35-turbo"
provider = "openai"
options = {
max_tokens = 826
temperature = 0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "Specialist in code completions"
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 512
temperature = 0.3
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "Requests related to IT support"
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 256
temperature = 1.0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "CATCHALL"
} ]
}
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
service = {
id = konnect_gateway_service.my_service.id
}
}
This example requires the following variable to be added to your manifest. You can set its value at runtime with an environment variable of the form TF_VAR_name=value, for example TF_VAR_openai_api_key.
variable "openai_api_key" {
type = string
}
Add this section to your declarative configuration file to enable the plugin on a route:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
  route: routeName|Id
  config:
    embeddings:
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        name: text-embedding-3-small
        provider: openai
    vectordb:
      dimensions: 1024
      distance_metric: cosine
      strategy: redis
      threshold: 0.7
      redis:
        host: redis-stack-server
        port: 6379
    balancer:
      algorithm: semantic
    targets:
    - model:
        name: gpt-35-turbo
        provider: openai
        options:
          max_tokens: 826
          temperature: 0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Specialist in code completions
    - model:
        name: gpt-4o
        provider: openai
        options:
          max_tokens: 512
          temperature: 0.3
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Requests related to IT support
    - model:
        name: gpt-4o-mini
        provider: openai
        options:
          max_tokens: 256
          temperature: 1.0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: CATCHALL
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
To enable the plugin on a route with the Kong Admin API, make the following request:
curl -i -X POST http://localhost:8001/routes/{routeName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
To enable the plugin on a route with the Konnect API, make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- routeId: The id of the route the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
To configure the plugin in Kubernetes, create a KongPlugin resource:
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-proxy-advanced
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  embeddings:
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    model:
      name: text-embedding-3-small
      provider: openai
  vectordb:
    dimensions: 1024
    distance_metric: cosine
    strategy: redis
    threshold: 0.7
    redis:
      host: redis-stack-server
      port: 6379
  balancer:
    algorithm: semantic
  targets:
  - model:
      name: gpt-35-turbo
      provider: openai
      options:
        max_tokens: 826
        temperature: 0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Specialist in code completions
  - model:
      name: gpt-4o
      provider: openai
      options:
        max_tokens: 512
        temperature: 0.3
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Requests related to IT support
  - model:
      name: gpt-4o-mini
      provider: openai
      options:
        max_tokens: 256
        temperature: 1.0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: CATCHALL
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the httproute or ingress resource:
kubectl annotate -n kong httproute HTTPROUTE_NAME konghq.com/plugins=ai-proxy-advanced
kubectl annotate -n kong ingress INGRESS_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
dimensions = 1024
distance_metric = "cosine"
strategy = "redis"
threshold = 0.7
redis = {
host = "redis-stack-server"
port = 6379
}
}
balancer = {
algorithm = "semantic"
}
targets = [
{
model = {
name = "gpt-35-turbo"
provider = "openai"
options = {
max_tokens = 826
temperature = 0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "Specialist in code completions"
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 512
temperature = 0.3
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "Requests related to IT support"
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 256
temperature = 1.0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "CATCHALL"
} ]
}
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
route = {
id = konnect_gateway_route.my_route.id
}
}
This example requires the following variable to be added to your manifest. You can set its value at runtime with an environment variable of the form TF_VAR_name=value, for example TF_VAR_openai_api_key.
variable "openai_api_key" {
type = string
}
Add this section to your declarative configuration file to enable the plugin on a consumer:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
  consumer: consumerName|Id
  config:
    embeddings:
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        name: text-embedding-3-small
        provider: openai
    vectordb:
      dimensions: 1024
      distance_metric: cosine
      strategy: redis
      threshold: 0.7
      redis:
        host: redis-stack-server
        port: 6379
    balancer:
      algorithm: semantic
    targets:
    - model:
        name: gpt-35-turbo
        provider: openai
        options:
          max_tokens: 826
          temperature: 0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Specialist in code completions
    - model:
        name: gpt-4o
        provider: openai
        options:
          max_tokens: 512
          temperature: 0.3
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: Requests related to IT support
    - model:
        name: gpt-4o-mini
        provider: openai
        options:
          max_tokens: 256
          temperature: 1.0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      description: CATCHALL
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
To enable the plugin on a consumer with the Kong Admin API, make the following request:
curl -i -X POST http://localhost:8001/consumers/{consumerName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
To enable the plugin on a consumer with the Konnect API, make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumers/{consumerId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"dimensions": 1024,
"distance_metric": "cosine",
"strategy": "redis",
"threshold": 0.7,
"redis": {
"host": "redis-stack-server",
"port": 6379
}
},
"balancer": {
"algorithm": "semantic"
},
"targets": [
{
"model": {
"name": "gpt-35-turbo",
"provider": "openai",
"options": {
"max_tokens": 826,
"temperature": 0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Specialist in code completions"
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 512,
"temperature": 0.3
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "Requests related to IT support"
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 256,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"description": "CATCHALL"
}
]
}
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- consumerId: The id of the consumer the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
To configure the plugin in Kubernetes, create a KongPlugin resource:
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-proxy-advanced
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  embeddings:
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    model:
      name: text-embedding-3-small
      provider: openai
  vectordb:
    dimensions: 1024
    distance_metric: cosine
    strategy: redis
    threshold: 0.7
    redis:
      host: redis-stack-server
      port: 6379
  balancer:
    algorithm: semantic
  targets:
  - model:
      name: gpt-35-turbo
      provider: openai
      options:
        max_tokens: 826
        temperature: 0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Specialist in code completions
  - model:
      name: gpt-4o
      provider: openai
      options:
        max_tokens: 512
        temperature: 0.3
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: Requests related to IT support
  - model:
      name: gpt-4o-mini
      provider: openai
      options:
        max_tokens: 256
        temperature: 1.0
    route_type: llm/v1/chat
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    description: CATCHALL
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumer resource:
kubectl annotate -n kong kongconsumer CONSUMER_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
dimensions = 1024
distance_metric = "cosine"
strategy = "redis"
threshold = 0.7
redis = {
host = "redis-stack-server"
port = 6379
}
}
balancer = {
algorithm = "semantic"
}
targets = [
{
model = {
name = "gpt-35-turbo"
provider = "openai"
options = {
max_tokens = 826
temperature = 0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
description = "Specialist in code completions"
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 512
temperature = 0.3
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer var.openai_api_key"
}
description = "Requests related to IT support"
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 256
temperature = 1.0
}
}
route_type = "llm/v1/chat"
auth = {
header_name = "Authorization"
header_value = "Bearer var.openai_api_key"
}
description = "CATCHALL"
} ]
}
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
consumer = {
id = konnect_gateway_consumer.my_consumer.id
}
}
This example requires the following variable to be added to your Terraform configuration. You can specify its value at runtime by setting TF_VAR_name=value (here, TF_VAR_openai_api_key).
variable "openai_api_key" {
  type = string
}
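If you drive Terraform from a script, the TF_VAR_ convention amounts to exporting one environment variable per input variable. The following Python sketch shows that mapping; the helper names terraform_env and terraform_apply are hypothetical, and it assumes the terraform CLI is on your PATH.

```python
import os
import subprocess
from typing import Mapping, Optional


def terraform_env(variables: Mapping[str, str],
                  base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with each variable exported as TF_VAR_<name>."""
    env = dict(os.environ if base is None else base)
    for name, value in variables.items():
        env[f"TF_VAR_{name}"] = value
    return env


def terraform_apply(workdir: str, variables: Mapping[str, str]) -> None:
    """Run `terraform apply` in workdir, passing input variables via the environment.

    terraform still asks for interactive confirmation before applying.
    """
    subprocess.run(["terraform", "apply"], cwd=workdir,
                   env=terraform_env(variables), check=True)
```

For example, terraform_apply(".", {"openai_api_key": os.environ["OPENAI_API_KEY"]}) keeps the key out of the .tf files and out of shell history.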
Add this section to your declarative configuration file:
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    consumer_group: consumerGroupName|Id
    config:
      embeddings:
        auth:
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        model:
          name: text-embedding-3-small
          provider: openai
      vectordb:
        dimensions: 1024
        distance_metric: cosine
        strategy: redis
        threshold: 0.7
        redis:
          host: redis-stack-server
          port: 6379
      balancer:
        algorithm: semantic
      targets:
        - model:
            name: gpt-35-turbo
            provider: openai
            options:
              max_tokens: 826
              temperature: 0
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          description: Specialist in code completions
        - model:
            name: gpt-4o
            provider: openai
            options:
              max_tokens: 512
              temperature: 0.3
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          description: Requests related to IT support
        - model:
            name: gpt-4o-mini
            provider: openai
            options:
              max_tokens: 256
              temperature: 1.0
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          description: CATCHALL
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
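The decK file reads the key from the DECK_OPENAI_API_KEY environment variable rather than OPENAI_API_KEY, because decK only substitutes environment variables that carry the DECK_ prefix. To preview what the rendered file will contain, the following Python sketch approximates that substitution. It is not decK's implementation; the render_deck_env helper is hypothetical, and raising an error for an unset variable is this sketch's own choice.

```python
import os
import re
from typing import Mapping, Optional

# Matches placeholders such as ${{ env "DECK_OPENAI_API_KEY" }}.
ENV_REF = re.compile(r'\$\{\{\s*env\s+"(DECK_[A-Za-z0-9_]+)"\s*\}\}')


def render_deck_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${{ env "DECK_..." }} reference with its environment value."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return ENV_REF.sub(substitute, text)
```

A typical check is render_deck_env(open("kong.yaml").read()) after exporting DECK_OPENAI_API_KEY, where kong.yaml is a hypothetical file name; avoid printing the result to shared logs, since it contains the key.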
Make the following request:
curl -i -X POST http://localhost:8001/consumer_groups/{consumerGroupName|Id}/plugins/ \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  --data '
{
  "name": "ai-proxy-advanced",
  "config": {
    "embeddings": {
      "auth": {
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "model": {
        "name": "text-embedding-3-small",
        "provider": "openai"
      }
    },
    "vectordb": {
      "dimensions": 1024,
      "distance_metric": "cosine",
      "strategy": "redis",
      "threshold": 0.7,
      "redis": {
        "host": "redis-stack-server",
        "port": 6379
      }
    },
    "balancer": {
      "algorithm": "semantic"
    },
    "targets": [
      {
        "model": {
          "name": "gpt-35-turbo",
          "provider": "openai",
          "options": {
            "max_tokens": 826,
            "temperature": 0
          }
        },
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$OPENAI_API_KEY'"
        },
        "description": "Specialist in code completions"
      },
      {
        "model": {
          "name": "gpt-4o",
          "provider": "openai",
          "options": {
            "max_tokens": 512,
            "temperature": 0.3
          }
        },
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$OPENAI_API_KEY'"
        },
        "description": "Requests related to IT support"
      },
      {
        "model": {
          "name": "gpt-4o-mini",
          "provider": "openai",
          "options": {
            "max_tokens": 256,
            "temperature": 1.0
          }
        },
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$OPENAI_API_KEY'"
        },
        "description": "CATCHALL"
      }
    ]
  }
}
'
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
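The curl command splices the key into the body by closing and reopening the single-quoted string around '$OPENAI_API_KEY'. If you script the setup instead, building the body as data sidesteps that quoting. The following is a minimal Python sketch using only the standard library; build_plugin_payload and create_consumer_group_plugin are hypothetical helper names, and the body mirrors the JSON above.

```python
import json
import urllib.parse
import urllib.request

# The three targets from the example: (model, max_tokens, temperature, description).
TARGETS = [
    ("gpt-35-turbo", 826, 0, "Specialist in code completions"),
    ("gpt-4o", 512, 0.3, "Requests related to IT support"),
    ("gpt-4o-mini", 256, 1.0, "CATCHALL"),
]


def build_plugin_payload(api_key: str) -> dict:
    """Build the same ai-proxy-advanced body that the curl example sends."""
    auth = {"header_name": "Authorization", "header_value": f"Bearer {api_key}"}
    return {
        "name": "ai-proxy-advanced",
        "config": {
            "embeddings": {
                "auth": dict(auth),
                "model": {"name": "text-embedding-3-small", "provider": "openai"},
            },
            "vectordb": {
                "dimensions": 1024,
                "distance_metric": "cosine",
                "strategy": "redis",
                "threshold": 0.7,
                "redis": {"host": "redis-stack-server", "port": 6379},
            },
            "balancer": {"algorithm": "semantic"},
            "targets": [
                {
                    "model": {
                        "name": name,
                        "provider": "openai",
                        "options": {"max_tokens": max_tokens, "temperature": temperature},
                    },
                    "route_type": "llm/v1/chat",
                    "auth": dict(auth),
                    "description": description,
                }
                for name, max_tokens, temperature, description in TARGETS
            ],
        },
    }


def create_consumer_group_plugin(admin_url: str, group: str, payload: dict) -> dict:
    """POST the payload to the Admin API and return the decoded JSON response."""
    group_path = urllib.parse.quote(group, safe="")
    url = f"{admin_url.rstrip('/')}/consumer_groups/{group_path}/plugins/"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

For example, create_consumer_group_plugin("http://localhost:8001", "my-group", build_plugin_payload(os.environ["OPENAI_API_KEY"])) sends the request, where my-group is a hypothetical consumer group name and os must be imported by the caller.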
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumer_groups/{consumerGroupId}/plugins/ \
  --header "accept: application/json" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $KONNECT_TOKEN" \
  --data '
{
  "name": "ai-proxy-advanced",
  "config": {
    "embeddings": {
      "auth": {
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "model": {
        "name": "text-embedding-3-small",
        "provider": "openai"
      }
    },
    "vectordb": {
      "dimensions": 1024,
      "distance_metric": "cosine",
      "strategy": "redis",
      "threshold": 0.7,
      "redis": {
        "host": "redis-stack-server",
        "port": 6379
      }
    },
    "balancer": {
      "algorithm": "semantic"
    },
    "targets": [
      {
        "model": {
          "name": "gpt-35-turbo",
          "provider": "openai",
          "options": {
            "max_tokens": 826,
            "temperature": 0
          }
        },
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$OPENAI_API_KEY'"
        },
        "description": "Specialist in code completions"
      },
      {
        "model": {
          "name": "gpt-4o",
          "provider": "openai",
          "options": {
            "max_tokens": 512,
            "temperature": 0.3
          }
        },
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$OPENAI_API_KEY'"
        },
        "description": "Requests related to IT support"
      },
      {
        "model": {
          "name": "gpt-4o-mini",
          "provider": "openai",
          "options": {
            "max_tokens": 256,
            "temperature": 1.0
          }
        },
        "route_type": "llm/v1/chat",
        "auth": {
          "header_name": "Authorization",
          "header_value": "Bearer '$OPENAI_API_KEY'"
        },
        "description": "CATCHALL"
      }
    ]
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: The geographic region where your Kong Konnect organization is hosted.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- consumerGroupId: The id of the consumer group the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
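When scripting against Konnect, the only differences from the Admin API call are the region-specific URL and the Bearer token header. This minimal Python sketch prepares (but does not send) that request with the standard library; the helper names are hypothetical, and the payload passed in the usage below is a placeholder for the full body from the curl example.

```python
import json
import urllib.parse
import urllib.request


def konnect_consumer_group_plugins_url(region: str, control_plane_id: str,
                                       consumer_group_id: str) -> str:
    """Build the Konnect endpoint used by the curl example."""
    cp = urllib.parse.quote(control_plane_id, safe="")
    cg = urllib.parse.quote(consumer_group_id, safe="")
    return (
        f"https://{region}.api.konghq.com/v2/control-planes/{cp}"
        f"/core-entities/consumer_groups/{cg}/plugins/"
    )


def build_konnect_request(region: str, control_plane_id: str, consumer_group_id: str,
                          token: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request; pass it to urllib.request.urlopen to send it."""
    return urllib.request.Request(
        konnect_consumer_group_plugins_url(region, control_plane_id, consumer_group_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

For instance, build_konnect_request("us", control_plane_id, consumer_group_id, os.environ["KONNECT_TOKEN"], body) returns a request you can pass to urllib.request.urlopen, where body is the plugin definition as a Python dict.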
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-proxy-advanced
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  embeddings:
    auth:
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    model:
      name: text-embedding-3-small
      provider: openai
  vectordb:
    dimensions: 1024
    distance_metric: cosine
    strategy: redis
    threshold: 0.7
    redis:
      host: redis-stack-server
      port: 6379
  balancer:
    algorithm: semantic
  targets:
    - model:
        name: gpt-35-turbo
        provider: openai
        options:
          max_tokens: 826
          temperature: 0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer $OPENAI_API_KEY
      description: Specialist in code completions
    - model:
        name: gpt-4o
        provider: openai
        options:
          max_tokens: 512
          temperature: 0.3
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer $OPENAI_API_KEY
      description: Requests related to IT support
    - model:
        name: gpt-4o-mini
        provider: openai
        options:
          max_tokens: 256
          temperature: 1.0
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer $OPENAI_API_KEY
      description: CATCHALL
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumerGroup resource:
kubectl annotate -n kong kongconsumergroup CONSUMERGROUP_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
  enabled = true
  config = {
    embeddings = {
      auth = {
        header_name  = "Authorization"
        header_value = "Bearer ${var.openai_api_key}"
      }
      model = {
        name     = "text-embedding-3-small"
        provider = "openai"
      }
    }
    vectordb = {
      dimensions      = 1024
      distance_metric = "cosine"
      strategy        = "redis"
      threshold       = 0.7
      redis = {
        host = "redis-stack-server"
        port = 6379
      }
    }
    balancer = {
      algorithm = "semantic"
    }
    targets = [
      {
        model = {
          name     = "gpt-35-turbo"
          provider = "openai"
          options = {
            max_tokens  = 826
            temperature = 0
          }
        }
        route_type = "llm/v1/chat"
        auth = {
          header_name  = "Authorization"
          header_value = "Bearer ${var.openai_api_key}"
        }
        description = "Specialist in code completions"
      },
      {
        model = {
          name     = "gpt-4o"
          provider = "openai"
          options = {
            max_tokens  = 512
            temperature = 0.3
          }
        }
        route_type = "llm/v1/chat"
        auth = {
          header_name  = "Authorization"
          header_value = "Bearer ${var.openai_api_key}"
        }
        description = "Requests related to IT support"
      },
      {
        model = {
          name     = "gpt-4o-mini"
          provider = "openai"
          options = {
            max_tokens  = 256
            temperature = 1.0
          }
        }
        route_type = "llm/v1/chat"
        auth = {
          header_name  = "Authorization"
          header_value = "Bearer ${var.openai_api_key}"
        }
        description = "CATCHALL"
      },
    ]
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  consumer_group = {
    id = konnect_gateway_consumer_group.my_consumer_group.id
  }
}
This example requires the following variable to be added to your Terraform configuration. You can specify its value at runtime by setting TF_VAR_name=value (here, TF_VAR_openai_api_key).
variable "openai_api_key" {
  type = string
}