Load balancing: Lowest-usage (v3.8+)

Configure the plugin to use two OpenAI models and route requests based on the number of tokens in the prompt.

The lowest-usage algorithm distributes requests to the model with the lowest usage volume. By default, usage is calculated as the total number of tokens in the prompt and in the response. You can customize this with the config.balancer.tokens_count_strategy parameter, which accepts the following values:

  • prompt-tokens to only count the tokens in the prompt
  • completion-tokens to only count the tokens in the response
  • total-tokens to count both tokens in the prompt and in the response
  • cost (v3.10+) to count the cost of the tokens.
    To use this strategy, you must set the cost parameter in each model configuration, and log_statistics must be enabled.
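As a sketch, the strategy is set under the plugin's balancer configuration. The fragment below is a minimal illustration assuming the AI Proxy Advanced plugin's config.balancer schema; adjust it to your Kong version:

```yaml
# Illustrative fragment: select the lowest-usage algorithm and compare
# model usage by prompt tokens only.
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: lowest-usage
        tokens_count_strategy: prompt-tokens
```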

Prerequisites

  • An OpenAI account

Environment variables

  • OPENAI_API_KEY: The API key to use to connect to OpenAI.

Set up the plugin
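A minimal sketch of a two-model setup in declarative configuration is shown below. The service name, route path, and model names are placeholders, and the target field names follow the AI Proxy Advanced plugin schema as an assumption; verify them against the plugin reference for your Kong version. The OPENAI_API_KEY environment variable from the prerequisites is referenced in the auth headers (how you interpolate it depends on your deployment tooling):

```yaml
# Sketch only: two OpenAI models behind the lowest-usage balancer.
_format_version: "3.0"
services:
  - name: ai-service          # placeholder service name
    url: http://localhost:32000  # upstream is not used; the plugin proxies to OpenAI
    routes:
      - name: ai-route
        paths:
          - /chat
plugins:
  - name: ai-proxy-advanced
    service: ai-service
    config:
      balancer:
        algorithm: lowest-usage
        tokens_count_strategy: total-tokens
      targets:
        - model:
            provider: openai
            name: gpt-4o           # placeholder model
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer $OPENAI_API_KEY  # substitute your key at deploy time
        - model:
            provider: openai
            name: gpt-4o-mini      # placeholder model
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer $OPENAI_API_KEY
```

With this configuration, requests to /chat are proxied to whichever of the two models currently has the lower total token usage.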
