Load balancing: Lowest-usage (v3.8+)

Configure the plugin to use two OpenAI models and route requests based on the number of tokens in the prompt.

The lowest-usage algorithm distributes requests to the model with the lowest usage volume. By default, usage is calculated as the total number of tokens in the prompt and in the response. You can customize this with the config.balancer.tokens_count_strategy parameter, which accepts the following values:

  • prompt-tokens to only count the tokens in the prompt
  • completion-tokens to only count the tokens in the response
  • total-tokens to count both tokens in the prompt and in the response
  • cost (v3.10+) to count the cost of the tokens.
    To use this strategy, you must set the cost parameter in each model configuration, and log_statistics must be enabled.
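As a sketch, the strategy is set under the plugin's balancer configuration. The fragment below is a minimal illustration assuming the AI Proxy Advanced plugin's config.balancer schema; adjust it to your Kong version:

```yaml
# Illustrative fragment: select the lowest-usage algorithm and compare
# model usage by prompt tokens only.
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: lowest-usage
        tokens_count_strategy: prompt-tokens
```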

Prerequisites

  • An OpenAI account

Environment variables

  • OPENAI_API_KEY: The API key to use to connect to OpenAI.

Set up the plugin
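A minimal sketch of a two-model setup in declarative configuration is shown below. The service name, route path, and model names are placeholders, and the target field names follow the AI Proxy Advanced plugin schema as an assumption; verify them against the plugin reference for your Kong version. The OPENAI_API_KEY environment variable from the prerequisites is referenced in the auth headers (how you interpolate it depends on your deployment tooling):

```yaml
# Sketch only: two OpenAI models behind the lowest-usage balancer.
_format_version: "3.0"
services:
  - name: ai-service          # placeholder service name
    url: http://localhost:32000  # upstream is not used; the plugin proxies to OpenAI
    routes:
      - name: ai-route
        paths:
          - /chat
plugins:
  - name: ai-proxy-advanced
    service: ai-service
    config:
      balancer:
        algorithm: lowest-usage
        tokens_count_strategy: total-tokens
      targets:
        - model:
            provider: openai
            name: gpt-4o           # placeholder model
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer $OPENAI_API_KEY  # substitute your key at deploy time
        - model:
            provider: openai
            name: gpt-4o-mini      # placeholder model
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer $OPENAI_API_KEY
```

With this configuration, requests to /chat are proxied to whichever of the two models currently has the lower total token usage.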
