Load balancing: Semantic (v3.8+)

Configure semantic load balancing with the AI Proxy Advanced plugin. To set up semantic routing, you must configure the following parameters:

  • config.embeddings to define the embedding model used to compare incoming prompts with each target's description.
  • config.vectordb to define the vector database parameters. Only Redis is supported, so you need a Redis instance running in your environment.
  • config.targets[].description to define the description that incoming prompts are matched against to select a target.

When configured this way, the plugin routes each incoming request to the most relevant OpenAI model based on the content of the request:

  • If the request is related to code completions, it will be routed to the gpt-35-turbo model.
  • If the request is about IT support, it will be routed to the gpt-4o model.
  • All other requests that don’t match the above categories will be handled by the gpt-4o-mini model, which serves as a catch-all for general queries.
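As an illustration, the routing above corresponds to a targets list along these lines. This is a sketch, not the exact schema: the description wording is invented here, and field layout should be checked against the plugin's parameter reference.

```yaml
targets:
  - model:
      provider: openai
      name: gpt-35-turbo
    # Prompts semantically close to this description route here
    description: "Code completions and programming assistance"
  - model:
      provider: openai
      name: gpt-4o
    description: "IT support, troubleshooting, and helpdesk questions"
  - model:
      provider: openai
      name: gpt-4o-mini
    # Catch-all target for prompts that match no other description
    description: "General-purpose questions on any other topic"
```

At request time, the plugin embeds the prompt with the configured embedding model, compares it against the stored embeddings of these descriptions in Redis, and forwards the request to the closest match.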

Prerequisites

  • An OpenAI account

  • A Redis instance running

Environment variables

  • OPENAI_API_KEY: The API key used to authenticate with OpenAI.

Set up the plugin
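A minimal declarative-style configuration might look like the following sketch. The embedding model name, Redis host and port, and the exact nesting of fields such as balancer.algorithm, vectordb.strategy, distance_metric, threshold, and dimensions are assumptions for illustration; verify them against the AI Proxy Advanced parameter reference for your Kong Gateway version.

```yaml
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        # Assumed field: selects semantic routing between targets
        algorithm: semantic
      embeddings:
        auth:
          header_name: Authorization
          # Substitute your OPENAI_API_KEY value here
          header_value: Bearer <OPENAI_API_KEY>
        model:
          provider: openai
          # Assumed embedding model; any OpenAI embedding model should work
          name: text-embedding-3-small
      vectordb:
        # Redis is the only supported vector database
        strategy: redis
        distance_metric: cosine   # assumed value
        threshold: 0.7            # assumed similarity threshold
        dimensions: 1536          # must match the embedding model's output size
        redis:
          host: localhost
          port: 6379
      targets:
        - model:
            provider: openai
            name: gpt-4o-mini
          description: "General-purpose questions on any other topic"
```

Repeat the targets entry for each model you want to route to, giving each one a description that characterizes the prompts it should handle.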
