AI Proxy Advanced

AI License Required

Load balancing: Least-connections (v3.8+)

v3.13+: Configure the plugin to use two OpenAI models and route each request to the backend with the most spare capacity, based on in-flight connection counts.

In this example, both models have equal weight (2), so requests are distributed based on which backend has fewer active connections. The algorithm automatically routes new requests to backends with more spare capacity, making it particularly effective when backends have varying response times.
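The selection logic described above can be sketched in a few lines. This is an illustrative model of weighted least-connections, not the plugin's actual implementation: each target tracks its in-flight request count, and new requests go to the target with the lowest in-flight-to-weight ratio.

```python
def pick_target(targets):
    """Pick the target with the most spare capacity.

    targets: list of dicts with 'name', 'weight' (> 0), and 'in_flight'
    (current active connection count). The target with the lowest
    in_flight/weight ratio wins; ties go to the first such target.
    """
    return min(targets, key=lambda t: t["in_flight"] / t["weight"])

# Both targets have equal weight (2), so the one with fewer active
# connections receives the next request.
targets = [
    {"name": "gpt-4o", "weight": 2, "in_flight": 4},
    {"name": "gpt-4o-mini", "weight": 2, "in_flight": 1},
]
print(pick_target(targets)["name"])  # gpt-4o-mini
```

Because selection depends on live connection counts rather than a fixed rotation, a slow backend naturally accumulates connections and receives fewer new requests until it catches up.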

Prerequisites

  • An OpenAI account

Environment variables

  • OPENAI_API_KEY: The API key used to connect to OpenAI.

Set up the plugin
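A declarative (decK-style) configuration for this scenario might look like the following sketch. The field names are based on the AI Proxy Advanced plugin schema and the model names are illustrative; check both against the plugin's configuration reference before use.

```yaml
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        # Route each request to the target with the fewest
        # in-flight connections relative to its weight.
        algorithm: least-connections
      targets:
        - model:
            provider: openai
            name: gpt-4o          # illustrative model name
          route_type: llm/v1/chat
          weight: 2
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        - model:
            provider: openai
            name: gpt-4o-mini     # illustrative model name
          route_type: llm/v1/chat
          weight: 2
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
```

With equal weights on both targets, the balancer's choice is driven entirely by live connection counts, which is the behavior described above.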
