Overview

The AI Rate Limiting Advanced plugin provides rate limiting for the LLM providers used by any of the AI plugins. It extends the Rate Limiting Advanced plugin.

This plugin uses the token data returned by the LLM provider to calculate the cost of each query. Because providers count and price tokens differently, the same HTTP request can vary greatly in cost from one provider to another.

A common pattern for protecting an AI API is to analyze incoming queries and assign them a cost, then rate limit each consumer's accumulated cost per provider over a given time window.
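As a rough illustration of this pattern, the fragment below sketches a per-provider token limit in declarative configuration. The exact field names and shapes are assumptions to verify against the plugin's configuration reference for your Kong version.

```yaml
plugins:
  - name: ai-rate-limiting-advanced
    config:
      llm_providers:
        - name: openai     # provider whose reported token usage is counted
          limit:
            - 1000         # token budget per window
          window_size:
            - 60           # window length in seconds
```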

You can also create a generic prompt rate limit using the request prompt provider.
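A generic prompt limit could look like the fragment below, which counts tokens in the incoming prompt itself rather than relying on a specific provider's response. The provider name `requestPrompt` and the field shapes are assumptions; check them against the plugin's configuration reference.

```yaml
llm_providers:
  - name: requestPrompt   # counts tokens in the incoming request prompt
    limit:
      - 500
    window_size:
      - 60
```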
