The AI Rate Limiting Advanced plugin provides rate limiting for the LLM providers used by any AI plugin. The AI Rate Limiting Advanced plugin extends the Rate Limiting Advanced plugin.
This plugin uses the token data returned by the LLM provider to calculate the cost of each query. Because of this, the same HTTP request can vary greatly in cost depending on how the LLM provider counts tokens.
A common pattern for protecting an AI API is to analyze incoming queries and assign each a cost, then rate limit every consumer's accumulated cost per provider over a given time window.
You can also create a generic prompt-based rate limit using the request prompt provider.
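A minimal sketch of enabling the plugin in Kong's declarative configuration might look like the following. The exact schema is not shown in this document: the `llm_providers` entries, the `requestPrompt` provider name, and the field names are assumptions modeled on Rate Limiting Advanced conventions, so verify them against the plugin's configuration reference before use.

```yaml
plugins:
  - name: ai-rate-limiting-advanced
    config:
      llm_providers:
        # Cost budget per provider, per window (field names assumed).
        - name: openai
          limit: 1000        # token cost allowed per window
          window_size: 60    # window length in seconds
        # Generic prompt-based limit (provider name assumed).
        - name: requestPrompt
          limit: 500
          window_size: 60
```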