Rate Limiting Advanced

Enterprise only

Overview Examples Configuration reference Changelog

Window types

The Rate Limiting Advanced plugin supports the following window types:

Fixed window: Fixed windows consist of buckets that are statically assigned to a definitive time range. Each request is mapped to only one fixed window based on its timestamp and will affect only that window’s counters.
Sliding window (default): A sliding window tracks the number of hits assigned to a specific key (such as an IP address, consumer, credential) within a given time window, taking into account previous hit rates to create a dynamically calculated rate. The default (and recommended) sliding window type ensures a resource is not consumed at a higher rate than what is configured.

Learn more about how the different window types work for rate limiting plugins.

Multiple limits and window sizes

An arbitrary number of limits or window sizes can be applied per plugin instance. This allows you to create multiple rate limiting windows (for example, rate limit per minute and per hour, and per any arbitrary window size). Because of limitations with Kong Gateway’s plugin configuration interface, each nth limit will apply to each nth window size. For example:

_format_version: "3.0"
plugins:
  - name: rate-limiting-advanced
    config:
      limit:
      - 10
      - 100
      window_size:
      - 60
      - 3600

curl -i -X POST http://localhost:8001/plugins/ \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    --data '
    {
      "name": "rate-limiting-advanced",
      "config": {
        "limit": [
          10,
          100
        ],
        "window_size": [
          60,
          3600
        ]
      }
    }
    '

Make the following request:

curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer $KONNECT_TOKEN" \
    --data '
    {
      "name": "rate-limiting-advanced",
      "config": {
        "limit": [
          10,
          100
        ],
        "window_size": [
          60,
          3600
        ]
      }
    }
    '

echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: rate-limiting-advanced
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
config:
  limit:
  - 10
  - 100
  window_size:
  - 60
  - 3600
plugin: rate-limiting-advanced
" | kubectl apply -f -

resource "konnect_gateway_plugin_rate_limiting_advanced" "my_rate_limiting_advanced" {
  enabled = true

  config = {
    limit = [10, 100]
    window_size = [60, 3600]
  }

  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}

This example applies two rate limiting policies, one of which will trip when 10 hits have been counted in 60 seconds, or the other when 100 hits have been counted in 3600 seconds.

The number of configured window sizes and limits parameters must be equal, otherwise you will get the following error:

You must provide the same number of windows and limits

Strategies

The Rate Limiting Advanced plugin supports three rate limiting strategies: local, cluster, and redis.

This is controlled by the config.strategy parameter.

Strategy	Description	Pros	Cons
`local`	Counters are stored in-memory on the node.	Minimal performance impact.	Less accurate. Unless there’s a consistent-hashing load balancer in front of Kong Gateway, it diverges when scaling the number of nodes.
`cluster`	Counters are stored in the Kong Gateway data store and shared across nodes.	Accurate¹, no extra components to support.	Each request forces a read and a write on the data store. Therefore, relatively, the biggest performance impact. Not supported in hybrid mode or Konnect deployments.
`redis`	Counters are stored on a Redis server and shared across nodes.	Accurate¹, less performance impact than a `cluster` policy.	Needs a Redis installation. Bigger performance impact than a `local` policy.

[1]: Only when config.sync_rate option is set to 0 (synchronous behavior).

Two common use cases for rate limiting are:

Every transaction counts: The highest level of accuracy is needed. An example is a transaction with financial consequences.
Backend protection: Accuracy is not as relevant. The requirement is only to protect backend services from overloading that’s caused either by specific users or by attacks.

Every transaction counts

In this scenario, because accuracy is important, the local policy is not an option. Consider the support effort you might need for Redis, and then choose either cluster or redis.

You could start with the cluster policy, and move to redis if performance reduces drastically.

If using a very high sync frequency, use redis. Very high sync frequencies with cluster mode are not scalable and not recommended. The sync frequency becomes higher when the sync_rate setting is a lower number - for example, a sync_rate of 0.1 is a much higher sync frequency (10 counter syncs per second) than a sync_rate of 1 (1 counter sync per second).

You can calculate what is considered a very high sync rate in your environment based on your topology, number of plugins, their sync rates, and tolerance for loose rate limits.

Together, the interaction between sync rate and window size affects how accurately the plugin can determine cluster-wide traffic. For example, the following table represents the worst-case scenario where a full sync interval’s worth of data hasn’t yet propagated across nodes:

Property	Formula or config location	Value
Window size in seconds	Value set in `config.window_size`	5
Limit (in window)	Value set in `config.limit`	1000
Sync rate (interval)	Value set in `config.sync_rate`	0.5
Number of nodes (>1)	–	10
Estimated load balanced requests-per-second (RPS) to a node	Limit / Window size / Number of nodes	1000 / 5 / 10 = 20
Max potential lag in cluster count for a given node/s	Estimated load balanced RPS * Sync rate	20 * 0.5 = 10
Cluster-wide max potential overage/s	Max potential lag * Number of nodes	10 * 10 = 100
Cluster-wide max potential overage/s as a percentage	Cluster-wide max potential overage / Limit	100 / 1000 = 10%
Effective worst case cluster-wide requests allowed at window size	Limit * Cluster-wide max potential overage	1000 + 100 = 1100

If you choose to switch strategies, note that you can’t port the existing usage metrics from the Kong Gateway data store to Redis. This might not be a problem with short-lived metrics (for example, seconds or minutes) but if you use metrics with a longer time frame (for example, months), plan your switch carefully.

Backend protection

If accuracy is less important, choose the local policy. You might need to experiment a little before you get a setting that works for your scenario. As the cluster scales to more nodes, more user requests are handled. When the cluster scales down, the probability of false negatives increases. Make sure to adjust your rate limits when scaling.

For example, if a user can make 100 requests every second, and you have an equally balanced 5-node Kong Gateway cluster, you can set the local limit to 30 requests every second. If you see too many false negatives, increase the limit.

To minimize inaccuracies, consider using a consistent-hashing load balancer in front of Kong Gateway. The load balancer ensures that a user is always directed to the same Kong Gateway node, which reduces inaccuracies and prevents scaling problems.

Fallback from Redis

When the redis strategy is used and a Kong Gateway node is disconnected from Redis, the rate-limiting-advanced plugin will fall back to local. This can happen when the Redis server is down or the connection to Redis broken. Kong Gateway keeps the local counters for rate limiting and syncs with Redis once the connection is re-established. Kong Gateway will still rate limit, but the Kong Gateway nodes can’t sync the counters. As a result, users will be able to perform more requests than the limit, but there will still be a limit per node.

Limit by IP address

If limiting by IP address, it’s important to understand how Kong Gateway determines the IP address of an incoming request.

The IP address is extracted from the request headers sent to Kong Gateway by downstream clients. Typically, these headers are named X-Real-IP or X-Forwarded-For.

By default, Kong Gateway uses the header name X-Real-IP to identify the client’s IP address. If your environment requires a different header, you can specify this by setting the real_ip_header Nginx property. Depending on your network setup, you may also need to configure the trusted_ips Nginx property to include the load balancer IP address. This ensures that Kong Gateway correctly interprets the client’s IP address, even when the request passes through multiple network layers.

Headers sent to the client

When this plugin is enabled, Kong Gateway sends some additional headers back to the client, indicating the state of the rate limiting policies in place:

Header	Description
RateLimit-Limit	Allowed limit in the timeframe.
RateLimit-Remaining	Number of available requests remaining.
RateLimit-Reset	The time remaining, in seconds, until the rate limit quota is reset.
X-RateLimit-Limit-Second	The time limit, in number of seconds.
X-RateLimit-Limit-Minute	The time limit, in number of minutes.
X-RateLimit-Limit-Day	The time limit, in number of days.
X-RateLimit-Limit-Month	The time limit, in number of months.
X-RateLimit-Limit-Year	The time limit, in number of years.
X-RateLimit-Remaining-Second	The number of seconds still left in the time frame.
X-RateLimit-Remaining-Minute	The number of minutes still left in the time frame.
X-RateLimit-Remaining-Day	The number of days still left in the time frame.
X-RateLimit-Remaining-Month	The number of months still left in the time frame.
X-RateLimit-Remaining-Year	The number of years still left in the time frame.
Retry-After	This header appears on `429` errors, indicating how long the upstream service is expected to be unavailable to the client. When using `window_type: sliding` and `RateLimit-Reset`, `Retry-After` may increase due to the rate calculation for the sliding window.

You can optionally hide the limit and remaining headers with the config.hide_client_headers option.

If more than one limit is set, the plugin returns multiple time limit headers. For example:

X-RateLimit-Limit-Second: 5
X-RateLimit-Remaining-Second: 4
X-RateLimit-Limit-Minute: 10
X-RateLimit-Remaining-Minute: 9

If any of the limits are reached, the plugin returns an HTTP/1.1 429 status code to the client with the following JSON body:

{ "message": "API rate limit exceeded" }

The headers RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset are based on the Internet-Draft RateLimit Header Fields for HTTP and may change in the future to respect specification updates.

Rate limiting for Consumer Groups

You can use the Consumer Groups entity to manage custom rate limiting configurations for subsets of Consumers.

You can see an example of this in the guide on enforcing rate limiting tiers with the Rate Limiting Advanced plugin.