Kong Gateway is designed to handle large volumes of request traffic and to proxy requests with minimal latency. This reference offers recommendations on sizing for resource allocation based on expected Kong Gateway configuration and traffic patterns.
Kong Gateway resource sizing guidelines
Scaling dimensions
Kong Gateway measures performance in the following dimensions:
Performance dimension | Measured in | Performance limited by… | Description |
---|---|---|---|
Latency | Microseconds or milliseconds | Memory-bound. Add more database caching memory to decrease latency. | The delay between the downstream client sending a request and receiving a response. Increasing the number of Routes and plugins in a Kong Gateway cluster increases the amount of latency that’s added to each request. |
Throughput | Requests per second or per minute | CPU-bound. Scale Kong Gateway vertically or horizontally to increase throughput. | The number of requests that Kong Gateway can process in a given time span. |
When all other factors remain the same, decreasing the latency for each request increases the maximum throughput in Kong Gateway. This is because less CPU time is spent handling each request, leaving more CPU available for processing traffic as a whole. Kong Gateway is designed to scale horizontally, adding overall compute power for configurations that add substantial latency to requests but must still meet specific throughput requirements.
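The relationship between per-request CPU time and maximum throughput can be illustrated with a deliberately simplified model. The sketch below assumes throughput is limited only by CPU and that every request costs the same amount of CPU time; the function name and the numbers are hypothetical, not measurements of Kong Gateway.

```python
def max_throughput_rps(cpu_ms_per_request: float, cores: int = 1) -> float:
    """Upper bound on requests per second when CPU is the only limit.

    Simplifying assumption: each core handles requests back to back and
    every request costs the same amount of CPU time.
    """
    if cpu_ms_per_request <= 0:
        raise ValueError("cpu_ms_per_request must be positive")
    return cores * 1000.0 / cpu_ms_per_request


# Halving the CPU time per request doubles the ceiling on the same hardware.
print(max_throughput_rps(2.0, cores=4))  # 2000.0
print(max_throughput_rps(1.0, cores=4))  # 4000.0
```

Real deployments rarely behave this linearly, so treat the result as a ceiling to compare against benchmarks rather than a prediction.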
Performance benchmarking and optimization as a whole is a complex exercise that must account for a variety of factors, including those external to Kong Gateway, such as the behavior of upstream services, or the health of the underlying hardware on which Kong Gateway is running.
General resource guidelines
These recommendations are a baseline guide only. For performance-critical environments, you should conduct specific tuning or benchmarking efforts.
Hybrid mode with large number of entities v3.5+
When Kong Gateway is operating in hybrid mode with a large number of entities (like Routes and Gateway Services), it can benefit from enabling dedicated_config_processing.
When enabled, certain CPU-intensive steps of the data plane reconfiguration operation are offloaded to a dedicated worker process. This reduces proxy latency during reconfigurations at the cost of a slight increase in memory usage. The benefits of this are most apparent with configurations of more than 1,000 entities.
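As a rough illustration of the guidance above, the following sketch encodes the 1,000-entity threshold as a simple heuristic. The function name is hypothetical, the role value data_plane and the printed kong.conf line are assumptions about the configuration syntax, and the threshold is only the approximate figure mentioned in the text; check the configuration reference for your version before relying on it.

```python
def benefits_from_dedicated_config_processing(role, entity_count, threshold=1000):
    """Heuristic: hybrid-mode data planes with more than `threshold` entities."""
    return role == "data_plane" and entity_count > threshold


if benefits_from_dedicated_config_processing("data_plane", 12_000):
    # Assumed kong.conf syntax; confirm against the reference for your version.
    print("dedicated_config_processing = on")
```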
Kong Gateway resources
Kong Gateway is designed to operate in a variety of deployment environments. It has no minimum system requirements to operate.
Resource requirements vary substantially based on configuration. The following high-level matrices offer a guideline for determining system requirements based on overall configuration and performance requirements.
The following table provides rough usage requirement estimates based on simplified examples with latency and throughput requirements on a per-node basis:
Size | Number of configured entities | Latency requirements | Throughput requirements | Use cases |
---|---|---|---|---|
Development | < 100 | < 100 ms | < 500 RPS | |
Small | < 1000 | < 20 ms | < 2500 RPS | |
Medium | < 10000 | < 10 ms | < 10000 RPS | |
Large | < 50000+ | < 10 ms | < 10000 RPS | |
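The matrix above can be read as a lookup: pick the smallest size whose limits cover your per-node requirements. The sketch below is one possible encoding, not an official sizing tool. The names Tier, TIERS, and size_tier are hypothetical, and treating Large as the fallback for anything beyond Medium is an assumption.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    max_entities: int    # tier covers entity counts below this value
    latency_ms: int      # latency requirement the tier is sized for
    max_rps: int         # tier covers per-node throughput below this value


# Values copied from the table; Large is handled as the fallback.
TIERS = [
    Tier("Development", 100, 100, 500),
    Tier("Small", 1000, 20, 2500),
    Tier("Medium", 10000, 10, 10000),
]


def size_tier(entities: int, latency_budget_ms: float, rps: int) -> str:
    """Return the smallest tier that covers the given requirements."""
    for tier in TIERS:
        if (entities < tier.max_entities
                and rps < tier.max_rps
                and tier.latency_ms <= latency_budget_ms):
            return tier.name
    return "Large"


print(size_tier(entities=50, latency_budget_ms=20, rps=200))  # Small
```

In this example the entity count and throughput fit Development, but the 20 ms latency requirement pushes the result up to Small.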
Database resources
We do not provide any specific numbers for database sizing because it depends on your particular setup. Sizing varies based on:
- Traffic
- Number of nodes
- Enabled features. For example, rate limiting uses a database or Redis.
- Number and rate of change of entities
- The rate at which Kong Gateway processes are started and restarted within the cluster
- The size of Kong Gateway’s in-memory cache
Kong Gateway intentionally relies on the database as little as possible. It reads configuration from the database only when a node first starts or when the configuration for a given entity changes.
Everything in the database is meant to be read infrequently and held in memory as long as possible. Therefore, database resource requirements are lower than those of compute environments running Kong Gateway.
Query patterns are typically simple and follow schema indexes. Provision sufficient database resources in order to handle spiky query patterns.
You can adjust datastore settings in kong.conf to keep database access minimal (see also the In-memory caching section) or to keep Kong Gateway operational if the database is down for maintenance. If you choose to keep Kong Gateway operational during database downtime, Vitals data is not written to the database during this time.
Cluster resource allocations
Based on the expected size and demand of the cluster, we recommend the following resource allocations as a starting point:
Size | CPU | RAM | Typical cloud instance sizes |
---|---|---|---|
Development | 1-2 cores | 2-4 GB | AWS: t3.medium; GCP: n1-standard-1; Azure: Standard A1 v2 |
Small | 1-2 cores | 2-4 GB | AWS: t3.medium; GCP: n1-standard-1; Azure: Standard A1 v2 |
Medium | 2-4 cores | 4-8 GB | AWS: m5.large; GCP: n1-standard-4; Azure: Standard A1 v4 |
Large | 8-16 cores | 16-32 GB | AWS: c5.xlarge; GCP: n1-highcpu-16; Azure: F8s v2 |
We strongly discourage using throttled cloud instance types (such as the AWS t2 or t3 series of machines) in large clusters, because CPU throttling is detrimental to Kong Gateway’s performance. We also recommend testing and verifying the bandwidth availability for a given instance class. Bandwidth requirements for Kong Gateway depend on the shape and volume of traffic flowing through the cluster.
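A provisioning script could flag throttled instance types before they reach a large cluster. The check below is a minimal sketch that covers only the two AWS series named above; the function name and prefix list are hypothetical and would need extending for other providers.

```python
# Only the AWS series named in the text; extend for other providers as needed.
THROTTLED_PREFIXES = ("t2.", "t3.")


def is_throttled_instance(instance_type: str) -> bool:
    """Return True for instance types in a CPU-throttled (burstable) series."""
    return instance_type.lower().startswith(THROTTLED_PREFIXES)


for candidate in ("t3.medium", "m5.large", "c5.xlarge"):
    print(candidate, is_throttled_instance(candidate))
```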
In-memory caching
We recommend defining the largest mem_cache_size possible while still providing adequate resources to the operating system and any other processes running adjacent to Kong Gateway. This configuration allows Kong Gateway to take maximum advantage of the in-memory cache, and reduce the number of trips to the database.
Each Kong Gateway worker process maintains its own memory allocations, which must be accounted for when provisioning memory. By default, one worker process runs per available CPU core. We recommend allocating about 500 MB of memory per worker process.
For example, on a machine with 4 CPU cores and 8 GB of RAM available, we recommend allocating between 4 and 6 GB to cache using mem_cache_size, depending on what other processes are running alongside Kong Gateway.
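The arithmetic behind that example can be sketched as follows. The function name and the reserved_mb parameter (memory set aside for the operating system and adjacent processes) are hypothetical; the 500 MB per worker and the one-worker-per-core default come from the guidance above.

```python
def mem_cache_budget_mb(total_ram_mb: int, cores: int,
                        reserved_mb: int = 0, per_worker_mb: int = 500) -> int:
    """Memory left for mem_cache_size after workers and other processes."""
    workers = cores  # default: one worker process per available CPU core
    budget = total_ram_mb - workers * per_worker_mb - reserved_mb
    if budget <= 0:
        raise ValueError("no memory left for the cache with these inputs")
    return budget


# 4 cores, 8 GB of RAM: the upper end of the range, with nothing else reserved.
print(mem_cache_budget_mb(8192, cores=4))                    # 6192
# The same machine with 2 GB reserved for the OS and other processes.
print(mem_cache_budget_mb(8192, cores=4, reserved_mb=2048))  # 4144
```

The two results, roughly 6 GB and 4 GB, bracket the 4 to 6 GB range recommended above.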
Plugin queues
Several Kong Gateway plugins use internal, in-memory queues to reduce the number of concurrent requests to an upstream server under high load conditions and provide buffering during temporary network and upstream outages.
These plugins include:
The queue.max_entries plugin configuration parameter determines how many entries can be waiting in a given plugin queue.
The default value of 10,000 for queue.max_entries should provide enough buffering in many installations while keeping the maximum memory usage of queues at reasonable levels. Once this limit is reached, the oldest entry is removed when a new entry is queued.
For larger configurations, we recommend experimentally determining the memory requirements of queues by running Kong Gateway in a test environment. You can force plugin queues to reach their configured limits by making the plugin upstream servers unavailable, then observe Kong Gateway’s memory consumption. Most plugins use one queue per plugin instance, with the exception of the HTTP Log plugin, which uses one queue per log server upstream configuration.
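Before running such a test, a back-of-the-envelope estimate can indicate the order of magnitude to expect. The sketch below multiplies the number of queues by queue.max_entries and an average entry size. The function name is hypothetical and the entry size is an assumption you would need to measure, since it depends on the plugin and its payloads; in-process overhead is ignored.

```python
def queue_memory_upper_bound_mb(queues: int, avg_entry_bytes: int,
                                max_entries: int = 10_000) -> float:
    """Rough upper bound on queue memory when every queue is full.

    avg_entry_bytes is an assumed, measured-by-you figure.
    """
    return queues * max_entries * avg_entry_bytes / (1024 * 1024)


# Five full queues at the default limit, assuming about 2 KiB per entry.
print(round(queue_memory_upper_bound_mb(queues=5, avg_entry_bytes=2048), 1))  # 97.7
```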
Next steps
- See Kong Gateway’s performance testing benchmark results and conduct your own performance tuning tests