By default, when transparent proxying is enabled, every data plane proxy receives configuration for every other data plane proxy in the mesh.
In large meshes, a data plane proxy typically communicates with only a small number of services.
Defining that list of services can dramatically improve Kong Mesh performance.
The benefits are:
- The control plane generates a much smaller xDS configuration (fewer Clusters, Listeners, and so on), reducing CPU and memory usage.
- Smaller configurations reduce network bandwidth usage.
- Envoy maintains fewer Clusters and Listeners, resulting in fewer statistics and lower memory usage.
This feature only works with MeshTrafficPermission. If you’re using TrafficPermission, migrate to MeshTrafficPermission before enabling this feature; otherwise, all traffic may stop flowing.
Enabling this flag causes Kong Mesh to compute a dependency graph between services and generate xDS configuration that allows communication only between services permitted to reach each other (those whose effective action is not Deny).
In the example below, service b can only be called by service a. There is no reason to compute or distribute configuration for service b to any service other than a, since no other service is permitted to communicate with it.
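For instance, a MeshTrafficPermission along these lines (a sketch assuming a Kubernetes deployment, the default mesh, and services tagged a and b) expresses that relationship:

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshTrafficPermission
metadata:
  name: allow-a-to-b # hypothetical policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  # The policy applies to traffic destined for service b
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: b
  from:
    # Only service a is allowed to call b
    - targetRef:
        kind: MeshSubset
        tags:
          kuma.io/service: a
      default:
        action: Allow
```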
The recommended migration path is to start with a coarse-grained MeshTrafficPermission targeting a MeshSubset with k8s.kuma.io/namespace, then narrow down to individual services as needed.
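Such a coarse-grained starting point could look like the following sketch, which allows traffic from every workload in a hypothetical legacy-apps namespace:

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshTrafficPermission
metadata:
  name: allow-from-legacy-apps # hypothetical policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  # Applies to every destination in the mesh
  targetRef:
    kind: Mesh
  from:
    # Allows any client running in the legacy-apps namespace
    - targetRef:
        kind: MeshSubset
        tags:
          k8s.kuma.io/namespace: legacy-apps
      default:
        action: Allow
```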
The default timeout works well when kuma-cp and the PostgreSQL database are deployed in the same data center or cloud region.
If you’re using a more distributed topology, such as hosting kuma-cp on-premises with PostgreSQL as a cloud service, the default timeout may not be sufficient.
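As a sketch, assuming the standard store.postgres block of the control plane configuration, raising the connection timeout could look like this (15 is an illustrative value, not a recommendation):

```yaml
store:
  type: postgres
  postgres:
    # Seconds to wait when establishing a connection to PostgreSQL;
    # raise this when kuma-cp and the database are far apart.
    connectionTimeout: 15
```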
Kong Mesh’s control plane exposes pprof endpoints for profiling and debugging kuma-cp performance.
To enable debugging endpoints, set KUMA_DIAGNOSTICS_DEBUG_ENDPOINTS=true before starting kuma-cp, then retrieve profiling data using one of the following methods:
```sh
go tool pprof "http://$CONTROL_PLANE_IP:5680/debug/pprof/profile?seconds=30"
```
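If the machine running go tool pprof cannot reach the control plane directly, you can also capture a profile with curl and analyze it offline; /debug/pprof/heap below is the standard Go pprof heap endpoint:

```sh
# Download a heap profile from the diagnostics port, then inspect it locally
curl -s "http://$CONTROL_PLANE_IP:5680/debug/pprof/heap" -o heap.pprof
go tool pprof heap.pprof
```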
The Kubernetes client uses client-level throttling to avoid overwhelming the kube-apiserver. In deployments with more than 2,000 services in a single Kubernetes cluster, the volume of resource updates can hit this limit. It’s generally safe to raise this limit, since kube-apiserver has its own throttling mechanism. To adjust client throttling:
```yaml
runtime:
  kubernetes:
    clientConfig:
      qps: ... # maximum requests per second the Kubernetes client is allowed to make
      burstQps: ... # maximum burst requests per second the Kubernetes client is allowed to make
```
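If you configure kuma-cp through environment variables rather than a configuration file, the same settings should map to KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_QPS and KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_BURST_QPS (the values below are illustrative):

```sh
KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_QPS=100 \
KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_BURST_QPS=200 \
kuma-cp run
```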
Kong Mesh modifies Kubernetes resources through reconciliation. Each resource type has its own work queue, and the control plane adds reconciliation tasks to that queue. In deployments with more than 2,000 services in a single Kubernetes cluster, the Pod reconciliation queue can grow and slow down Pod updates. To increase the number of concurrent Pod reconciliation tasks:
```yaml
runtime:
  kubernetes:
    controllersConcurrency:
      podController: ... # maximum concurrent Pod reconciliations
```
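By the usual mapping between kuma-cp configuration paths and environment variables, this setting should correspond to KUMA_RUNTIME_KUBERNETES_CONTROLLERS_CONCURRENCY_POD_CONTROLLER (verify the name against your version’s configuration reference; 20 is an illustrative value):

```sh
KUMA_RUNTIME_KUBERNETES_CONTROLLERS_CONCURRENCY_POD_CONTROLLER=20 kuma-cp run
```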
Envoy’s worker thread count can be tuned. The mechanism differs by deployment type.
By default, Envoy sets concurrency based on the container’s CPU resource limit. For example, a limit of 7000m results in 7 worker threads. On Kubernetes, concurrency is capped between 2 and 10 by default. To exceed that limit, use the kuma.io/sidecar-proxy-concurrency annotation:
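A minimal sketch of a Deployment carrying that annotation (the workload names and the value 16 are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app # hypothetical workload
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Overrides the computed concurrency, including the default 2-10 cap
        kuma.io/sidecar-proxy-concurrency: "16"
    spec:
      containers:
        - name: my-app
          image: my-app:latest # hypothetical image
```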
On Linux, Envoy starts with the --cpuset-threads flag by default, using the cpuset size to determine the worker thread count. When a cpuset is not available, Envoy falls back to the number of hardware threads. Use the --concurrency flag when starting kuma-dp to override this:
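For example (the control plane address and file paths are illustrative):

```sh
kuma-dp run \
  --cp-address=https://kuma-cp.internal:5678 \
  --dataplane-file=dataplane.yaml \
  --dataplane-token-file=/path/to/token \
  --concurrency=8 # force 8 Envoy worker threads
```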
Kong Mesh supports Incremental xDS, a new model for exchanging configuration between the control plane and Envoy.
Instead of sending the entire configuration on each update, the control plane sends only the changes. This reduces CPU and memory usage on sidecars during updates, but may slightly increase load on the control plane, which must maintain state and compute diffs.
This feature is especially beneficial for sidecars that don’t use reachableBackends or reachableServices.
Enable it for the entire deployment with KUMA_EXPERIMENTAL_DELTA_XDS=true, or for an individual sidecar (including Ingress and Egress):
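For the deployment-wide option, set the variable when starting the control plane:

```sh
# Enable incremental (delta) xDS for every proxy served by this control plane
KUMA_EXPERIMENTAL_DELTA_XDS=true kuma-cp run
```

On Kubernetes, the per-sidecar toggle is applied as a Pod annotation; the kuma.io/delta-xds name in the sketch below is an assumption based on recent Kuma releases, so confirm it against your version’s annotation reference:

```yaml
metadata:
  annotations:
    # Annotation name is an assumption; verify for your release
    kuma.io/delta-xds: "true"
```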
This section covers internal Kong Mesh control plane implementation details and is intended for advanced users.
The main task of the control plane is to provide configuration to data planes. When a data plane connects to the control plane, the control plane starts a new Goroutine that runs a reconciliation process at a configurable interval (one second by default). You can customize this interval with the KUMA_XDS_SERVER_DATAPLANE_CONFIGURATION_REFRESH_INTERVAL parameter. During reconciliation, all data planes and policies are fetched and matched. The resulting Envoy configuration, including policies and available service endpoints, is generated and sent only if it has changed.
This process can be CPU-intensive with a large number of data planes. Increasing the interval reduces control plane load at the cost of higher config propagation latency. For example, setting it to five seconds means that when you apply a policy or a service instance changes state, the control plane will generate and distribute the new configuration within five seconds.
For high-traffic systems, stale endpoint data for that long may not be acceptable. In that case, use passive or active health checks.
To reduce storage load, a cache shares fetch results across concurrent reconciliation Goroutines for multiple data planes. The default expiration time for cache entries is one second, but you can customize it using the KUMA_STORE_CACHE_EXPIRATION_TIME parameter.
This value should not exceed KUMA_XDS_SERVER_DATAPLANE_CONFIGURATION_REFRESH_INTERVAL, otherwise the control plane will build Envoy config from stale data.
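For example, a deployment that tolerates slower configuration propagation might run the control plane like this, keeping the cache expiration within the refresh interval (values illustrative):

```sh
KUMA_XDS_SERVER_DATAPLANE_CONFIGURATION_REFRESH_INTERVAL=5s \
KUMA_STORE_CACHE_EXPIRATION_TIME=5s \
kuma-cp run
```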