By default, when transparent proxying is enabled, every data plane proxy receives configuration for every other data plane proxy in the mesh.
In large meshes, a data plane proxy typically communicates with only a small number of services.
Defining that list of services can dramatically improve Kong Mesh performance.
The benefits are:
- The control plane generates a much smaller xDS configuration (fewer Clusters, Listeners, and so on), reducing CPU and memory usage.
- Smaller configurations reduce network bandwidth usage.
- Envoy maintains fewer Clusters and Listeners, resulting in fewer statistics and lower memory usage.
This feature only works with MeshTrafficPermission. If you’re using TrafficPermission, migrate to MeshTrafficPermission before enabling this feature; otherwise, all traffic may stop flowing.
Enabling this flag causes Kong Mesh to compute a dependency graph between services and generate xDS configuration that allows communication only between services permitted to reach each other (those whose effective action is not Deny).
In the example below, service b can only be called by service a. There is no reason to compute or distribute configuration for service b to any service other than a, since no other service is permitted to communicate with it.
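For instance, a MeshTrafficPermission along these lines (a sketch assuming a Kubernetes deployment, the default mesh, and services tagged a and b) expresses that relationship:

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshTrafficPermission
metadata:
  name: allow-a-to-b # hypothetical policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  # The policy applies to traffic destined for service b
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: b
  from:
    # Only service a is allowed to call b
    - targetRef:
        kind: MeshSubset
        tags:
          kuma.io/service: a
      default:
        action: Allow
```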
The recommended migration path is to start with a coarse-grained MeshTrafficPermission targeting a MeshSubset with k8s.kuma.io/namespace, then narrow down to individual services as needed.
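Such a coarse-grained starting point could look like the following sketch, which allows traffic from every workload in a hypothetical legacy-apps namespace:

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshTrafficPermission
metadata:
  name: allow-from-legacy-apps # hypothetical policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  # Applies to every destination in the mesh
  targetRef:
    kind: Mesh
  from:
    # Allows any client running in the legacy-apps namespace
    - targetRef:
        kind: MeshSubset
        tags:
          k8s.kuma.io/namespace: legacy-apps
      default:
        action: Allow
```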
The default timeout works well when kuma-cp and the PostgreSQL database are deployed in the same data center or cloud region.
If you’re using a more distributed topology, such as hosting kuma-cp on-premises with PostgreSQL as a cloud service, the default timeout may not be sufficient.
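As a sketch, assuming the standard store.postgres block of the control plane configuration, raising the connection timeout could look like this (15 is an illustrative value, not a recommendation):

```yaml
store:
  type: postgres
  postgres:
    # Seconds to wait when establishing a connection to PostgreSQL;
    # raise this when kuma-cp and the database are far apart.
    connectionTimeout: 15
```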
Kong Mesh’s control plane exposes pprof endpoints for profiling and debugging kuma-cp performance.
To enable debugging endpoints, set KUMA_DIAGNOSTICS_DEBUG_ENDPOINTS=true before starting kuma-cp, then retrieve profiling data using one of the following methods:
```sh
go tool pprof "http://$CONTROL_PLANE_IP:5680/debug/pprof/profile?seconds=30"
```
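If the machine running go tool pprof cannot reach the control plane directly, you can also capture a profile with curl and analyze it offline; /debug/pprof/heap below is the standard Go pprof heap endpoint:

```sh
# Download a heap profile from the diagnostics port, then inspect it locally
curl -s "http://$CONTROL_PLANE_IP:5680/debug/pprof/heap" -o heap.pprof
go tool pprof heap.pprof
```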
The Kubernetes client uses client-level throttling to avoid overwhelming the kube-apiserver. In deployments with more than 2,000 services in a single Kubernetes cluster, the volume of resource updates can hit this limit. It’s generally safe to raise this limit, since kube-apiserver has its own throttling mechanism. To adjust client throttling:
```yaml
runtime:
  kubernetes:
    clientConfig:
      qps: ... # maximum requests per second the Kubernetes client is allowed to make
      burstQps: ... # maximum burst requests per second the Kubernetes client is allowed to make
```
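If you configure kuma-cp through environment variables rather than a configuration file, the same settings should map to KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_QPS and KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_BURST_QPS (the values below are illustrative):

```sh
KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_QPS=100 \
KUMA_RUNTIME_KUBERNETES_CLIENT_CONFIG_BURST_QPS=200 \
kuma-cp run
```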
Kong Mesh modifies Kubernetes resources through reconciliation. Each resource type has its own work queue, and the control plane adds reconciliation tasks to that queue. In deployments with more than 2,000 services in a single Kubernetes cluster, the Pod reconciliation queue can grow and slow down Pod updates. To increase the number of concurrent Pod reconciliation tasks:
```yaml
runtime:
  kubernetes:
    controllersConcurrency:
      podController: ... # maximum concurrent Pod reconciliations
```
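By the usual mapping between kuma-cp configuration paths and environment variables, this setting should correspond to KUMA_RUNTIME_KUBERNETES_CONTROLLERS_CONCURRENCY_POD_CONTROLLER (verify the name against your version’s configuration reference; 20 is an illustrative value):

```sh
KUMA_RUNTIME_KUBERNETES_CONTROLLERS_CONCURRENCY_POD_CONTROLLER=20 kuma-cp run
```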
Envoy’s worker thread count can be tuned. The mechanism differs by deployment type.
By default, Envoy sets concurrency based on the container’s CPU resource limit. For example, a limit of 7000m results in 7 worker threads. On Kubernetes, concurrency is capped between 2 and 10 by default. To exceed that limit, use the kuma.io/sidecar-proxy-concurrency annotation:
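A minimal sketch of a Deployment carrying that annotation (the workload names and the value 16 are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app # hypothetical workload
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Overrides the computed concurrency, including the default 2-10 cap
        kuma.io/sidecar-proxy-concurrency: "16"
    spec:
      containers:
        - name: my-app
          image: my-app:latest # hypothetical image
```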
On Linux, Envoy starts with the --cpuset-threads flag by default, using the cpuset size to determine the worker thread count. When a cpuset is not available, Envoy falls back to the number of hardware threads. Use the --concurrency flag when starting kuma-dp to override this:
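For example (the control plane address and file paths are illustrative):

```sh
kuma-dp run \
  --cp-address=https://kuma-cp.internal:5678 \
  --dataplane-file=dataplane.yaml \
  --dataplane-token-file=/path/to/token \
  --concurrency=8 # force 8 Envoy worker threads
```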
Kong Mesh supports Incremental xDS, a new model for exchanging configuration between the control plane and Envoy.
Instead of sending the entire configuration on each update, the control plane sends only the changes. This reduces CPU and memory usage on sidecars during updates, but may slightly increase load on the control plane, which must maintain state and compute diffs.
This feature is especially beneficial for sidecars that don’t use reachableBackends or reachableServices.
Enable it for the entire deployment with KUMA_EXPERIMENTAL_DELTA_XDS=true, or for an individual sidecar (including Ingress and Egress):
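For the deployment-wide option, set the variable when starting the control plane:

```sh
# Enable incremental (delta) xDS for every proxy served by this control plane
KUMA_EXPERIMENTAL_DELTA_XDS=true kuma-cp run
```

On Kubernetes, the per-sidecar toggle is applied as a Pod annotation; the kuma.io/delta-xds name in the sketch below is an assumption based on recent Kuma releases, so confirm it against your version’s annotation reference:

```yaml
metadata:
  annotations:
    # Annotation name is an assumption; verify for your release
    kuma.io/delta-xds: "true"
```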
This section covers internal Kong Mesh control plane implementation details and is intended for advanced users.
The main task of the control plane is to provide configuration to data planes. When a data plane connects to the control plane, the control plane starts a new Goroutine that runs a reconciliation process at a configurable interval (one second by default). You can customize this interval with the KUMA_XDS_SERVER_DATAPLANE_CONFIGURATION_REFRESH_INTERVAL parameter. During reconciliation, all data planes and policies are fetched and matched. The resulting Envoy configuration, including policies and available service endpoints, is generated and sent only if it has changed.
This process can be CPU-intensive with a large number of data planes. Increasing the interval reduces control plane load at the cost of higher config propagation latency. For example, setting it to five seconds means that when you apply a policy or a service instance changes state, the control plane will generate and distribute the new configuration within five seconds.
For high-traffic systems, stale endpoint data for that long may not be acceptable. In that case, use passive or active health checks.
To reduce storage load, a cache shares fetch results across concurrent reconciliation Goroutines for multiple data planes. The default expiration time for cache entries is one second, but you can customize it using the KUMA_STORE_CACHE_EXPIRATION_TIME parameter.
This value should not exceed KUMA_XDS_SERVER_DATAPLANE_CONFIGURATION_REFRESH_INTERVAL, otherwise the control plane will build Envoy config from stale data.
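For example, a deployment that tolerates slower configuration propagation might run the control plane like this, keeping the cache expiration within the refresh interval (values illustrative):

```sh
KUMA_XDS_SERVER_DATAPLANE_CONFIGURATION_REFRESH_INTERVAL=5s \
KUMA_STORE_CACHE_EXPIRATION_TIME=5s \
kuma-cp run
```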