Autoscale workloads with Datadog

Uses: Kong Gateway Operator
TL;DR

Deploy a DataPlaneMetricsExtension to collect metrics (like latency) from a target service, expose those metrics on the /metrics endpoint, and configure the operator to reference this data for scaling decisions.

Prerequisites

If you don’t have a Konnect account, you can get started quickly with our onboarding wizard.

  1. The following Konnect items are required to complete this tutorial:
    • Personal access token (PAT): Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.
  2. Set the personal access token as an environment variable:

    export KONNECT_TOKEN='YOUR KONNECT TOKEN'
    
  1. Install the Gateway API CRDs before installing Kong Gateway Operator.

    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
    
  2. Create a Gateway and GatewayClass instance to use.

echo "
apiVersion: v1
kind: Namespace
metadata:
  name: kong
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: kong
  annotations:
    konghq.com/gatewayclass-unmanaged: 'true'
spec:
  controllerName: konghq.com/gateway-operator
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kong
spec:
  gatewayClassName: kong
  listeners:
  - name: proxy
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
         from: All
" | kubectl apply -n kong -f -
  1. Add the Kong Helm charts:

    helm repo add kong https://charts.konghq.com
    helm repo update
    
  2. Create a kong namespace:

    kubectl create namespace kong --dry-run=client -o yaml | kubectl apply -f -
    
  3. Install Kong Gateway Operator using Helm:

    helm upgrade --install kgo kong/gateway-operator -n kong-system --create-namespace  \
      --set image.tag=1.5 \
      --set kubernetes-configuration-crds.enabled=true \
      --set env.ENABLE_CONTROLLER_KONNECT=true
    
  4. Apply a KongLicense. This assumes that your license is available in ./license.json:

    echo "
    apiVersion: configuration.konghq.com/v1alpha1
    kind: KongLicense
    metadata:
     name: kong-license
    rawLicenseString: '$(cat ./license.json)'
    " | kubectl apply -f -
    

This how-to requires some Kubernetes services to be available in your cluster. These services will be used by the resources created in this how-to.

kubectl apply -f https://developer.konghq.com/manifests/kic/command-service.yaml -n kong

This how-to also requires 1 pre-configured route:

Autoscaling Workloads

This tutorial shows how to autoscale workloads based on Service latency. The command service created in the prerequisites lets us inject an artificial delay into responses to trigger autoscaling.

Create a DataPlaneMetricsExtension

The DataPlaneMetricsExtension allows Kong Gateway Operator to monitor Service latency and expose it on the /metrics endpoint.

  1. Create a DataPlaneMetricsExtension that points to the command service:

     echo '
     kind: DataPlaneMetricsExtension
     apiVersion: gateway-operator.konghq.com/v1alpha1
     metadata:
       name: kong
       namespace: kong
     spec:
       serviceSelector:
         matchNames:
         - name: command
       config:
         latency: true
     ' | kubectl apply -f -
    
  2. Create a GatewayConfiguration that uses it:

     echo '
     kind: GatewayConfiguration
     apiVersion: gateway-operator.konghq.com/v1beta1
     metadata:
       name: kong
       namespace: kong
     spec:
       controlPlaneOptions:
         extensions:
         - kind: DataPlaneMetricsExtension
           group: gateway-operator.konghq.com
           name: kong
     ' | kubectl apply -f -
    
  3. Patch the GatewayClass to use the config:

     kubectl patch -n kong --type=json gatewayclass kong -p='[
         {
             "op":"add",
             "path":"/spec/parametersRef",
             "value":{
                     "group": "gateway-operator.konghq.com",
                     "kind": "GatewayConfiguration",
                     "name": "kong",
                     "namespace": "kong"
             }
         }
     ]'
    

Kong Gateway Operator can be integrated with Datadog so that Kong Gateway latency metrics drive the autoscaling of your workloads.

Install Datadog in your Kubernetes cluster

Datadog API and application keys

To install Datadog agents in your cluster you will need a Datadog API key and an application key. Please refer to this Datadog manual page to obtain those.

Installing

To install Datadog in your cluster, you can follow this guide or use the following values.yaml:

echo '
datadog:
  kubelet:
    tlsVerify: false

clusterAgent:
  enabled: true
  # Enable the metricsProvider to be able to scale based on metrics in Datadog
  metricsProvider:
    # Set this to true to enable Metrics Provider
    enabled: true
    # Enable usage of DatadogMetric CRD to autoscale on arbitrary Datadog queries
    useDatadogMetrics: true

  prometheusScrape:
    enabled: true
    serviceEndpoints: true

agents:
  containers:
    agent:
      env:
      - name: DD_HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
' > values.yaml

To install Datadog’s Helm chart:

helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install -n datadog --create-namespace datadog -f values.yaml --set datadog.apiKey="${DD_APIKEY}" --set datadog.appKey="${DD_APPKEY}" datadog/datadog
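
The install command reads the keys from the DD_APIKEY and DD_APPKEY environment variables, and an unset variable expands to an empty string without any error. As a minimal sketch, a pre-flight check like the one below can catch that before Helm runs. The require_env helper is hypothetical (it is not part of Helm or Datadog tooling):

```shell
# Hypothetical helper: returns non-zero and reports every named
# environment variable that is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup via eval keeps this portable to plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before the helm install step:
#   require_env DD_APIKEY DD_APPKEY && echo "Datadog keys are set"
```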

Send traffic

To trigger autoscaling, run the following command in a new terminal window. This will cause the underlying deployment to sleep for 100ms on each request and thus increase the average response time to that value.

while curl -k "http://$(kubectl get gateway kong -o custom-columns='name:.status.addresses[0].value' --no-headers -n kong)/echo/shell?cmd=sleep%200.1" ; do sleep 1; done

Keep this running while we move on to next steps.
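
The cmd query parameter in the request is the URL-encoded form of sleep 0.1 (%20 is an encoded space), so the delay is given in seconds. To experiment with other latencies, a small helper along these lines could build the parameter. The delay_query name is hypothetical and not part of the command service:

```shell
# Hypothetical helper: build the cmd query parameter for a given delay.
# $1 is the delay in seconds (0.1 means 100ms); %20 is a URL-encoded space.
delay_query() {
  printf 'cmd=sleep%%20%s\n' "$1"
}

delay_query 0.1    # prints cmd=sleep%200.1
delay_query 0.25   # prints cmd=sleep%200.25
```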

Annotate Kong Gateway Operator with Datadog checks config

Add the following annotation on Kong Gateway Operator’s Pod to tell Datadog how to scrape Kong Gateway Operator’s metrics:

POD_NAME=$(kubectl get pods -n kong-system -o custom-columns='name:.metadata.name' --no-headers)
kubectl annotate -n kong-system pod "$POD_NAME" \
  'ad.datadoghq.com/kube-rbac-proxy.checks={
    "openmetrics": {
      "instances": [
        {
          "prometheus_url": "https://%%host%%:8080/metrics",
          "namespace": "autoscaling",
          "metrics": [
            "kong_upstream_latency_ms_bucket",
            "kong_upstream_latency_ms_sum",
            "kong_upstream_latency_ms_count"
          ],
          "send_histograms_buckets": true,
          "send_distribution_buckets": true
        }
      ]
    }
  }'

After applying the above you should see avg:autoscaling.kong_upstream_latency_ms{service:echo} metrics in your Datadog Metrics explorer.

Expose Datadog metrics to Kubernetes

To use an external metric in HorizontalPodAutoscaler, we need to configure the Datadog agent to expose it.

There are several ways to achieve this, but we’ll use the Kubernetes-native approach: the DatadogMetric CRD.

echo '
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMetric
metadata:
  name: echo-kong-upstream-latency-ms-avg
  namespace: default
spec:
  query: autoscaling.kong_upstream_latency_ms{service:echo}
' | kubectl apply -f -

You can check the status of DatadogMetric with:

kubectl get -n default datadogmetric echo-kong-upstream-latency-ms-avg -w

Which should look like this:

NAME                                ACTIVE   VALID   VALUE               REFERENCES         UPDATE TIME
echo-kong-upstream-latency-ms-avg   True     True    104.46194839477539                     38s

You should be able to get the metric via Kubernetes External Metrics API within 30 seconds:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/datadogmetric@default:echo-kong-upstream-latency-ms-avg" | jq
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "datadogmetric@default:echo-kong-upstream-latency-ms-avg",
      "metricLabels": null,
      "timestamp": "2024-03-08T18:03:02Z",
      "value": "104233138021n"
    }
  ]
}

Note: The n suffix in 104233138021n is the Kubernetes quantity notation for nano (10^-9), which lets fractional values be written as integers. Since the value here represents latency in milliseconds, 104233138021n is approximately 104.23ms.
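
To read such values at a glance, the suffix can be converted back to a plain decimal. The following is a minimal sketch that handles only the n (nano) and m (milli) suffixes. The quantity_to_decimal function name is an assumption for illustration, and other Kubernetes quantity suffixes are deliberately not covered:

```shell
# Convert a Kubernetes quantity with an "n" (nano) or "m" (milli) suffix
# to a plain decimal number with three decimal places.
# Sketch only: other quantity suffixes (u, k, Mi, ...) are not handled.
quantity_to_decimal() {
  awk -v q="$1" 'BEGIN {
    if (q ~ /n$/)      { sub(/n$/, "", q); printf "%.3f\n", q / 1e9 }
    else if (q ~ /m$/) { sub(/m$/, "", q); printf "%.3f\n", q / 1e3 }
    else               { printf "%.3f\n", q }
  }'
}

quantity_to_decimal 104233138021n   # prints 104.233
```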

Use DatadogMetric in HorizontalPodAutoscaler

Once the metric is available through the Kubernetes External Metrics API, a HorizontalPodAutoscaler (HPA) can use it. Here, the echo-kong-upstream-latency-ms-avg DatadogMetric from the default namespace drives autoscaling of the echo Deployment.

The following manifest will scale the underlying echo Deployment between 1 and 10 replicas, trying to keep the average latency across the last 30s at 40ms.

echo '
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 1
      policies:
      - type: Percent
        value: 100
        periodSeconds: 10
    scaleUp:
      stabilizationWindowSeconds: 1
      policies:
      - type: Percent
        value: 100
        periodSeconds: 2
      - type: Pods
        value: 4
        periodSeconds: 2
      selectPolicy: Max

  metrics:
  - type: External
    external:
      metric:
        name: datadogmetric@default:echo-kong-upstream-latency-ms-avg
      target:
        type: Value
        value: 40
' | kubectl apply -f -
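
To get a feel for how the HPA turns this target into a replica count, the core formula from the Kubernetes documentation is desiredReplicas = ceil(currentReplicas * currentMetricValue / targetValue), clamped to minReplicas and maxReplicas. The sketch below reproduces only that arithmetic. The hpa_desired_replicas function is hypothetical, and it ignores the HPA tolerance band and the behavior policies, which can further limit how quickly the replica count changes:

```shell
# Sketch of the core HPA calculation for a Value target:
#   desired = ceil(current * metric / target), clamped to [min, max].
# Arguments: current replicas, metric value, target value, min, max.
hpa_desired_replicas() {
  awk -v cur="$1" -v metric="$2" -v target="$3" -v min="$4" -v max="$5" 'BEGIN {
    raw = cur * metric / target
    desired = int(raw)
    if (desired < raw) desired++      # ceiling for positive values
    if (desired < min) desired = min
    if (desired > max) desired = max
    print desired
  }'
}

# One replica, ~104ms observed latency, 40ms target, bounds 1..10.
hpa_desired_replicas 1 104.46 40 1 10   # prints 3
```

With the latency at roughly 104ms and a 40ms target, the ratio is about 2.6, so the HPA keeps adding replicas until the metric falls or maxReplicas is reached.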

When everything is configured correctly, the DatadogMetric’s status updates to include a reference to the HorizontalPodAutoscaler.

Get the DatadogMetric using kubectl:

kubectl get -n default datadogmetric echo-kong-upstream-latency-ms-avg -w

You will see the HPA reference in the output:

NAME                                ACTIVE   VALID   VALUE               REFERENCES         UPDATE TIME
echo-kong-upstream-latency-ms-avg   True     True    104.46194839477539  hpa:default/echo  38s

If everything went well, we should see SuccessfulRescale events for the echo HorizontalPodAutoscaler:

12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 2; reason: Service metric kong_upstream_latency_ms_30s_average above target
12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 4; reason: Service metric kong_upstream_latency_ms_30s_average above target
12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 8; reason: Service metric kong_upstream_latency_ms_30s_average above target
12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 10; reason: Service metric kong_upstream_latency_ms_30s_average above target

# Then when latency drops
4s          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 1; reason: All metrics below target