The OpenTelemetry plugin provides metrics, traces, and logs in the OpenTelemetry format and can be used with any OpenTelemetry-compatible backend.
The OpenTelemetry plugin allows you to collect data for the following signals:

- Metrics
- Traces
- Logs
Common use cases for the OpenTelemetry plugin:

| Use case | Description |
|---|---|
| Enable the OTEL plugin for metrics | Configure the OpenTelemetry plugin to send metrics. |
| Enable the OTEL plugin for API transactional logs | Configure the OpenTelemetry plugin to send API transactional logs. |
| Enable the OTEL plugin for runtime logs | Configure the OpenTelemetry plugin to send logs about the data plane's internal execution. |
| Enable the OTEL plugin for traces | Configure the OpenTelemetry plugin to send traces. |
| Enable the OTEL plugin for all signals | Configure the OpenTelemetry plugin to send metrics, traces, data plane/error logs, and API transactional logs. |
| Extract, clear, and inject tracing data | Configure the OpenTelemetry plugin to extract tracing context, clear specific headers, and inject tracing context using a specific format. |
| Ignore incoming headers | Configure the OpenTelemetry plugin to inject tracing context in multiple formats. |
| Multiple injection | Configure the OpenTelemetry plugin to extract tracing context in one format and inject tracing context in multiple formats. |
| Preserve incoming format | Configure the OpenTelemetry plugin to extract and preserve the tracing context in the same header type. |
To set up an OpenTelemetry backend, you need support for OTLP over HTTP with Protobuf encoding. You can:

- Send data directly to an OpenTelemetry-compatible backend that natively supports OTLP over HTTP with Protobuf encoding, like Jaeger (v1.35.0+). This is the simplest setup, since it doesn't require any additional components between the data plane and the backend. A minimal configuration sketch follows this list.
- Use the OpenTelemetry Collector, which acts as an intermediary between the data plane and one or more backends. The OpenTelemetry Collector can receive all signals supported by the OpenTelemetry plugin, including traces, metrics, and logs, and then process, transform, or route that data before exporting it to a compatible backend. This option is useful when you need capabilities such as signal fan-out, filtering, enrichment, batching, or exporting to multiple backends. The OpenTelemetry Collector supports a wide range of exporters, available at open-telemetry/opentelemetry-collector-contrib.
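For example, sending traces directly to an OTLP/HTTP-compatible backend only requires pointing the plugin at that backend. The following is a minimal declarative configuration sketch; the traces_endpoint field name and the Jaeger URL are illustrative assumptions, so check the plugin configuration reference for the exact parameters in your Gateway version:

```yaml
# Snippet of a declarative configuration (kong.yaml) file.
plugins:
  - name: opentelemetry
    config:
      # Assumed field name and URL: an OTLP/HTTP endpoint served by Jaeger v1.35.0+.
      traces_endpoint: http://jaeger:4318/v1/traces
```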
The OpenTelemetry plugin attaches additional resource attributes to all telemetry data it sends to an OTLP endpoint. Resource attributes describe the entity that produced the telemetry and are shared across all signals.
The OpenTelemetry plugin automatically sets the following resource attributes:
| Attribute | Attribute description |
|---|---|
| service.name | Name of the service exposing the signal. This is optional; the default value is kong. |
| service.version | Gateway version of the node exposing the signal. |
| service.instance.id | ID of the node exposing the signal. |
You can add or override resource attributes by configuring the config.resource_attributes parameter. Custom resource attributes are merged with the default attributes and are included with all exported telemetry data. Some metric backends, such as Prometheus, apply resource attributes to every metric. Be mindful of the impact on cardinality.
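For example, a declarative sketch that overrides service.name and adds a custom attribute might look like the following; the attribute values and the traces_endpoint field are illustrative assumptions, not defaults:

```yaml
plugins:
  - name: opentelemetry
    config:
      traces_endpoint: http://otel-collector:4318/v1/traces   # illustrative endpoint
      resource_attributes:
        service.name: kong-gateway-edge        # overrides the default value "kong"
        deployment.environment: production     # custom attribute, merged with the defaults
```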
In Kong Gateway, metrics are natively supported by the OpenTelemetry plugin. You can send metrics using the parameters under config.metrics.
The following metrics are exposed:
Total number of incoming HTTP requests.

- Unit: {request}
- Type: Sum
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.service.name | Name of the Gateway Service. |
| kong.route.name | Name of the Route. |
| kong.auth.consumer.name | Name of the authenticated Consumer. |
| kong.response.source | Origin of the current response. |
| kong.workspace.name | Name of the Workspace. |
| http.request.method | Method used in the HTTP request. |
| kong.response.status_code | HTTP status code of the response. |

Complete end-to-end duration of a request in seconds.

- Unit: s
- Type: Histogram
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.service.name | Name of the Gateway Service. |
| kong.route.name | Name of the Route. |
| kong.workspace.name | Name of the Workspace. |

Kong's internal processing time in seconds, from when the Gateway receives the request from the client to when it sends the request to the upstream service.

- Unit: s
- Type: Histogram
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.service.name | Name of the Gateway Service. |
| kong.route.name | Name of the Route. |
| kong.workspace.name | Name of the Workspace. |

Upstream processing time in seconds, from when the Gateway sends the request to the upstream, to when the data is returned to Kong.

- Unit: s
- Type: Histogram
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.service.name | Name of the Gateway Service. |
| kong.route.name | Name of the Route. |
| kong.workspace.name | Name of the Workspace. |

Size of each incoming HTTP request in bytes.

- Unit: By
- Type: Histogram
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.service.name | Name of the Gateway Service. |
| kong.route.name | Name of the Route. |
| kong.auth.consumer.name | Name of the authenticated Consumer. |
| kong.workspace.name | Name of the Workspace. |

Total size of the HTTP response sent back to the client in bytes.

- Unit: By
- Type: Histogram
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.service.name | Name of the Gateway Service. |
| kong.route.name | Name of the Route. |
| kong.auth.consumer.name | Name of the authenticated Consumer. |
| kong.workspace.name | Name of the Workspace. |

Current memory usage of a shared dict in bytes.

- Unit: By
- Type: Gauge
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.shared_dict.name | Name of the shared dict. |
| kong.subsystem | Nginx subsystem that produced the metric. |

Total memory size of a shared dict in bytes.

- Unit: By
- Type: Gauge
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.shared_dict.name | Name of the shared dict. |
| kong.subsystem | Nginx subsystem that produced the metric. |

Memory used by the worker's Lua VM in bytes.

- Unit: By
- Type: Gauge
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.pid | Worker process ID. |
| kong.subsystem | Nginx subsystem that produced the metric. |

Number of client connections in Nginx.

- Unit: {connection}
- Type: Gauge
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.subsystem | Nginx subsystem that produced the metric. |
| kong.connection.state | State of the client connection. |

Number of internal scheduled timers Nginx is running in the background.

- Unit: {timer}
- Type: Gauge
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.timer.state | State of the timer. |

Shows whether Kong has an active database connection. A value of 1 means connected; a value of 0 means not connected.

- Unit: 1
- Type: Gauge

Shows whether the data plane has an active connection to the control plane. A value of 1 means connected; a value of 0 means not connected.

- Unit: 1
- Type: Gauge

Upstream target's health. The actual status is in the kong.upstream.state attribute, with the metric value set to 1 when a state is populated.

- Unit: 1
- Type: Gauge
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.upstream.name | Name of the Upstream. |
| kong.target.address | Address of the Target. |
| server.address | Address of the server. |
| kong.upstream.state | Health of the Upstream Target. |
| kong.subsystem | Nginx subsystem that produced the metric. |

Timestamp when the data plane's cluster certificate will expire.

- Unit: s
- Type: Gauge

Number of entities stored in Kong's database.

- Unit: {entity}
- Type: Gauge

Number of errors seen during database entity count collection.

- Unit: {error}
- Type: Sum

Last 8 bytes of the Enterprise license signature, represented as a number.

- Type: Gauge

Unix epoch time when the license expires, shifted by 24 hours to avoid timezone differences.

- Unit: s
- Type: Gauge

Indicates whether the data plane can read or write entities under the current license.
Each capability (ee_entity_read and ee_entity_write) is reported as its own metric, where 1 means allowed and 0 means not allowed.

- Unit: 1
- Type: Gauge
- Attributes:

| Attribute | Attribute description |
|---|---|
| kong.ee.license.feature | Enterprise feature. Possible values: ee_entity_read, ee_entity_write. |

Number of errors that occurred while collecting license information.

- Unit: {error}
- Type: Sum
If you’re using Kong Gateway 3.12 or earlier, metrics are enabled using the contrib version of the OpenTelemetry Collector.
The spanmetrics connector allows you to aggregate traces and provide metrics to any third-party observability platform.
To include span metrics for application traces, configure the connectors and service pipelines sections of the OpenTelemetry Collector configuration file:
connectors:
spanmetrics:
dimensions:
- name: http.method
default: GET
- name: http.status_code
- name: http.route
exclude_dimensions:
- status.code
metrics_flush_interval: 15s
histogram:
disable: false
service:
pipelines:
traces:
receivers: [otlp]
processors: []
exporters: [spanmetrics]
metrics:
receivers: [spanmetrics]
processors: []
exporters: [otlphttp]
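The pipelines above reference an otlp receiver and an otlphttp exporter, which must be defined elsewhere in the same Collector configuration file. A minimal sketch of those sections could look like the following; the backend address is illustrative:

```yaml
receivers:
  otlp:
    protocols:
      http: {}   # receives OTLP/HTTP data from the OpenTelemetry plugin
exporters:
  otlphttp:
    endpoint: http://observability-backend:4318   # illustrative OTLP/HTTP backend
```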
Kong Gateway has a series of built-in tracing instrumentations, which are configured through the tracing_instrumentations configuration property.
Kong Gateway creates a top-level span for each request by default when tracing_instrumentations is enabled.
The top-level span has the following attributes:
- http.method: HTTP method
- http.url: HTTP URL
- http.host: HTTP host
- http.scheme: HTTP scheme (http or https)
- http.flavor: HTTP version
- net.peer.ip: Client IP address

For more information, see the Tracing reference.
Note: When the OpenTelemetry plugin is used together with the Proxy Cache Advanced plugin, cache-HIT responses are not traced. This is expected behavior. When a request results in a cache-HIT, the response is served before the request lifecycle reaches the phase where the OpenTelemetry plugin executes. As a result, no spans are generated for cache-HIT requests. Cache-MISS requests continue through the full request lifecycle and are traced normally.
When processing generative AI traffic through Kong AI Gateway, additional span attributes are emitted following the OpenTelemetry Gen AI semantic conventions. These attributes capture model parameters, token usage, and tool-call metadata.
For the complete attribute reference, see Gen AI OpenTelemetry attributes.
The OpenTelemetry plugin supports propagation of the following header formats:
w3c: W3C trace context
b3 and b3-single: Zipkin headers
jaeger: Jaeger headers
ot: OpenTracing headers
datadog: Datadog headers
aws: v3.4+ AWS X-Ray header
gcp: v3.5+ GCP X-Cloud-Trace-Context header
This plugin offers extensive options for configuring tracing header propagation, providing a high degree of flexibility. You can customize which headers are used to extract and inject tracing context. Additionally, you can configure headers to be cleared after the tracing context extraction process, enabling a high level of customization.
flowchart LR
  id1(Original Request) --> Extract
  id1(Original Request) -->|"headers (original)"| Extract
  id1(Original Request) --> Extract
  subgraph ide1 [Headers Propagation]
    Extract --> Clear
    Extract -->|"headers (original)"| Clear
    Extract --> Clear
    Clear -->|"headers (filtered)"| Inject
  end
  Extract -.->|extracted ctx| id2((tracing logic))
  id2((tracing logic)) -.->|updated ctx| Inject
  Inject -->|"headers (updated ctx)"| id3(Updated request)
See the plugin’s configuration reference for a complete overview of the available options and values.
Note: If any of the config.propagation.* configuration options (extract, clear, or inject) are configured, the config.propagation configuration takes precedence over the deprecated header_type parameter. If none of the config.propagation.* configuration options are set, the header_type parameter is still used to determine the propagation behavior.
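For example, the following sketch extracts tracing context from W3C or B3 headers, clears one incoming header after extraction, and injects context both in the format found during extraction (preserve) and as W3C headers. The header name under clear is only an example:

```yaml
plugins:
  - name: opentelemetry
    config:
      propagation:
        extract:
          - w3c
          - b3
        clear:
          - x-b3-sampled   # example header to remove after the tracing context is extracted
        inject:
          - preserve       # re-inject using the incoming format
          - w3c
```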
In Kong Gateway 3.6 or earlier, the plugin detects the propagation format from the incoming headers and uses the appropriate format to propagate the span context.
If no appropriate format is found, the plugin falls back to the default format, which is w3c.
The OpenTelemetry plugin implements the OTLP/HTTP exporter, which sends Protobuf payloads in binary format over HTTP/1.1.
config.connect_timeout, config.read_timeout, and config.send_timeout set the timeouts for the export HTTP request.
config.batch_span_count and config.batch_flush_delay set the maximum number of spans per batch and the delay between two consecutive batches.
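A hedged example of tuning these exporter parameters is shown below; the values are arbitrary and the units (milliseconds for the timeouts, seconds for the flush delay) are assumptions to verify against the configuration reference:

```yaml
plugins:
  - name: opentelemetry
    config:
      connect_timeout: 1000   # time allowed to establish the connection (assumed milliseconds)
      send_timeout: 5000      # time allowed to send the request (assumed milliseconds)
      read_timeout: 5000      # time allowed to read the response (assumed milliseconds)
      batch_span_count: 200   # maximum number of spans per batch
      batch_flush_delay: 3    # delay between two consecutive batches (assumed seconds)
```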
The OpenTelemetry plugin is built on top of the Kong Gateway tracing PDK. You can customize the spans and add your own spans through the universal tracing PDK.
Create a file named custom-span.lua with the following content:
-- Modify the root span
local root_span = kong.tracing.get_root_span()
root_span:set_attribute("custom.attribute", "custom value")
-- Modify the active span
local active_span = kong.tracing.active_span()
active_span:set_attribute("custom.attribute", "custom value")
-- Create a custom span
local span = kong.tracing.start_span("custom-span")
-- Append attributes
span:set_attribute("custom.attribute", "custom value")
-- Close the span
span:finish()
Apply the Lua code with the Post-function plugin using a cURL file upload:
curl -i -X POST http://localhost:8001/plugins \
-F "name=post-function" \
-F "config.access[1]=@custom-span.lua"
This plugin supports OpenTelemetry Logging, which can be configured as described in the configuration reference to export logs in OpenTelemetry format to an OTLP-compatible backend.
Two different kinds of logs are exported:

- Runtime logs: logs about the data plane's internal execution.
- Request-scoped logs: API transactional logs emitted while proxying requests.
Logs are recorded based on the log level that is configured for Kong Gateway. If a log is emitted with a level that is lower than the configured log level, it is not recorded or exported.
Note: Not all logs are guaranteed to be recorded. Logs that aren't recorded include those produced by the Nginx master process and low-level errors produced by Nginx. Operators are expected to still capture the Nginx error.log file (which always includes all such logs) in addition to using this feature, to avoid losing any details that might be useful for deeper troubleshooting.
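A minimal sketch for exporting logs could look like the following; the logs_endpoint field name and the collector URL are assumptions, so consult the configuration reference for the exact parameter names in your Gateway version:

```yaml
plugins:
  - name: opentelemetry
    config:
      # Assumed field name and URL for an OTLP/HTTP logs endpoint.
      logs_endpoint: http://otel-collector:4318/v1/logs
```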
Each log entry adheres to the OpenTelemetry Logs Data Model. The available information depends on the log scope and on whether tracing is enabled for this plugin.
Every log entry includes the following fields:
- Timestamp: Time when the event occurred.
- ObservedTimestamp: Time when the event was observed.
- SeverityText: The severity text (log level).
- SeverityNumber: Numerical value of the severity.
- Body: The error log line.
- Resource: Configurable resource attributes.
- InstrumentationScope: Metadata that describes Kong Gateway's data emitter.
- Attributes: Additional information about the event:
  - introspection.source: Full path of the file that emitted the log.
  - introspection.current.line: Line number that emitted the log.

In addition to the above, request-scoped logs include:

- Attributes: Additional information about the event:
  - request.id: Kong Gateway's request ID.

In addition to the above, when tracing is enabled, request-scoped logs include:

- TraceID: Request trace ID.
- SpanID: Request span ID.
- TraceFlags: W3C trace flag.

The custom plugin PDK kong.telemetry.log module lets you configure OTLP logging for a custom plugin.
The module records a structured log entry, which is reported via the OpenTelemetry plugin.
The OpenTelemetry plugin uses internal queues to decouple the production of log entries from their transmission to the upstream log server.
With queuing, request information is put in a configurable queue before being sent in batches to the upstream server. This decouples request processing from log delivery, allows entries to be sent in efficient batches, and lets delivery be retried when the backend is temporarily unavailable.
Note: Because queues are structural elements for components in Kong Gateway, they only live in the main memory of each worker process and are not shared between workers. Therefore, queued content isn’t preserved under abnormal operational situations, like power loss or unexpected worker process shutdown due to memory shortage or program errors.
You can configure several parameters for queuing:
| Parameters | Description |
|---|---|
| Queue capacity limits: config.queue.max_entries, config.queue.max_bytes, config.queue.max_batch_size | Configure sizes for various aspects of the queue: maximum number of entries, batch size, and queue size in bytes. When a queue reaches the maximum number of entries queued and another entry is enqueued, the oldest entry in the queue is deleted to make space for the new entry. The queue code provides warning log entries when it reaches a capacity threshold of 80% and when it starts to delete entries from the queue. It also writes log entries when the situation normalizes. |
| Timer usage: config.queue.concurrency_limit | Only one timer is used to start queue processing in the background. You can add more if needed. Once the queue is empty, the timer handler terminates and a new timer is created as soon as a new entry is pushed onto the queue. |
| Retry logic: config.queue.initial_retry_delay, config.queue.max_coalescing_delay, config.queue.max_retry_delay, config.queue.max_retry_time | If a queue fails to process, the queue library can automatically retry processing it if the failure is temporary (for example, if there are network problems or upstream unavailability). Before retrying, the library waits for the amount of time specified by the initial_retry_delay parameter. This wait time is doubled every time the retry fails, until it reaches the maximum wait time specified by the max_retry_time parameter. |
When a Kong Gateway shutdown is initiated, the queue is flushed. This allows Kong Gateway to shut down even if it was waiting for new entries to be batched, ensuring upstream servers can be contacted.
Queues are not shared between workers and queuing parameters are scoped to one worker. For whole-system capacity planning, the number of workers needs to be considered when setting queue parameters.
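For example, a sketch of queue tuning that uses the parameters described above; the values are illustrative only, not recommendations:

```yaml
plugins:
  - name: opentelemetry
    config:
      queue:
        max_entries: 10000          # maximum number of entries held in the queue
        max_batch_size: 200         # maximum number of entries sent in one batch
        concurrency_limit: 1        # number of timers processing the queue in the background
        max_coalescing_delay: 1     # seconds to wait for more entries before sending a batch
        initial_retry_delay: 0.01   # wait before the first retry; doubled on each failure
        max_retry_delay: 60         # cap on the delay between consecutive retries
        max_retry_time: 60          # maximum time spent retrying before giving up
```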
When the OpenTelemetry plugin is configured along with a plugin that uses the
Log Serializer,
the trace ID of each request is added to the key trace_id in the serialized log output.
The value of this field is an object that can contain different formats
of the current request’s trace ID. In case there are multiple tracing headers in the
same request, the trace_id field includes one trace ID format
for each different header format, as in the following example:
"trace_id": {
"w3c": "4bf92f3577b34da6a3ce929d0e0e4736",
"datadog": "11803532876627986230"
},
The OpenTelemetry spans are printed to the console when the log level is set to debug in the Kong Gateway configuration file.
The following is an example of the debug logs output:
2022/06/02 15:28:42 [debug] 650#0: *111 [lua] instrumentation.lua:302: runloop_log_after(): [tracing] collected 6 spans:
Span #1 name=GET /wrk duration=1502.994944ms attributes={"http.url":"/wrk","http.method":"GET","http.flavor":1.1,"http.host":"127.0.0.1","http.scheme":"http","net.peer.ip":"172.18.0.1"}
Span #2 name=rewrite phase: opentelemetry duration=0.391936ms
Span #3 name=router duration=0.013824ms
Span #4 name=access phase: cors duration=1500.824576ms
Span #5 name=cors: heavy works duration=1500.709632ms attributes={"username":"kongers"}
Span #6 name=balancer try #1 duration=0.99328ms attributes={"net.peer.ip":"104.21.11.162","net.peer.port":80}
You can configure the sampling rate via the tracing_sampling_rate property in the Kong Gateway configuration file when using the OpenTelemetry plugin for tracing. Plugins that use the Log Serializer can have additional fields added to their serialized output with the custom_fields_by_lua configuration option.