With API usage reporting, you can:
- Identify which services are slow or have high error rates
- Monitor request volume and throughput over time
- Analyze payload sizes for clients and upstream services
The following table shows which API usage metrics you can view:
| Metric | Category | Description |
|--------|----------|-------------|
| Request Count | Count | Total number of API calls within the selected time frame. This includes requests that were rejected due to rate limiting, failed authentication, and so on. |
| Requests per Minute | Rate | Number of API calls per minute within the selected time frame. |
| Response Latency | Latency | The time, in milliseconds, it takes to process an API request from start to finish. Users can choose the average (avg) or specific percentiles (p99, p95, and p50). For example, a 99th percentile response latency of 10 milliseconds means that 99 out of 100 requests were completed in under 10 ms from the time the request was received to when the response was sent. |
| Upstream Latency | Latency | The amount of time, in milliseconds, that Kong Gateway was waiting for the first byte of the upstream service response. Users can select between different percentiles (p99, p95, and p50). For example, a 99th percentile latency of 10 milliseconds means that 99 out of 100 requests took less than 10 ms from the moment the request was sent to the upstream service to when the first byte of the response was received. |
| Kong Latency | Latency | The time, in milliseconds, spent within Kong Gateway processing a request, excluding upstream response time. Users can choose from different percentiles (p99, p95, and p50). For example, a 99th percentile Kong latency of 10 milliseconds means that 99 out of 100 requests took less than 10 ms to be processed in Kong Gateway before reaching the upstream service. |
| Request Size | Size | The size of the request payload received from the client, in bytes. Users can select between the total sum or different percentiles (p99, p95, and p50). For example, a 99th percentile request size of 100 bytes means that 99 out of 100 requests had a payload of 100 bytes or smaller. |
| Response Size | Size | The size of the response payload returned to the client, in bytes. Users can select between the total sum or different percentiles (p99, p95, and p50). For example, a 99th percentile response size of 100 bytes means that 99 out of 100 responses returned to the original caller had a payload of 100 bytes or smaller. |
| Error Rate | Percentage | The percentage of failed API requests. This includes requests that return HTTP 4xx and 5xx status codes. |
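To make the percentile and error-rate definitions concrete, the following sketch shows one way these values could be derived from raw request records. It is illustrative only, not Kong Gateway's implementation: the sample data, record layout, and the nearest-rank percentile method are all assumptions.

```python
# Illustrative sketch only: how percentile latencies and an error rate
# could be derived from raw request records. The sample data and field
# layout are assumptions, not Kong Gateway's internal schema.
import math

# Each record: (response latency in ms, HTTP status code)
requests = [
    (8, 200), (12, 200), (9, 404), (7, 200), (150, 503),
    (11, 200), (10, 200), (6, 200), (13, 429), (9, 200),
]

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    `pct` percent of observations are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies = [latency for latency, _ in requests]
print("p50 latency:", percentile(latencies, 50), "ms")  # 9 ms
print("p99 latency:", percentile(latencies, 99), "ms")  # 150 ms

# Error Rate: share of requests returning HTTP 4xx or 5xx status codes.
errors = sum(1 for _, status in requests if status >= 400)
print(f"Error rate: {errors / len(requests):.0%}")  # 30%
```

Note how a single slow outlier (150 ms) dominates the p99 value while leaving the p50 untouched, which is why the percentile selectors matter when diagnosing slow services.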
Observability allows you to monitor and optimize your LLM usage by providing detailed insights into metrics such as token consumption, costs, and latency.
With LLM usage reporting, you can:
- Track token consumption: Monitor the number of tokens processed by the different LLM models you have configured.
- Understand costs: Gain visibility into the costs associated with your LLM providers.
- Measure latency: Analyze the latency involved in processing LLM requests.
The following table shows which LLM usage metrics you can view:
| Attribute | Unit | Description |
|-----------|------|-------------|
| Completion Tokens | Count | Completion tokens are any tokens that the model generates in response to an input. |
| Prompt Tokens | Count | Prompt tokens are the number of tokens in the prompt that are input into the model. |
| Total Tokens | Count | Sum of all tokens used in a single request to the model. It includes both the tokens in the input (prompt) and the tokens generated by the model (completion). |
| Time per Tokens | Number | Average time in milliseconds to generate a token. Calculated as LLM latency divided by the number of tokens. |
| Costs | Cost | Represents the resulting costs for a request. Final costs = (total number of prompt tokens × input cost per token) + (total number of completion tokens × output cost per token) + (total number of prompt tokens × embedding cost per token). |
| Response Model | String | Represents which AI model the AI provider actually used to process the prompt. |
| Request Model | String | Represents which AI model was requested to process the prompt. |
| Provider Name | String | Represents which AI provider was used to process the prompt. |
| Plugin ID | String | Represents the UUID of the plugin. |
| LLM Latency | Latency | Total time taken to receive a full response after a request is sent from Kong (LLM latency plus connection time). |
| Embeddings Latency | Latency | Time taken to generate the vector for the prompt string. |
| Fetch Latency | Latency | Total time taken to return a cached response. |
| Cache Status | String | Shows whether the response was served from the cache or came directly from the upstream. Possible values: Hit or Miss. |
| Embeddings Model | String | AI providers may have multiple embedding models. This represents the model used for the embeddings. |
| Embeddings Provider | String | Provider used for generating embeddings. |
| Embeddings Token | Count | Tokens input into the model for embeddings. |
| Embeddings Cost | Cost | Cost of generating the embeddings. |
| Cost Savings | Cost | Cost savings from serving responses from the cache. |
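As a worked example of the cost formula above, the following sketch applies it to hypothetical token counts and per-token prices (all placeholder values, not real provider rates). It also computes Time per Tokens, assuming the divisor is the completion (generated) token count.

```python
# Worked example of the cost formula from the table above. All per-token
# prices and token counts below are made-up placeholders; substitute your
# provider's actual pricing.
prompt_tokens = 1_200
completion_tokens = 300

input_cost_per_token = 0.000001       # hypothetical $ per prompt token
output_cost_per_token = 0.000002      # hypothetical $ per completion token
embedding_cost_per_token = 0.0000001  # hypothetical $ per embedded token

# Final costs = (prompt tokens x input cost per token)
#             + (completion tokens x output cost per token)
#             + (prompt tokens x embedding cost per token)
final_cost = (
    prompt_tokens * input_cost_per_token
    + completion_tokens * output_cost_per_token
    + prompt_tokens * embedding_cost_per_token
)
print(f"Final cost: ${final_cost:.6f}")  # $0.001920

# Time per Tokens: LLM latency divided by the number of tokens.
# Assumption: the divisor is the completion (generated) token count.
llm_latency_ms = 2_400
print(f"Time per token: {llm_latency_ms / completion_tokens:.1f} ms")  # 8.0 ms
```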
With platform usage reporting, you can:
- Track the number of control planes and data plane nodes in your organization
- Monitor Gateway Services, Routes, and plugins per control plane
- View Consumer counts across your realms and control planes
The following table shows which platform usage metrics you can view:
| Metric | Category | Description |
|--------|----------|-------------|
| Control plane count | Count | Number of control planes in your organization. |
| Node count | Count | Number of data plane nodes in your organization. You can also filter this metric by data plane node version. |
| Service count | Count | Number of Gateway Services in your control plane. |
| Route count | Count | Number of Routes in your control plane or associated with a specific Gateway Service. |
| Plugin count | Count | Number of plugins in your control plane. These can also be filtered by plugin scope and name. |
| Consumer count | Count | Number of Consumers in your realm or control plane. |
Agentic usage tracks analytics data for agent-to-agent (A2A) traffic, such as agent tool use and agent MCP calls, that flows through the AI A2A Proxy plugin.
You must configure the AI A2A Proxy plugin before analytics appear in Konnect Explorer.
With agentic usage reporting, you can:
- See how many times a tool was called
- View the most called tools
- See which tools are returning errors
- View the latency for tools
The following table shows the agentic usage-specific metrics you can view:
| Metric | Category | Description |
|--------|----------|-------------|
| A2A Latency | Latency | The amount of time, in milliseconds, that Kong Gateway was waiting for the first byte of the agent’s response. Users can select the average (avg). |
| MCP Response Size | Size | The size of the response payload returned to Kong Gateway from the MCP server, in bytes. Users can select the total sum. |
| A2A Response Size | Size | The size of the response payload returned to Kong Gateway from an agent, in bytes. Users can select the total sum. |