LLM usage reporting

Uses: Kong Gateway, Advanced Analytics, AI Gateway

Metrics

Traffic metrics provide insight into which of your services are being used and how they are responding. Within a single report, you can choose one or more metrics from the same category.

Attribute	Unit	Description
Completion Tokens	Count	Completion tokens are any tokens that the model generates in response to an input.
Prompt Tokens	Count	Prompt tokens are the number of tokens in the prompt that are input into the model.
Total Tokens	Count	Sum of all tokens used in a single request to the model. It includes both the tokens in the input (prompt) and the tokens generated by the model (completion).
Time per Tokens	Number	Average time in milliseconds to generate a token. Calculated as LLM latency divided by the number of tokens.
Costs	Cost	Represents the resulting costs for a request. Final costs = (total number of prompt tokens × input cost per token) + (total number of completion tokens × output cost per token) + (total number of prompt tokens × embedding cost per token).
Response Model	String	Represents which AI model was used to process the prompt by the AI provider.
Request Model	String	Represents which AI model was requested to process the prompt.
Provider Name	String	Represents which AI provider was used to process the prompt.
Plugin ID	String	Represents the UUID of the plugin.
LLM Latency	Latency	Total time taken to receive a full response after a request is sent from Kong (LLM latency + connection time).
Embeddings Latency	Latency	Time taken to generate the vector for the prompt string.
Fetch Latency	Latency	Total time taken to return a cached response.
Cache Status	String	Shows whether the response comes directly from the upstream or from the cache. Possible values: `hit` or `miss`.
Embeddings Model	String	AI providers may have multiple embedding models. This represents the model used for the embeddings.
Embeddings Provider	String	Provider used for generating embeddings.
Embeddings Token	Count	Tokens input into the model for embeddings.
Embeddings Cost	Cost	Cost of caching.
Cost Savings	Cost	Cost savings from cache.
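
The derived metrics in the table (Total Tokens, Time per Tokens, and Costs) can be reproduced by hand to sanity-check a report. The following is a minimal sketch of the formulas as described above, not the implementation Kong uses. The `TokenPrices` class, the function names, and the prices in the usage note are hypothetical. The table does not say which token count Time per Tokens divides by, so the sketch takes it as a parameter.

```python
from dataclasses import dataclass


@dataclass
class TokenPrices:
    """Hypothetical per-token prices; real values depend on your provider."""
    input_cost_per_token: float
    output_cost_per_token: float
    embedding_cost_per_token: float = 0.0


def total_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    # Total Tokens = prompt (input) tokens + completion (generated) tokens.
    return prompt_tokens + completion_tokens


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prices: TokenPrices) -> float:
    # Costs formula from the table:
    # (prompt tokens x input cost) + (completion tokens x output cost)
    # + (prompt tokens x embedding cost).
    return (
        prompt_tokens * prices.input_cost_per_token
        + completion_tokens * prices.output_cost_per_token
        + prompt_tokens * prices.embedding_cost_per_token
    )


def time_per_token(llm_latency_ms: float, tokens: int) -> float:
    # Time per Tokens = LLM latency (ms) divided by the number of tokens.
    # Which token count applies is an assumption left to the caller.
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return llm_latency_ms / tokens
```

For example, with made-up prices of 0.000001 per prompt token and 0.000002 per completion token, a request with 1,000 prompt tokens and 500 completion tokens costs about 0.002, and a 1,500 ms LLM latency over 500 tokens gives 3 ms per token.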

Time intervals

The time frame selector controls the time frame of the data visualized, which indirectly controls the granularity of the data. For example, the “5M” selection displays five minutes of data at one-second resolution, while longer time frames display data at minute, hour, or day resolution.

Relative time frames are dynamic and the report captures a snapshot of data relative to when a user views the report.
Custom time frames are static and the report captures a snapshot of data during the specified time frame. You can see the exact range below the time frame selector. For example:
```
 Jan 26, 2023 12:00 AM - Feb 01, 2023 12:00 AM (PST)
```
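
The difference between relative and custom time frames can be sketched as follows. This is an illustration only, assuming a relative time frame is resolved against the moment the report is viewed and a custom time frame is stored as fixed start and end timestamps. The names `RELATIVE_FRAMES`, `resolve_relative`, and `resolve_custom` are hypothetical and are not part of any Kong API.

```python
from datetime import datetime, timedelta, timezone

# A few relative time frames, keyed by their label (illustrative subset).
RELATIVE_FRAMES = {
    "Last 15 minutes": timedelta(minutes=15),
    "Last hour": timedelta(hours=1),
    "Last 24 hours": timedelta(hours=24),
    "Last seven days": timedelta(days=7),
}


def resolve_relative(label: str, viewed_at: datetime) -> tuple[datetime, datetime]:
    # Dynamic: the range moves with the time the report is viewed.
    return viewed_at - RELATIVE_FRAMES[label], viewed_at


def resolve_custom(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    # Static: the range is the same no matter when the report is viewed.
    if end <= start:
        raise ValueError("end must be after start")
    return start, end


# The custom range from the example above, using a fixed UTC-8 offset for PST.
PST = timezone(timedelta(hours=-8))
custom_range = resolve_custom(
    datetime(2023, 1, 26, 0, 0, tzinfo=PST),
    datetime(2023, 2, 1, 0, 0, tzinfo=PST),
)
```

Calling `resolve_relative("Last hour", ...)` with two different view times returns two different ranges, while `custom_range` stays fixed.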

The following table describes the time intervals you can select:

Interval	Aggregation increment frequency	Notes
Last 15 minutes	1 minute	Data is aggregated in one minute increments.
Last hour	1 minute	Data is aggregated in one minute increments.
Last six hours	1 minute	Data is aggregated in one minute increments.
Last 12 hours	1 hour	Data is aggregated in one hour increments.
Last 24 hours	1 hour	Data is aggregated in one hour increments.
Last seven days	1 hour	Data is aggregated in one hour increments.
Last 30 days	Daily	Data is aggregated in daily increments.
Current week	1 hour	Logs any traffic in the current calendar week.
Current month	1 hour	Logs any traffic in the current calendar month.
Previous week	1 hour	Logs any traffic in the previous calendar week.
Previous month	Daily	Logs any traffic in the previous calendar month.
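
If you post-process analytics data yourself, the table above can be encoded as a lookup from interval to aggregation increment. The following is a minimal sketch: the increments are taken from the table, but the `bucket_start` helper and the way it aligns buckets (truncating to the start of the minute, hour, or day) are assumptions, since this page does not describe how buckets are aligned.

```python
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)
HOUR = timedelta(hours=1)
DAY = timedelta(days=1)

# Aggregation increment for each selectable interval, as listed in the table.
AGGREGATION_INCREMENT = {
    "Last 15 minutes": MINUTE,
    "Last hour": MINUTE,
    "Last six hours": MINUTE,
    "Last 12 hours": HOUR,
    "Last 24 hours": HOUR,
    "Last seven days": HOUR,
    "Last 30 days": DAY,
    "Current week": HOUR,
    "Current month": HOUR,
    "Previous week": HOUR,
    "Previous month": DAY,
}


def bucket_start(ts: datetime, interval: str) -> datetime:
    """Truncate a timestamp to the start of its aggregation bucket.

    Alignment by truncation is an assumption made for illustration.
    """
    step = AGGREGATION_INCREMENT[interval]
    if step == DAY:
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if step == HOUR:
        return ts.replace(minute=0, second=0, microsecond=0)
    return ts.replace(second=0, microsecond=0)
```

For example, a data point at 14:37:52 falls in the 14:37 bucket for "Last hour", the 14:00 bucket for "Last 24 hours", and the 00:00 bucket of the same day for "Last 30 days".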

FAQs

Which entities and attributes can I collect LLM data for?

Application
Cache Status
Consumer
Control Plane
Control Plane Group
Embeddings Model
Embeddings Provider
Provider
Request Model
Response Model
Route

What can I do after customizing an Explorer dashboard?

Save as a Report: This function creates a new custom report based on your current view, allowing you to revisit these specific insights at a later time.
Export as CSV: If you prefer to analyze your data using other tools, you can download the current view as a CSV file, making it portable and ready for further analysis elsewhere.
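
As one example of further analysis, an exported CSV can be summarized with a few lines of Python. This is a sketch under stated assumptions: the column names `provider` and `total_tokens` and the sample rows are made up for illustration, so check the header row of your own export and adjust the names accordingly.

```python
import csv
import io
from collections import defaultdict


def sum_by_group(csv_text: str, group_col: str, value_col: str) -> dict[str, int]:
    """Sum an integer column per distinct value of a grouping column."""
    totals: defaultdict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_col]] += int(row[value_col])
    return dict(totals)


# Made-up sample standing in for an exported file; the real columns may differ.
SAMPLE_EXPORT = """provider,total_tokens
provider-a,1200
provider-b,300
provider-a,800
"""

tokens_per_provider = sum_by_group(SAMPLE_EXPORT, "provider", "total_tokens")
```

With a real export, read the file contents with `open(path, newline="").read()` and pass them in place of `SAMPLE_EXPORT`.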
