LLM usage reporting

Uses: Kong Gateway, Advanced Analytics, AI Gateway
Incompatible with: on-prem

Advanced Analytics allows you to monitor and optimize your LLM usage by providing detailed insights into metrics such as token consumption, costs, and latency.

With LLM usage reporting, you can:

  • Track token consumption: Monitor the number of tokens processed by the different LLM models you have configured.
  • Understand costs: Gain visibility into the costs associated with your LLM providers.
  • Measure latency: Analyze the latency involved in processing LLM requests.

To use this feature, navigate to the Explorer dashboard and switch between API usage and LLM usage using the dataset dropdown. Metrics and groupings will dynamically adjust based on the selected dataset.

Metrics

Traffic metrics provide insight into which of your services are being used and how they are responding. Within a single report, you have the flexibility to choose one or multiple metrics from the same category.

| Attribute | Unit | Description |
|---|---|---|
| Completion Tokens | Count | Completion tokens are any tokens that the model generates in response to an input. |
| Prompt Tokens | Count | Prompt tokens are the tokens in the prompt that is input into the model. |
| Total Tokens | Count | Sum of all tokens used in a single request to the model. It includes both the tokens in the input (prompt) and the tokens generated by the model (completion). |
| Time per Tokens | Number | Average time in milliseconds to generate a token. Calculated as LLM latency divided by the number of tokens. |
| Costs | Cost | Represents the resulting costs for a request. Final costs = (total number of prompt tokens × input cost per token) + (total number of completion tokens × output cost per token) + (total number of prompt tokens × embedding cost per token). |
| Response Model | String | Represents which AI model was used to process the prompt by the AI provider. |
| Request Model | String | Represents which AI model was used to process the prompt. |
| Provider Name | String | Represents which AI provider was used to process the prompt. |
| Plugin ID | String | Represents the UUID of the plugin. |
| LLM Latency | Latency | Total time taken to receive a full response after a request is sent from Kong (LLM latency + connection time). |
| Embeddings Latency | Latency | Time taken to generate the vector for the prompt string. |
| Fetch Latency | Latency | Total time taken to return a cached response. |
| Cache Status | String | Shows whether the response comes directly from the upstream or not. Possible values: Hit or Miss. |
| Embeddings Model | String | AI providers may have multiple embedding models. This represents the model used for the embeddings. |
| Embeddings Provider | String | Provider used for generating embeddings. |
| Embeddings Token | Count | Tokens input into the model for embeddings. |
| Embeddings Cost | Cost | Cost of caching. |
| Cost Savings | Cost | Cost savings from cache. |
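
The Costs and Time per Tokens formulas can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the table, not how Advanced Analytics computes the values internally. The `TokenPricing` class, the function names, and the prices are hypothetical, and the sketch assumes that "the number of tokens" in Time per Tokens is whatever token count you pass in.

```python
from dataclasses import dataclass


@dataclass
class TokenPricing:
    """Hypothetical per-token prices; real values depend on your provider."""
    input_cost_per_token: float
    output_cost_per_token: float
    embedding_cost_per_token: float = 0.0


def request_cost(prompt_tokens: int, completion_tokens: int, pricing: TokenPricing) -> float:
    """Apply the Costs formula from the metrics table to a single request."""
    return (
        prompt_tokens * pricing.input_cost_per_token
        + completion_tokens * pricing.output_cost_per_token
        + prompt_tokens * pricing.embedding_cost_per_token
    )


def time_per_token(llm_latency_ms: float, tokens: int) -> float:
    """LLM latency in milliseconds divided by the number of tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return llm_latency_ms / tokens


# Illustrative prices only: 10 prompt tokens and 4 completion tokens.
pricing = TokenPricing(input_cost_per_token=0.5, output_cost_per_token=1.0,
                       embedding_cost_per_token=0.25)
print(request_cost(10, 4, pricing))  # 5.0 + 4.0 + 2.5 = 11.5
print(time_per_token(1200.0, 600))   # 2.0 ms per token
```

Note that prompt tokens appear twice in the formula: once priced as input and once priced for embeddings, so the embedding term is zero when no embedding cost is configured.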

Time intervals

The time frame selector controls the time frame of the data visualized, which indirectly controls the granularity of the data. For example, the “5M” selection displays five minutes of data at one-second resolution, while longer time frames display data at minute, hour, or day resolution.

  • Relative time frames are dynamic and the report captures a snapshot of data relative to when a user views the report.
  • Custom time frames are static and the report captures a snapshot of data during the specified time frame. You can see the exact range below the time frame selector. For example:

     Jan 26, 2023 12:00 AM - Feb 01, 2023 12:00 AM (PST)
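
The difference between the two kinds of time frame can be illustrated with a short Python sketch. It is a conceptual model only, under the assumption that a relative time frame is resolved against the moment the report is viewed. The `RELATIVE_FRAMES` mapping and `resolve_relative` function are hypothetical names, not part of any Kong API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical subset of relative selections, keyed by interval labels from this page.
RELATIVE_FRAMES = {
    "Last 15 minutes": timedelta(minutes=15),
    "Last hour": timedelta(hours=1),
    "Last 24 hours": timedelta(hours=24),
    "Last seven days": timedelta(days=7),
}


def resolve_relative(label: str, viewed_at: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) range a relative time frame covers at view time."""
    return viewed_at - RELATIVE_FRAMES[label], viewed_at


# A custom time frame is a fixed pair of timestamps, like the example above.
PST = timezone(timedelta(hours=-8), "PST")
custom_range = (datetime(2023, 1, 26, tzinfo=PST), datetime(2023, 2, 1, tzinfo=PST))

# The same relative selection covers different data depending on when it is viewed.
morning = resolve_relative("Last hour", datetime(2023, 2, 1, 9, 0, tzinfo=PST))
evening = resolve_relative("Last hour", datetime(2023, 2, 1, 21, 0, tzinfo=PST))
print(morning[0].isoformat(), "to", morning[1].isoformat())
print(evening[0].isoformat(), "to", evening[1].isoformat())
```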
    

The following table describes the time intervals you can select:

| Interval | Aggregation increment frequency | Notes |
|---|---|---|
| Last 15 minutes | 1 minute | Data is aggregated in one minute increments. |
| Last hour | 1 minute | Data is aggregated in one minute increments. |
| Last six hours | 1 minute | Data is aggregated in one minute increments. |
| Last 12 hours | 1 hour | Data is aggregated in one hour increments. |
| Last 24 hours | 1 hour | Data is aggregated in one hour increments. |
| Last seven days | 1 hour | Data is aggregated in one hour increments. |
| Last 30 days | Daily | Data is aggregated in daily increments. |
| Current week | 1 hour | Logs any traffic in the current calendar week. |
| Current month | 1 hour | Logs any traffic in the current calendar month. |
| Previous week | 1 hour | Logs any traffic in the previous calendar week. |
| Previous month | Daily | Logs any traffic in the previous calendar month. |
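
To get a feel for how many data points each fixed-length interval yields, the table can be turned into a small lookup. This is a rough sketch that assumes one data point per aggregation increment. The `INTERVALS` mapping and `expected_points` function are hypothetical, and the calendar-based intervals (current or previous week and month) are left out because their length varies.

```python
from datetime import timedelta

# (time frame length, aggregation increment) copied from the table above.
INTERVALS = {
    "Last 15 minutes": (timedelta(minutes=15), timedelta(minutes=1)),
    "Last hour": (timedelta(hours=1), timedelta(minutes=1)),
    "Last six hours": (timedelta(hours=6), timedelta(minutes=1)),
    "Last 12 hours": (timedelta(hours=12), timedelta(hours=1)),
    "Last 24 hours": (timedelta(hours=24), timedelta(hours=1)),
    "Last seven days": (timedelta(days=7), timedelta(hours=1)),
    "Last 30 days": (timedelta(days=30), timedelta(days=1)),
}


def expected_points(label: str) -> int:
    """Number of aggregation increments that fit in the selected interval."""
    span, increment = INTERVALS[label]
    return span // increment


for label in INTERVALS:
    print(f"{label}: {expected_points(label)} points")
```

For example, "Last six hours" at one-minute increments gives 360 points, while "Last seven days" at one-hour increments gives 168.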

FAQs

Available groupings:

  • Application
  • Cache Status
  • Consumer
  • Control Plane
  • Control Plane Group
  • Embeddings Model
  • Embeddings Provider
  • Provider
  • Request Model
  • Response Model
  • Route

Actions for the current view:

  • Save as a Report: This function creates a new custom report based on your current view, allowing you to revisit these specific insights at a later time.
  • Export as CSV: If you prefer to analyze your data using other tools, you can download the current view as a CSV file, making it portable and ready for further analysis elsewhere.
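
As one example of analyzing an exported CSV elsewhere, the following Python sketch sums a metric per group using only the standard library. The column names (`provider`, `total_tokens`) and the sample rows are assumptions made for illustration; check the header row of your own export and adjust the names accordingly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export contents; the real column names and values may differ.
SAMPLE_CSV = """provider,total_tokens
provider-a,1200
provider-b,300
provider-a,800
"""


def sum_by_group(csv_file, group_column: str, value_column: str) -> dict[str, int]:
    """Sum an integer column per distinct value of a grouping column."""
    totals: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row[group_column]] += int(row[value_column])
    return dict(totals)


totals = sum_by_group(io.StringIO(SAMPLE_CSV), "provider", "total_tokens")
print(totals)  # {'provider-a': 2000, 'provider-b': 300}
```

To run this against a real export, replace the `io.StringIO(...)` argument with a file opened via `open(path, newline="")`.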