AI Gateway audit log reference

Kong AI Gateway emits structured analytics logs for AI plugins through the standard Kong Gateway logging infrastructure. This means AI-specific logs are written to the same locations as other Kong logs, such as /usr/local/kong/logs/error.log, or to Docker container logs if you’re running in a containerized environment.

Like other Kong logs, AI Gateway logs are subject to the global log level configured via the kong.conf file or the Admin API. You can control log verbosity by adjusting the log_level setting (for example, info, notice, warn, error, crit) to determine which log entries are captured.

You can also use logging plugins to route these logs to external systems, such as file systems, log aggregators, or monitoring tools.
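For example, the File Log plugin writes each serialized log entry, including the ai object described below, to a file on disk. The following is a minimal sketch of a configuration you might send as the JSON body of a POST to the Admin API /plugins endpoint; the log path is a placeholder, and in practice you may want to scope the plugin to a specific service or route rather than enabling it globally:

{
  "name": "file-log",
  "config": {
    "path": "/tmp/ai-gateway-analytics.log"
  }
}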

Log details

Each AI plugin writes its own set of properties to the log. Log entries include the following details:

| Property | Description |
|----------|-------------|
| ai.$PLUGIN_NAME.payload.request | The request payload. |
| ai.$PLUGIN_NAME.payload.response | The response payload. |
| ai.$PLUGIN_NAME.usage.prompt_token | The number of tokens used for prompting. |
| ai.$PLUGIN_NAME.usage.completion_token | The number of tokens used for completion. |
| ai.$PLUGIN_NAME.usage.total_tokens | The total number of tokens used. |
| ai.$PLUGIN_NAME.usage.cost | The total cost of the request (input and output cost). |
| ai.$PLUGIN_NAME.usage.time_per_token | (v3.8+) The average time to generate an output token, in milliseconds. |
| ai.$PLUGIN_NAME.meta.request_model | The model used for the AI request. |
| ai.$PLUGIN_NAME.meta.provider_name | The name of the AI service provider. |
| ai.$PLUGIN_NAME.meta.response_model | The model used for the AI response. |
| ai.$PLUGIN_NAME.meta.plugin_id | The unique identifier of the plugin. |
| ai.$PLUGIN_NAME.meta.llm_latency | (v3.8+) The time, in milliseconds, it took the LLM provider to generate the full response. |
| ai.$PLUGIN_NAME.cache.cache_status | (v3.8+) The cache status. This can be Hit, Miss, Bypass, or Refresh. |
| ai.$PLUGIN_NAME.cache.fetch_latency | (v3.8+) The time, in milliseconds, it took to return a cache response. |
| ai.$PLUGIN_NAME.cache.embeddings_provider | (v3.8+) For semantic caching, the provider used to generate the embeddings. |
| ai.$PLUGIN_NAME.cache.embeddings_model | (v3.8+) For semantic caching, the model used to generate the embeddings. |
| ai.$PLUGIN_NAME.cache.embeddings_latency | (v3.8+) For semantic caching, the time taken to generate the embeddings. |

The following example shows a structured AI Gateway log entry:

"ai": {
    "payload": { "request": "$OPTIONAL_PAYLOAD_REQUEST" },
    "$PLUGIN_NAME_1": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 28,
        "total_tokens": 48,
        "completion_token": 20,
        "cost": 0.0038,
        "time_per_token": 133
      },
      "meta": {
        "request_model": "command",
        "provider_name": "cohere",
        "response_model": "command",
        "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
        "llm_latency": 2670
      }
    },
    "$PLUGIN_NAME_2": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 89,
        "total_tokens": 145,
        "completion_token": 56,
        "cost": 0.0012,
        "time_per_token": 87
      },
      "meta": {
        "request_model": "gpt-35-turbo",
        "provider_name": "azure",
        "response_model": "gpt-35-turbo",
        "plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b",
        "llm_latency": 4927
      }
    }
}

Cache logging v3.8+

If you’re using the AI Semantic Cache plugin, AI Gateway logs include additional fields under the cache object for each plugin entry. These fields provide insight into cache behavior—such as whether a response was served from cache, how long it took to fetch, and which embedding provider and model were used if applicable.

The following example shows how cache-related metadata appears alongside usage and model details in a structured AI log entry:

"ai": {
    "payload": { "request": "$OPTIONAL_PAYLOAD_REQUEST_" },
    "$PLUGIN_NAME_1": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 28,
        "total_tokens": 48,
        "completion_token": 20,
        "cost": 0.0038,
        "time_per_token": 133
      },
      "meta": {
        "request_model": "command",
        "provider_name": "cohere",
        "response_model": "command",
        "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
        "llm_latency": 2670
      },
      "cache": {
        "cache_status": "Hit",
        "fetch_latency": 21
      }
    },
    "$PLUGIN_NAME_2": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 89,
        "total_tokens": 145,
        "completion_token": 56,
        "cost": 0.0012
      },
      "meta": {
        "request_model": "gpt-35-turbo",
        "provider_name": "azure",
        "response_model": "gpt-35-turbo",
        "plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b"
      },
      "cache": {
        "cache_status": "Hit",
        "fetch_latency": 444,
        "embeddings_provider": "openai",
        "embeddings_model": "text-embedding-3-small",
        "embeddings_latency": 424
      }
    }
}

Note: When a response is served from the cache, time_per_token and llm_latency are omitted. The response can be served from either an exact cache or a semantic cache; a semantic cache response additionally includes the embeddings provider, embeddings model, and embeddings latency.
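For comparison, on a cache miss the request is forwarded to the LLM provider, so the entry keeps time_per_token and llm_latency. The following is a hypothetical plugin entry with illustrative values; the exact set of cache fields on a miss may vary with your configuration:

"$PLUGIN_NAME_1": {
  "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
  "usage": {
    "prompt_token": 28,
    "total_tokens": 48,
    "completion_token": 20,
    "cost": 0.0038,
    "time_per_token": 133
  },
  "meta": {
    "request_model": "command",
    "provider_name": "cohere",
    "response_model": "command",
    "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
    "llm_latency": 2670
  },
  "cache": {
    "cache_status": "Miss"
  }
}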
