AI Gateway audit log reference

Kong AI Gateway emits structured analytics logs for AI plugins through the standard Kong Gateway logging infrastructure. This means AI-specific logs are written to the same locations as other Kong logs, such as /usr/local/kong/logs/error.log, or to Docker container logs if you’re running in a containerized environment.

Like other Kong logs, AI Gateway logs are subject to the global log level configured via the kong.conf file or the Admin API. You can control log verbosity by adjusting the log_level setting (for example, info, notice, warn, error, crit) to determine which log entries are captured.

You can also use logging plugins to route these logs to external systems, such as file systems, log aggregators, or monitoring tools.
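For example, the File Log plugin writes each serialized log entry, including the ai object described below, to a file on disk. The following is a minimal sketch of a configuration you might send as the JSON body of a POST to the Admin API /plugins endpoint; the log path is a placeholder, and in practice you may want to scope the plugin to a specific service or route rather than enabling it globally:

{
  "name": "file-log",
  "config": {
    "path": "/tmp/ai-gateway-analytics.log"
  }
}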

Log details

Each AI plugin writes its own set of properties to the log. Log entries include the following details:

| Property | Description |
|----------|-------------|
| ai.$PLUGIN_NAME.payload.request | The request payload. |
| ai.$PLUGIN_NAME.payload.response | The response payload. |
| ai.$PLUGIN_NAME.usage.prompt_token | The number of tokens used for prompting. |
| ai.$PLUGIN_NAME.usage.completion_token | The number of tokens used for completion. |
| ai.$PLUGIN_NAME.usage.total_tokens | The total number of tokens used. |
| ai.$PLUGIN_NAME.usage.cost | The total cost of the request (input and output cost). |
| ai.$PLUGIN_NAME.usage.time_per_token | (v3.8+) The average time to generate an output token, in milliseconds. |
| ai.$PLUGIN_NAME.meta.request_model | The model used for the AI request. |
| ai.$PLUGIN_NAME.meta.provider_name | The name of the AI service provider. |
| ai.$PLUGIN_NAME.meta.response_model | The model used for the AI response. |
| ai.$PLUGIN_NAME.meta.plugin_id | The unique identifier of the plugin. |
| ai.$PLUGIN_NAME.meta.llm_latency | (v3.8+) The time, in milliseconds, it took the LLM provider to generate the full response. |
| ai.$PLUGIN_NAME.cache.cache_status | (v3.8+) The cache status. This can be Hit, Miss, Bypass, or Refresh. |
| ai.$PLUGIN_NAME.cache.fetch_latency | (v3.8+) The time, in milliseconds, it took to return a cache response. |
| ai.$PLUGIN_NAME.cache.embeddings_provider | (v3.8+) For semantic caching, the provider used to generate the embeddings. |
| ai.$PLUGIN_NAME.cache.embeddings_model | (v3.8+) For semantic caching, the model used to generate the embeddings. |
| ai.$PLUGIN_NAME.cache.embeddings_latency | (v3.8+) For semantic caching, the time taken to generate the embeddings. |

The following example shows a structured AI Gateway log entry:

"ai": {
    "payload": { "request": "$OPTIONAL_PAYLOAD_REQUEST" },
    "$PLUGIN_NAME_1": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 28,
        "total_tokens": 48,
        "completion_token": 20,
        "cost": 0.0038,
        "time_per_token": 133
      },
      "meta": {
        "request_model": "command",
        "provider_name": "cohere",
        "response_model": "command",
        "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
        "llm_latency": 2670
      }
    },
    "$PLUGIN_NAME_2": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 89,
        "total_tokens": 145,
        "completion_token": 56,
        "cost": 0.0012,
        "time_per_token": 87
      },
      "meta": {
        "request_model": "gpt-35-turbo",
        "provider_name": "azure",
        "response_model": "gpt-35-turbo",
        "plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b",
        "llm_latency": 4927
      }
    }
}

Cache logging v3.8+

If you’re using the AI Semantic Cache plugin, AI Gateway logs include additional fields under the cache object for each plugin entry. These fields provide insight into cache behavior—such as whether a response was served from cache, how long it took to fetch, and which embedding provider and model were used if applicable.

The following example shows how cache-related metadata appears alongside usage and model details in a structured AI log entry:

"ai": {
    "payload": { "request": "$OPTIONAL_PAYLOAD_REQUEST_" },
    "$PLUGIN_NAME_1": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 28,
        "total_tokens": 48,
        "completion_token": 20,
        "cost": 0.0038,
        "time_per_token": 133
      },
      "meta": {
        "request_model": "command",
        "provider_name": "cohere",
        "response_model": "command",
        "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
        "llm_latency": 2670
      },
      "cache": {
        "cache_status": "Hit",
        "fetch_latency": 21
      }
    },
    "$PLUGIN_NAME_2": {
      "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
      "usage": {
        "prompt_token": 89,
        "total_tokens": 145,
        "completion_token": 56,
        "cost": 0.0012
      },
      "meta": {
        "request_model": "gpt-35-turbo",
        "provider_name": "azure",
        "response_model": "gpt-35-turbo",
        "plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b"
      },
      "cache": {
        "cache_status": "Hit",
        "fetch_latency": 444,
        "embeddings_provider": "openai",
        "embeddings_model": "text-embedding-3-small",
        "embeddings_latency": 424
      }
    }
}

Note: When a response is served from the cache, time_per_token and llm_latency are omitted. The response can be served from either an exact cache or a semantic cache; a semantic cache response additionally includes the embeddings provider, embeddings model, and embeddings latency.
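For comparison, on a cache miss the request is forwarded to the LLM provider, so the entry keeps time_per_token and llm_latency. The following is a hypothetical plugin entry with illustrative values; the exact set of cache fields on a miss may vary with your configuration:

"$PLUGIN_NAME_1": {
  "payload": { "response": "$OPTIONAL_PAYLOAD_RESPONSE" },
  "usage": {
    "prompt_token": 28,
    "total_tokens": 48,
    "completion_token": 20,
    "cost": 0.0038,
    "time_per_token": 133
  },
  "meta": {
    "request_model": "command",
    "provider_name": "cohere",
    "response_model": "command",
    "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
    "llm_latency": 2670
  },
  "cache": {
    "cache_status": "Miss"
  }
}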
