AI Gateway audit log reference

Uses: AI Gateway, Kong Gateway
Minimum version: Kong Gateway 3.6

Kong AI Gateway emits structured analytics logs for AI plugins through the standard Kong Gateway logging infrastructure. This means AI-specific logs are written to the same locations as other Kong logs, such as `/usr/local/kong/logs/error.log`, or to Docker container logs if you're running in a containerized environment.

Like other Kong logs, AI Gateway logs are subject to the global log level configured via the `kong.conf` file or the Admin API. You can control log verbosity by adjusting the `log_level` setting (for example, `info`, `notice`, `warn`, `error`, `crit`) to determine which log entries are captured.

You can also use logging plugins to route these logs to external systems, such as file systems, log aggregators, or monitoring tools.
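
For example, you could attach the `http-log` plugin through the Admin API so that each request's log entry, including the AI fields described below, is shipped to an external collector. A minimal sketch in Python, assuming the Admin API listens on `localhost:8001` and using a placeholder collector URL:

```python
import requests

# Attach the http-log plugin globally via the Admin API so request logs,
# including the structured "ai" fields described below, are shipped to an
# external collector. The Admin API address and collector URL are
# assumptions; substitute your own.
resp = requests.post(
    "http://localhost:8001/plugins",
    json={
        "name": "http-log",
        "config": {"http_endpoint": "https://logs.example.com/kong"},
    },
)
resp.raise_for_status()
print("Enabled http-log plugin:", resp.json()["id"])
```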

Log details

Each AI plugin contributes its own set of fields to the structured log output, including token usage. Log entries include the details described in the following sections.

AI Proxy core logs

The AI Proxy and AI Proxy Advanced plugins act as the main gateway for forwarding requests to AI providers. Logs here capture detailed information about the request and response payloads, token usage, model details, latency, and cost metrics. They provide a comprehensive view of each AI interaction.

| Property | Description |
|----------|-------------|
| `ai.proxy.payload.request` | The request payload sent to the upstream AI provider. |
| `ai.proxy.payload.response` | The response payload received from the upstream AI provider. |
| `ai.proxy.usage.prompt_tokens` | The number of tokens used for prompting. Used for text-based requests (chat, completions, embeddings). |
| `ai.proxy.usage.prompt_tokens_details` (v3.11+) | A breakdown of prompt tokens (`cached_tokens`, `audio_tokens`). |
| `ai.proxy.usage.completion_tokens` | The number of tokens used for completion. Used for text-based responses (chat, completions). |
| `ai.proxy.usage.completion_tokens_details` (v3.11+) | A breakdown of completion tokens (`rejected_prediction_tokens`, `reasoning_tokens`, `accepted_prediction_tokens`, `audio_tokens`). |
| `ai.proxy.usage.total_tokens` | The total number of tokens used (input + output). Includes prompt/completion tokens for text, and input/output tokens for non-text modalities. |
| `ai.proxy.usage.input_tokens` (v3.11+) | The total number of input tokens (text + image + audio). Used for non-text requests (for example, image or audio generation). |
| `ai.proxy.usage.input_tokens_details` (v3.11+) | A breakdown of input tokens by modality (`text_tokens`, `image_tokens`, `audio_tokens_count`). |
| `ai.proxy.usage.output_tokens` (v3.11+) | The total number of output tokens (text + audio). Used for non-text responses (for example, image or audio generation). |
| `ai.proxy.usage.output_tokens_details` (v3.11+) | A breakdown of output tokens by modality (`text_tokens`, `audio_tokens`). |
| `ai.proxy.usage.cost` | The total cost of the request. |
| `ai.proxy.usage.time_per_token` (v3.8+) | Average time to generate an output token, in milliseconds. |
| `ai.proxy.usage.time_to_first_token` (v3.8+) | Time to receive the first output token, in milliseconds. |
| `ai.proxy.meta.request_model` | The model used for the AI request. |
| `ai.proxy.meta.response_model` | The model used to generate the AI response. |
| `ai.proxy.meta.provider_name` | The name of the AI service provider. |
| `ai.proxy.meta.plugin_id` | Unique identifier of the plugin instance. |
| `ai.proxy.meta.llm_latency` (v3.8+) | Time taken by the LLM provider to generate the full response, in milliseconds. |
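
To give a sense of how these fields nest in practice (the full structure appears in the example log entry at the end of this page), here is a small sketch that pulls the core usage and meta fields out of one parsed entry. The helper name is illustrative:

```python
import json

def summarize_usage(raw_entry: str) -> str:
    """Summarize the core AI Proxy usage and meta fields of one log entry."""
    entry = json.loads(raw_entry)
    usage = entry["ai"]["proxy"]["usage"]
    meta = entry["ai"]["proxy"]["meta"]
    return (
        f"{meta['provider_name']}/{meta['response_model']}: "
        f"{usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion "
        f"= {usage['total_tokens']} tokens (cost: {usage['cost']})"
    )
```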

AI AWS Guardrails logs v3.11+

If you're using the AI AWS Guardrails plugin, logs capture processing times and configuration metadata related to content guardrails applied to inputs and outputs.

The following fields appear in structured AI logs when the AI AWS Guardrails plugin is enabled:

| Property | Description |
|----------|-------------|
| `ai.proxy.aws-guardrails.guardrails_id` | The unique identifier of the guardrails configuration applied. |
| `ai.proxy.aws-guardrails.output_processing_latency` | The time, in milliseconds, taken to process the output through guardrails. |
| `ai.proxy.aws-guardrails.inputput_processing_latency` | The time, in milliseconds, taken to process the input through guardrails. |
| `ai.proxy.aws-guardrails.guardrails_version` | The version or state of the guardrails configuration (for example, `DRAFT`, `RELEASE`). |
| `ai.proxy.aws-guardrails.aws_region` | The AWS region where the guardrails are deployed or executed. |
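
Because input and output processing are reported separately, the total guardrails overhead for a request is the sum of the two latency fields. A minimal sketch, assuming a log entry already parsed into a dictionary:

```python
def guardrails_overhead_ms(entry: dict) -> int:
    """Total time spent in AWS guardrails processing for one request, in ms."""
    gr = entry["ai"]["proxy"]["aws-guardrails"]
    # Field names follow the table above.
    return gr["inputput_processing_latency"] + gr["output_processing_latency"]
```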

AI Azure Content Safety logs

If the AI Azure Content Safety plugin is enabled, each corresponding log entry records a detected feature level for a user-defined content safety category (for example, Hate, Violence, SexualContent). The category is a user-defined name, and the feature level indicates the detected severity for that category. Multiple entries can appear per request, depending on the configuration and the detected content.

For detailed information on categories and severity levels, see Harm categories in Azure AI Content Safety - Azure AI services.

| Property | Description |
|----------|-------------|
| `ai.audit.azure_content_safety.<CATEGORY>` | Detected feature level for a user-defined category (for example, `Hate`, `Violence`). There can be multiple entries per request, depending on configuration and detected content. |
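
Since the category names are user-defined, consumers of these logs typically iterate over whatever keys are present. The sketch below assumes a parsed entry and an illustrative severity ordering; check the Azure documentation for the levels your deployment actually reports:

```python
# Severity ordering is illustrative; see the Azure harm categories
# documentation for the levels your deployment reports.
SEVERITY_ORDER = ["Safe", "Low", "Medium", "High"]

def flag_unsafe_categories(entry: dict, threshold: str = "Medium") -> list[str]:
    """Return the user-defined categories at or above the given severity."""
    detections = entry["ai"]["audit"]["azure_content_safety"]
    limit = SEVERITY_ORDER.index(threshold)
    return [
        category
        for category, level in detections.items()
        if SEVERITY_ORDER.index(level) >= limit
    ]
```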

AI PII Sanitizer logs v3.10+

If you’re using the AI PII Sanitizer plugin, AI Gateway logs include additional fields that provide insight into the detection and redaction of personally identifiable information (PII). These fields track the number of entities identified and sanitized, the time taken to process the payload, and detailed metadata about each sanitized item—including the original value, redacted value, and detected entity type.

The following fields appear in structured AI logs when the AI PII Sanitizer plugin is enabled:

| Property | Description |
|----------|-------------|
| `ai.sanitizer.pii_identified` | The number of PII entities detected in the input payload. |
| `ai.sanitizer.pii_sanitized` | The number of PII entities that were anonymized or redacted. |
| `ai.sanitizer.duration` | The time, in milliseconds, taken by the `ai-pii-service` container to process the payload. |
| `ai.sanitizer.sanitized_items` | A list of sanitized PII entities, each including the original text, redacted text, and the entity type. |
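
For auditing purposes, you might tally sanitized items by entity type. A minimal sketch, assuming a parsed entry:

```python
from collections import Counter

def pii_by_entity_type(entry: dict) -> Counter:
    """Count sanitized PII items per detected entity type (e.g. EMAIL)."""
    items = entry["ai"]["sanitizer"]["sanitized_items"]
    return Counter(item["entity_type"] for item in items)
```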

AI Prompt Compressor logs v3.11+

When the AI Prompt Compressor plugin is enabled, additional logs record token counts before and after compression, compression ratios, and metadata about the compression method and model used.

The following fields appear in structured AI logs when the AI Prompt Compressor plugin is enabled:

| Property | Description |
|----------|-------------|
| `ai.compressor.original_token_count` | The original number of tokens before compression. |
| `ai.compressor.compress_token_count` | The number of tokens after compression. |
| `ai.compressor.save_token_count` | The number of tokens saved by compression (original minus compressed). |
| `ai.compressor.compress_value` | The compression ratio applied. |
| `ai.compressor.compress_type` | The type or method of compression used. |
| `ai.compressor.compressor_model` | The model used to perform the compression. |
| `ai.compressor.msg_id` | The identifier of the message that was compressed. |
| `ai.compressor.information` | A summary message describing the result of compression. |
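
These counters are related arithmetically: `save_token_count` should equal `original_token_count` minus `compress_token_count`. A sketch that derives the effective savings ratio from a parsed entry:

```python
def compression_savings(entry: dict) -> float:
    """Fraction of prompt tokens removed by compression (0.0 to 1.0)."""
    comp = entry["ai"]["compressor"]
    saved = comp["original_token_count"] - comp["compress_token_count"]
    assert saved == comp["save_token_count"]  # sanity check on the log fields
    return saved / comp["original_token_count"]
```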

AI RAG Injector logs v3.10+

If you’re using the AI RAG Injector plugin, AI Gateway logs include additional fields that provide detailed information about the retrieval-augmented generation process. These fields track the vector database used, whether relevant context was injected into the prompt, the latency of data fetching, and embedding metadata such as tokens used and the provider/model details.

The following fields appear in structured AI logs when the AI RAG Injector plugin is enabled:

| Property | Description |
|----------|-------------|
| `ai.proxy.rag-inject.vector_db` | The vector database used (for example, `pgvector`). |
| `ai.proxy.rag-inject.injected` | Boolean indicating whether RAG injection occurred. |
| `ai.proxy.rag-inject.fetch_latency` | The fetch latency, in milliseconds. |
| `ai.proxy.rag-inject.chunk_ids` | List of chunk IDs retrieved. |
| `ai.proxy.rag-inject.embeddings_latency` | Time taken to generate embeddings, in milliseconds. |
| `ai.proxy.rag-inject.embeddings_tokens` | Number of tokens used for embeddings. |
| `ai.proxy.rag-inject.embeddings_provider` | Provider used to generate embeddings. |
| `ai.proxy.rag-inject.embeddings_model` | Model used to generate embeddings. |
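
For example, the `injected` flag and the latency fields can be combined to track how often retrieval actually contributes context, and at what cost. A minimal sketch over a parsed entry:

```python
def rag_summary(entry: dict) -> str:
    """One-line summary of RAG injection activity for a request."""
    rag = entry["ai"]["proxy"]["rag-inject"]
    if not rag["injected"]:
        return "no context injected"
    return (
        f"{len(rag['chunk_ids'])} chunks from {rag['vector_db']} "
        f"(fetch {rag['fetch_latency']} ms, "
        f"embeddings {rag['embeddings_latency']} ms, "
        f"{rag['embeddings_tokens']} tokens)"
    )
```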

AI Semantic Cache logs v3.8+

If you’re using the AI Semantic Cache plugin, AI Gateway logs include additional fields under the cache object for each plugin entry. These fields provide insight into cache behavior—such as whether a response was served from cache, how long it took to fetch, and which embedding provider and model were used if applicable.

The following fields appear in AI logs when semantic caching is enabled:

| Property | Description |
|----------|-------------|
| `ai.proxy.cache.cache_status` (v3.8+) | The cache status. This can be `Hit`, `Miss`, `Bypass`, or `Refresh`. |
| `ai.proxy.cache.fetch_latency` | The time, in milliseconds, taken to return a cached response. |
| `ai.proxy.cache.embeddings_provider` | The provider used to generate the embeddings. |
| `ai.proxy.cache.embeddings_model` | The model used to generate the embeddings. |
| `ai.proxy.cache.embeddings_latency` | The time, in milliseconds, taken to generate the embeddings. |

Note: When returning a cached response, `time_per_token` and `llm_latency` are omitted. The cached response can be returned either as a semantic cache hit or an exact cache hit. If it's returned as a semantic cache hit, it includes additional details such as the embeddings provider, embeddings model, and embeddings latency.
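
Code that consumes these logs should therefore treat the latency fields as optional. A minimal sketch, assuming a parsed entry:

```python
def generation_latency_ms(entry: dict):
    """Return LLM latency for a request, or None if served from cache."""
    proxy = entry["ai"]["proxy"]
    cache_status = proxy.get("cache", {}).get("cache_status")
    if cache_status == "Hit":
        # time_per_token and llm_latency are omitted for cached responses.
        return None
    return proxy["meta"].get("llm_latency")
```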

Example log entry

The following example shows a structured AI Gateway log entry:

```json
{
  "ai": {
    "payload": {
      "request": "$OPTIONAL_PAYLOAD_REQUEST"
    },
    "proxy": {
      "payload": {
        "response": "$OPTIONAL_PAYLOAD_RESPONSE"
      },
      "usage": {
      "time_per_token": 30.142857142857,
      "time_to_first_token": 631,
      "completion_tokens": 21,
      "completion_tokens_details": {
        "rejected_prediction_tokens": 0,
        "reasoning_tokens": 0,
        "accepted_prediction_tokens": 0,
        "audio_tokens": 0
      },
      "prompt_tokens_details": {
        "cached_tokens": 0,
        "audio_tokens": 0
      },
      "prompt_tokens": 14,
      "total_tokens": 35,
      "cost": 0
    },
      "meta": {
        "request_model": "command",
        "provider_name": "cohere",
        "response_model": "command",
        "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
        "llm_latency": 2670
      },
      "cache": {
        "cache_status": "Hit",
        "fetch_latency": 12,
        "embeddings_provider": "openai",
        "embeddings_model": "text-embedding-ada-002",
        "embeddings_latency": 42
      },
      "aws-guardrails": {
        "guardrails_id": "gr-1234abcd",
        "guardrails_version": "RELEASE",
        "aws_region": "us-west-2",
        "inputput_processing_latency": 134,
        "output_processing_latency": 278
      },
      "rag-inject": {
        "vector_db": "pgvector",
        "injected": true,
        "fetch_latency": 154,
        "chunk_ids": ["chunk-1", "chunk-2"],
        "embeddings_latency": 37,
        "embeddings_tokens": 62,
        "embeddings_provider": "openai",
        "embeddings_model": "text-embedding-ada-002"
      }
    },
    "compressor": {
      "original_token_count": 845,
      "compress_token_count": 485,
      "save_token_count": 360,
      "compress_value": 0.5,
      "compress_type": "rate",
      "compressor_model": "microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
      "msg_id": 1,
      "information": "Compression was performed and saved 360 tokens"
    },
    "sanitizer": {
      "pii_identified": 3,
      "pii_sanitized": 3,
      "duration": 65,
      "sanitized_items": [
        {
          "entity_type": "EMAIL",
          "original": "jane.doe@example.com",
          "sanitized": "[REDACTED]"
        },
        {
          "entity_type": "PHONE_NUMBER",
          "original": "555-123-4567",
          "sanitized": "[REDACTED]"
        }
      ]
    },
    "audit": {
      "azure_content_safety": {
        "Hate": "High",
        "Violence": "Medium"
      }
    }
  }
}
```
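
The top-level keys under `ai` map directly to the plugin sections documented above (`proxy`, `compressor`, `sanitizer`, `audit`, and so on), so you can check at a glance which plugins contributed to an entry. A small sketch, assuming the entry above is saved in a hypothetical file:

```python
import json

with open("ai_log_entry.json") as f:  # hypothetical file holding one entry
    entry = json.load(f)

# Top-level "ai" keys map to the plugin sections documented above.
print("Sections present:", sorted(entry["ai"].keys()))
```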