Each AI plugin adds its own set of fields to the structured AI logs. Log entries include the details described in the following sections.

The AI Proxy and AI Proxy Advanced plugins act as the main gateway for forwarding requests to AI providers. Their log entries capture detailed information about the request and response payloads, token usage, model details, latency, and cost metrics, providing a comprehensive view of each AI interaction.
| Property | Description |
|----------|-------------|
| `ai.proxy.payload.request` | The request payload sent to the upstream AI provider. |
| `ai.proxy.payload.response` | The response payload received from the upstream AI provider. |
| `ai.proxy.usage.prompt_tokens` | The number of tokens used for prompting. Used for text-based requests (chat, completions, embeddings). |
| `ai.proxy.usage.prompt_tokens_details` | **v3.11+** A breakdown of prompt tokens (`cached_tokens`, `audio_tokens`). |
| `ai.proxy.usage.completion_tokens` | The number of tokens used for completion. Used for text-based responses (chat, completions). |
| `ai.proxy.usage.completion_tokens_details` | **v3.11+** A breakdown of completion tokens (`rejected_prediction_tokens`, `reasoning_tokens`, `accepted_prediction_tokens`, `audio_tokens`). |
| `ai.proxy.usage.total_tokens` | The total number of tokens used (input + output). Includes prompt/completion tokens for text, and input/output tokens for non-text modalities. |
| `ai.proxy.usage.input_tokens` | **v3.11+** The total number of input tokens (text + image + audio). Used for non-text requests (for example, image or audio generation). |
| `ai.proxy.usage.input_tokens_details` | **v3.11+** A breakdown of input tokens by modality (`text_tokens`, `image_tokens`, `audio_tokens_count`). |
| `ai.proxy.usage.output_tokens` | **v3.11+** The total number of output tokens (text + audio). Used for non-text responses (for example, image or audio generation). |
| `ai.proxy.usage.output_tokens_details` | **v3.11+** A breakdown of output tokens by modality (`text_tokens`, `audio_tokens`). |
| `ai.proxy.usage.cost` | The total cost of the request. |
| `ai.proxy.usage.time_per_token` | **v3.8+** The average time, in milliseconds, to generate an output token. |
| `ai.proxy.usage.time_to_first_token` | **v3.8+** The time, in milliseconds, to receive the first output token. |
| `ai.proxy.meta.request_model` | The model used for the AI request. |
| `ai.proxy.meta.response_model` | The model used to generate the AI response. |
| `ai.proxy.meta.provider_name` | The name of the AI service provider. |
| `ai.proxy.meta.plugin_id` | The unique identifier of the plugin instance. |
| `ai.proxy.meta.llm_latency` | **v3.8+** The time, in milliseconds, the LLM provider took to generate the full response. |
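For example, assuming the dotted property names above map to nested JSON objects, a log entry from the AI Proxy plugin might include a fragment like the following. All values are illustrative, and the payload fields are omitted for brevity:

```json
{
  "ai": {
    "proxy": {
      "meta": {
        "request_model": "gpt-4o",
        "response_model": "gpt-4o",
        "provider_name": "openai",
        "plugin_id": "11111111-2222-3333-4444-555555555555",
        "llm_latency": 1820
      },
      "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 115,
        "total_tokens": 143,
        "cost": 0.00215,
        "time_per_token": 13,
        "time_to_first_token": 360
      }
    }
  }
}
```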
If you're using the AI AWS Guardrails plugin, logs capture processing times and configuration metadata for the content guardrails applied to inputs and outputs.
The following fields appear in structured AI logs when the AI AWS Guardrails plugin is enabled:
| Property | Description |
|----------|-------------|
| `ai.proxy.aws-guardrails.guardrails_id` | The unique identifier of the applied guardrails configuration. |
| `ai.proxy.aws-guardrails.output_processing_latency` | The time, in milliseconds, taken to process the output through guardrails. |
| `ai.proxy.aws-guardrails.inputput_processing_latency` | The time, in milliseconds, taken to process the input through guardrails. |
| `ai.proxy.aws-guardrails.guardrails_version` | The version or state of the guardrails configuration (for example, `DRAFT`, `RELEASE`). |
| `ai.proxy.aws-guardrails.aws_region` | The AWS region where the guardrails are deployed or executed. |
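A hypothetical log fragment with these fields might look like the following; the identifier, version, region, and latency values are illustrative:

```json
{
  "ai": {
    "proxy": {
      "aws-guardrails": {
        "guardrails_id": "abc123xyz",
        "guardrails_version": "DRAFT",
        "aws_region": "us-east-1",
        "inputput_processing_latency": 41,
        "output_processing_latency": 38
      }
    }
  }
}
```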
If the AI Azure Content Safety plugin is enabled, each corresponding log entry records a detected severity level for a user-defined content safety category (for example, `Hate`, `Violence`, `SexualContent`). The category is a user-defined name, and the severity level indicates how strongly that category was detected. Multiple entries can appear per request, depending on the configuration and the detected content.
For detailed information on categories and severity levels, see Harm categories in Azure AI Content Safety in the Azure AI services documentation.
| Property | Description |
|----------|-------------|
| `ai.audit.azure_content_safety.<CATEGORY>` | The detected severity level for a user-defined category (for example, `Hate`, `Violence`). Multiple entries can appear per request, depending on configuration and detected content. |
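For example, assuming two configured categories, a log entry might contain severity levels like the following (values are illustrative):

```json
{
  "ai": {
    "audit": {
      "azure_content_safety": {
        "Hate": 0,
        "Violence": 2
      }
    }
  }
}
```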
If you’re using the AI PII Sanitizer plugin, AI Gateway logs include additional fields that provide insight into the detection and redaction of personally identifiable information (PII). These fields track the number of entities identified and sanitized, the time taken to process the payload, and detailed metadata about each sanitized item—including the original value, redacted value, and detected entity type.
The following fields appear in structured AI logs when the AI PII Sanitizer plugin is enabled:
| Property | Description |
|----------|-------------|
| `ai.sanitizer.pii_identified` | The number of PII entities detected in the input payload. |
| `ai.sanitizer.pii_sanitized` | The number of PII entities that were anonymized or redacted. |
| `ai.sanitizer.duration` | The time, in milliseconds, the `ai-pii-service` container took to process the payload. |
| `ai.sanitizer.sanitized_items` | A list of sanitized PII entities, each including the original text, the redacted text, and the entity type. |
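A hypothetical fragment is sketched below. The exact keys inside each `sanitized_items` entry aren't specified above, so the names `original_text`, `redacted_text`, and `entity_type` are illustrative assumptions, as are all values:

```json
{
  "ai": {
    "sanitizer": {
      "pii_identified": 2,
      "pii_sanitized": 2,
      "duration": 43,
      "sanitized_items": [
        { "original_text": "Jane Doe", "redacted_text": "<PERSON>", "entity_type": "PERSON" },
        { "original_text": "jane@example.com", "redacted_text": "<EMAIL>", "entity_type": "EMAIL" }
      ]
    }
  }
}
```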
When the AI Prompt Compressor plugin is enabled, additional logs record token counts before and after compression, compression ratios, and metadata about the compression method and model used.
The following fields appear in structured AI logs when the AI Prompt Compressor plugin is enabled:
| Property | Description |
|----------|-------------|
| `ai.compressor.original_token_count` | The number of tokens before compression. |
| `ai.compressor.compress_token_count` | The number of tokens after compression. |
| `ai.compressor.save_token_count` | The number of tokens saved by compression (original minus compressed). |
| `ai.compressor.compress_value` | The compression ratio applied. |
| `ai.compressor.compress_type` | The type or method of compression used. |
| `ai.compressor.compressor_model` | The model used to perform the compression. |
| `ai.compressor.msg_id` | The identifier of the message that was compressed. |
| `ai.compressor.information` | A summary message describing the result of the compression. |
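For example, a compression that halves a 1,024-token prompt might be logged like this. All values, including the compression type, model name, and message ID format, are illustrative:

```json
{
  "ai": {
    "compressor": {
      "original_token_count": 1024,
      "compress_token_count": 512,
      "save_token_count": 512,
      "compress_value": 0.5,
      "compress_type": "rate",
      "compressor_model": "example-compressor-model",
      "msg_id": 1,
      "information": "compression completed successfully"
    }
  }
}
```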
If you’re using the AI RAG Injector plugin, AI Gateway logs include additional fields that provide detailed information about the retrieval-augmented generation process. These fields track the vector database used, whether relevant context was injected into the prompt, the latency of data fetching, and embedding metadata such as tokens used and the provider/model details.
The following fields appear in structured AI logs when the AI RAG Injector plugin is enabled:
| Property | Description |
|----------|-------------|
| `ai.proxy.rag-inject.vector_db` | The vector database used (for example, `pgvector`). |
| `ai.proxy.rag-inject.injected` | A boolean indicating whether RAG injection occurred. |
| `ai.proxy.rag-inject.fetch_latency` | The data fetch latency, in milliseconds. |
| `ai.proxy.rag-inject.chunk_ids` | The list of chunk IDs retrieved. |
| `ai.proxy.rag-inject.embeddings_latency` | The time, in milliseconds, taken to generate embeddings. |
| `ai.proxy.rag-inject.embeddings_tokens` | The number of tokens used for embeddings. |
| `ai.proxy.rag-inject.embeddings_provider` | The provider used to generate embeddings. |
| `ai.proxy.rag-inject.embeddings_model` | The model used to generate embeddings. |
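For example, a request where two chunks were retrieved and injected into the prompt might produce a fragment like this (all values, including the chunk ID format and model name, are illustrative):

```json
{
  "ai": {
    "proxy": {
      "rag-inject": {
        "vector_db": "pgvector",
        "injected": true,
        "fetch_latency": 12,
        "chunk_ids": ["chunk-1a2b", "chunk-3c4d"],
        "embeddings_latency": 85,
        "embeddings_tokens": 27,
        "embeddings_provider": "openai",
        "embeddings_model": "text-embedding-3-small"
      }
    }
  }
}
```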
If you’re using the AI Semantic Cache plugin, AI Gateway logs include additional fields under the cache object for each plugin entry. These fields provide insight into cache behavior—such as whether a response was served from cache, how long it took to fetch, and which embedding provider and model were used if applicable.
The following fields appear in AI logs when semantic caching is enabled:
| Property | Description |
|----------|-------------|
| `ai.proxy.cache.cache_status` | **v3.8+** The cache status. This can be `Hit`, `Miss`, `Bypass`, or `Refresh`. |
| `ai.proxy.cache.fetch_latency` | The time, in milliseconds, it took to return a cached response. |
| `ai.proxy.cache.embeddings_provider` | The provider used to generate the embeddings. |
| `ai.proxy.cache.embeddings_model` | The model used to generate the embeddings. |
| `ai.proxy.cache.embeddings_latency` | The time, in milliseconds, taken to generate the embeddings. |
> **Note:** When returning a cached response, `time_per_token` and `llm_latency` are omitted. A cached response can be served from either the semantic cache or the exact cache. If it's served from the semantic cache, the log entry includes additional details, such as the embeddings provider, embeddings model, and embeddings latency.
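For example, a semantic cache hit might be logged like this (values are illustrative). An exact cache hit would omit the three embeddings fields, as described in the note above:

```json
{
  "ai": {
    "proxy": {
      "cache": {
        "cache_status": "Hit",
        "fetch_latency": 18,
        "embeddings_provider": "openai",
        "embeddings_model": "text-embedding-3-small",
        "embeddings_latency": 92
      }
    }
  }
}
```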