Configure the AI LLM as Judge plugin
Evaluate responses by assigning a correctness score for AI-assisted learning and assessment.
See the how-to guide for an example of how this plugin works in a real-world scenario.
Prerequisites
- You have a working OpenAI API key
- You have enabled the AI Proxy or AI Proxy Advanced plugin
Environment variables
- OPENAI_API_KEY: Your OpenAI API key.
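The variable can be exported in your shell before running the examples below. Note that decK resolves references written as `${{ env "DECK_OPENAI_API_KEY" }}` from the environment, so the kong.yaml examples expect a DECK_-prefixed copy of the key. A minimal sketch using a placeholder value:

```shell
# Placeholder value: substitute your real OpenAI API key.
export OPENAI_API_KEY="sk-example"

# Mirror the key under the DECK_ prefix so decK can resolve
# ${{ env "DECK_OPENAI_API_KEY" }} in the kong.yaml examples.
export DECK_OPENAI_API_KEY="$OPENAI_API_KEY"

echo "$DECK_OPENAI_API_KEY"
```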
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-llm-as-judge
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
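Once the section is in place, the configuration can be applied with decK. A sketch, assuming decK is installed and your Admin API listens on the default localhost:8001:

```shell
# Preview the change against the running gateway, then apply it.
deck gateway diff kong.yaml
deck gateway sync kong.yaml
```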
Make the following request:
curl -i -X POST http://localhost:8001/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
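The prompt above instructs the judge model to reply with a bare number, and `max_tokens: 5` keeps the reply short. Wherever you consume the resulting score (for example from your logging pipeline, since `log_payloads` is enabled), it is worth normalizing it defensively; a hypothetical sketch:

```shell
# Hypothetical raw judge reply; in practice this would come from
# your log pipeline rather than a literal.
raw="87"

# Keep digits only, then clamp to the documented 1-100 range.
score=$(printf '%s' "$raw" | tr -dc '0-9')
[ -z "$score" ] && score=1
[ "$score" -gt 100 ] && score=100
[ "$score" -lt 1 ] && score=1

echo "score=$score"
```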
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
name: ai-llm-as-judge
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
labels:
global: 'true'
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
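Note that HCL string literals are not expanded by your shell, so $KONNECT_TOKEN in the provider block above is a placeholder to replace, not a variable reference. The curl examples, by contrast, do read the token from the environment; a sketch with a placeholder value:

```shell
# Placeholder token: substitute your real Konnect Personal Access Token.
export KONNECT_TOKEN="kpat_example"
echo "$KONNECT_TOKEN"
```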
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
enabled = true
config = {
prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
http_timeout = 60000
https_verify = true
ignore_assistant_prompts = true
ignore_system_prompts = true
ignore_tool_prompts = true
sampling_rate = 1
llm = {
auth = {
allow_override = false
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_payloads = true
log_statistics = true
}
model = {
name = "gpt-4o"
provider = "openai"
options = {
temperature = 2
max_tokens = 5
top_p = 1
cohere = {
embedding_input_type = "classification"
}
}
}
route_type = "llm/v1/chat"
}
message_countback = 3
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
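As noted above, Terraform reads variables from the environment via the TF_VAR_ prefix, which keeps the key out of your .tf files; a sketch with a placeholder value:

```shell
# Placeholder value: substitute your real OpenAI API key.
export TF_VAR_openai_api_key="sk-example"
echo "$TF_VAR_openai_api_key"
```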
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-llm-as-judge
service: serviceName|Id
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/services/{serviceName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/services/{serviceId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- serviceId: The id of the service the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-llm-as-judge
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the service resource:
kubectl annotate -n kong service SERVICE_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
enabled = true
config = {
prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
http_timeout = 60000
https_verify = true
ignore_assistant_prompts = true
ignore_system_prompts = true
ignore_tool_prompts = true
sampling_rate = 1
llm = {
auth = {
allow_override = false
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_payloads = true
log_statistics = true
}
model = {
name = "gpt-4o"
provider = "openai"
options = {
temperature = 2
max_tokens = 5
top_p = 1
cohere = {
embedding_input_type = "classification"
}
}
}
route_type = "llm/v1/chat"
}
message_countback = 3
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
service = {
id = konnect_gateway_service.my_service.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-llm-as-judge
route: routeName|Id
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/routes/{routeName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- routeId: The id of the route the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-llm-as-judge
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the httproute or ingress resource:
kubectl annotate -n kong httproute HTTPROUTE_NAME konghq.com/plugins=ai-llm-as-judge
kubectl annotate -n kong ingress INGRESS_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
enabled = true
config = {
prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
http_timeout = 60000
https_verify = true
ignore_assistant_prompts = true
ignore_system_prompts = true
ignore_tool_prompts = true
sampling_rate = 1
llm = {
auth = {
allow_override = false
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_payloads = true
log_statistics = true
}
model = {
name = "gpt-4o"
provider = "openai"
options = {
temperature = 2
max_tokens = 5
top_p = 1
cohere = {
embedding_input_type = "classification"
}
}
}
route_type = "llm/v1/chat"
}
message_countback = 3
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
route = {
id = konnect_gateway_route.my_route.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-llm-as-judge
consumer: consumerName|Id
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumers/{consumerName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumers/{consumerId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- consumerId: The id of the consumer the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-llm-as-judge
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumer resource:
kubectl annotate -n kong kongconsumer CONSUMER_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
enabled = true
config = {
prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
http_timeout = 60000
https_verify = true
ignore_assistant_prompts = true
ignore_system_prompts = true
ignore_tool_prompts = true
sampling_rate = 1
llm = {
auth = {
allow_override = false
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_payloads = true
log_statistics = true
}
model = {
name = "gpt-4o"
provider = "openai"
options = {
temperature = 2
max_tokens = 5
top_p = 1
cohere = {
embedding_input_type = "classification"
}
}
}
route_type = "llm/v1/chat"
}
message_countback = 3
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
consumer = {
id = konnect_gateway_consumer.my_consumer.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
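For example, the variable can be supplied through the environment at plan or apply time instead of being hard-coded (the key value shown is a placeholder):

```shell
export TF_VAR_openai_api_key="sk-..."   # placeholder: your OpenAI API key
terraform plan
```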
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-llm-as-judge
consumer_group: consumerGroupName|Id
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
Make sure to replace the following placeholders with your own values:
-
consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumer_groups/{consumerGroupName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
-
consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
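To confirm the plugin was attached, you can list the consumer group's plugins through the Admin API. This is a sketch; it assumes the default Admin API address and a consumer group named my-group:

```shell
curl -i http://localhost:8001/consumer_groups/my-group/plugins
```

The response should include an entry with "name": "ai-llm-as-judge".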
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumer_groups/{consumerGroupId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-llm-as-judge",
"config": {
"prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
"http_timeout": 60000,
"https_verify": true,
"ignore_assistant_prompts": true,
"ignore_system_prompts": true,
"ignore_tool_prompts": true,
"sampling_rate": 1,
"llm": {
"auth": {
"allow_override": false,
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_payloads": true,
"log_statistics": true
},
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"temperature": 2,
"max_tokens": 5,
"top_p": 1,
"cohere": {
"embedding_input_type": "classification"
}
}
},
"route_type": "llm/v1/chat"
},
"message_countback": 3
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
-
region: Geographic region where your Kong Konnect is hosted and operates.
-
KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
-
controlPlaneId: The id of the control plane.
-
consumerGroupId: The id of the consumer group the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
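The judge prompt used throughout these examples instructs the model to reply with a bare integer between 1 and 100. As a minimal, hypothetical sketch (not part of the plugin), code that consumes the logged score could validate that contract like this:

```python
def parse_judge_score(raw: str) -> int:
    """Validate a judge reply that should be a bare integer from 1 to 100."""
    text = raw.strip()
    if not text.isdigit():
        # Reject labels, explanations, or negative/non-numeric replies
        raise ValueError(f"expected a bare number, got: {raw!r}")
    score = int(text)
    if not 1 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return score

print(parse_judge_score(" 87 "))  # -> 87
```

Rejecting anything other than an in-range bare integer guards downstream training or evaluation pipelines against malformed judge output.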
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-llm-as-judge
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
prompt: |
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
http_timeout: 60000
https_verify: true
ignore_assistant_prompts: true
ignore_system_prompts: true
ignore_tool_prompts: true
sampling_rate: 1
llm:
auth:
allow_override: false
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_payloads: true
log_statistics: true
model:
name: gpt-4o
provider: openai
options:
temperature: 2
max_tokens: 5
top_p: 1
cohere:
embedding_input_type: classification
route_type: llm/v1/chat
message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumerGroup resource:
kubectl annotate -n kong CONSUMERGROUP_NAME konghq.com/plugins=ai-llm-as-judge
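Equivalently, the plugin can be associated declaratively in the KongConsumerGroup manifest. This is a sketch; the group name `example-consumer-group` is a placeholder, and the apiVersion shown matches recent Kong Ingress Controller releases, so check the CRDs installed in your cluster:

```yaml
apiVersion: configuration.konghq.com/v1beta1
kind: KongConsumerGroup
metadata:
  name: example-consumer-group   # placeholder: use your consumer group's name
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
    konghq.com/plugins: ai-llm-as-judge
spec:
  name: example-consumer-group
```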
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
enabled = true
config = {
prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
http_timeout = 60000
https_verify = true
ignore_assistant_prompts = true
ignore_system_prompts = true
ignore_tool_prompts = true
sampling_rate = 1
llm = {
auth = {
allow_override = false
header_name = "Authorization"
        header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_payloads = true
log_statistics = true
}
model = {
name = "gpt-4o"
provider = "openai"
options = {
temperature = 2
max_tokens = 5
top_p = 1
cohere = {
embedding_input_type = "classification"
}
}
}
route_type = "llm/v1/chat"
}
message_countback = 3
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
consumer_group = {
id = konnect_gateway_consumer_group.my_consumer_group.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}