Send asynchronous requests to LLMs
Upload a batch file in JSONL format to the /files Route, then create a batch request via the /batches Route to process multiple LLM queries asynchronously, and finally retrieve the batched responses from the /files Route. Batching requests allows you to reduce LLM usage costs by:
- Minimizing per-request overhead
- Avoiding rate-limit penalties
- Enabling efficient model usage
- Reducing wasted retries
Prerequisites
Kong Konnect
This is a Konnect tutorial and requires a Konnect personal access token.
- Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.
- Export your token to an environment variable:
export KONNECT_TOKEN='YOUR_KONNECT_PAT'
- Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:
curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:
export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
export KONNECT_PROXY_URL='http://localhost:8000'
Copy and paste these into your terminal to configure your session.
Kong Gateway running
This tutorial requires Kong Gateway Enterprise. If you don’t have Kong Gateway set up yet, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.
- Export your license to an environment variable:
export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'
- Run the quickstart script:
curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA
Once Kong Gateway is ready, you will see the following message:
Kong Gateway Ready
decK
decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial you will first need to install decK.
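For example, on macOS or Linux, one common way to install decK is with Homebrew (see the decK installation documentation for other platforms and the latest instructions):
brew tap kong/deck
brew install deck
deck version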
Required entities
For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:
- Run the following command:
echo '
_format_version: "3.0"
services:
  - name: files-service
    url: http://httpbin.konghq.com/files
  - name: batches-service
    url: http://httpbin.konghq.com/batches
routes:
  - name: files-route
    paths:
    - "/files"
    service:
      name: files-service
  - name: batches-route
    paths:
    - "/batches"
    service:
      name: batches-service
' | deck gateway apply -
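Optionally, you can confirm that the Gateway Services and Routes were created by dumping the current configuration with decK. This assumes the decK environment variables from the quickstart (or your own connection settings) are still exported in your shell:
deck gateway dump -o -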
To learn more about entities, you can read our entities documentation.
OpenAI
This tutorial uses OpenAI:
- Create an OpenAI account.
- Get an API key.
- Create a decK variable with the API key:
export DECK_OPENAI_API_KEY="YOUR OPENAI API KEY"
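Optionally, you can verify that the key works by calling the OpenAI API directly; a successful request returns a JSON list of the models available to your account:
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $DECK_OPENAI_API_KEY"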
Batch .jsonl file
To complete this tutorial, create a batch.jsonl file to generate asynchronous batched LLM responses. We use /v1/chat/completions because it handles chat-based generation requests, enabling the LLM to produce conversational completions in batch mode.
Run the following command to create the file:
cat <<EOF > batch.jsonl
{"custom_id": "prod1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a compelling product description for a stainless steel water bottle suitable for outdoor activities."}], "max_tokens": 60}}
{"custom_id": "prod2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a product description for a pair of wireless noise-cancelling headphones with long battery life."}], "max_tokens": 60}}
{"custom_id": "prod3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write an engaging product description for a stylish red leather wallet with multiple compartments."}], "max_tokens": 60}}
{"custom_id": "prod4", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a detailed product description for a Bluetooth wireless speaker with waterproof features."}], "max_tokens": 60}}
{"custom_id": "prod5", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a concise product description for a compact and durable travel backpack with laptop compartment."}], "max_tokens": 60}}
EOF
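Optionally, if you have jq installed, you can sanity-check that every line in the file is valid JSON before uploading it:
jq -c . batch.jsonl > /dev/null && echo "batch.jsonl is valid JSONL"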
Configure AI Proxy plugins
Configure two separate AI Proxy plugins: one for the llm/v1/files Route and another for the llm/v1/batches Route. Each Route type requires its own dedicated Gateway Service and Route to function correctly. In this setup, all requests to the files Route are forwarded to the /files endpoint, while batch requests go to the /batches endpoint.
AI Proxy plugin for the route_type: llm/v1/files:
echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    service: files-service
    config:
      model_name_header: false
      route_type: llm/v1/files
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
' | deck gateway apply -
AI Proxy plugin for the route_type: llm/v1/batches:
echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    service: batches-service
    config:
      model_name_header: false
      route_type: llm/v1/batches
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
' | deck gateway apply -
Upload a .jsonl file for batching
Use the following command to upload your batching file to the /files Route:
curl localhost:8000/files -F purpose="batch" -F file="@batch.jsonl"
You will see a JSON response like this:
{
"object": "file",
"id": "file-abc123xyz456789lmn0pq",
"purpose": "batch",
"filename": "1.jsonl",
"bytes": 1672,
"created_at": 1751281528,
"expires_at": null,
"status": "processed",
"status_details": null
}
Copy the file ID from the response; you will need it to create a batch. Export it as an environment variable:
export FILE_ID=YOUR_FILE_ID
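Alternatively, if you have jq installed, you can upload the file and capture the ID in a single step. Note that this sends the upload request again, so it creates another file object on OpenAI’s side:
export FILE_ID=$(curl -s localhost:8000/files -F purpose="batch" -F file="@batch.jsonl" | jq -r '.id')
echo $FILE_ID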
Create a batching request
Send a POST request to the /batches Route to create a batch using your uploaded file.
The completion window must be set to 24h, as it’s the only value currently supported by the OpenAI /batches API. In this example, we use the /v1/chat/completions endpoint for batching because we are sending multiple structured chat-style prompts in OpenAI’s Chat Completions format to be processed in bulk.
curl http://localhost:8000/batches \
-H "Content-Type: application/json" \
-d "{
\"input_file_id\": \"$FILE_ID\",
\"endpoint\": \"/v1/chat/completions\",
\"completion_window\": \"24h\"
}"
You will receive a response similar to:
{
"id": "batch_d41d8cd98f00b204e9800998ecf8427e",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "file-TgJnwX6nHPPvb5W4abcdef",
"completion_window": "24h",
"status": "validating",
"output_file_id": null,
"error_file_id": null,
"created_at": 1751281814,
"in_progress_at": null,
"expires_at": 1751368214,
"finalizing_at": null,
"completed_at": null,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 0,
"completed": 0,
"failed": 0
},
"metadata": null
}
Copy the batch ID from this response; you will need it to check the batch status. Export it as an environment variable:
export BATCH_ID=YOUR_BATCH_ID
Check batching status
Wait a moment for the batch to be processed, then check its status by sending the following request:
curl "http://localhost:8000/batches/$BATCH_ID"
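Batches can take a little while to finish. If you prefer not to check manually, a simple polling loop like the following (using jq) waits until the status field reported by this endpoint becomes completed:
until [ "$(curl -s http://localhost:8000/batches/$BATCH_ID | jq -r '.status')" = "completed" ]; do
  echo "Batch is still processing..."
  sleep 10
done
echo "Batch completed"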
A completed batch response looks like this:
{
"id": "batch_a1b2c3d4e5f60789abcdef0123456789",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "file-XyZ123abc456Def789Ghij",
"completion_window": "24h",
"status": "completed",
"output_file_id": "file-Lmn987Qrs654Tuv321Wxyz",
"error_file_id": null,
"created_at": 1751281998,
"in_progress_at": 1751281999,
"expires_at": 1751368398,
"finalizing_at": 1751282173,
"completed_at": 1751282174,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 5,
"completed": 5,
"failed": 0
},
"metadata": null
}
The "request_counts" object shows that all five requests in the batch completed successfully ("completed": 5, "failed": 0).
Now, copy the output_file_id to retrieve your batched responses and export it as an environment variable:
export OUTPUT_FILE_ID=YOUR_OUTPUT_FILE_ID
The output file ID will only be available once the batch request has completed. If the status is "in_progress", it won’t be set yet.
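If you have jq installed, you can also extract the output file ID directly from the status response instead of copying it manually:
export OUTPUT_FILE_ID=$(curl -s http://localhost:8000/batches/$BATCH_ID | jq -r '.output_file_id')
echo $OUTPUT_FILE_ID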
Retrieve batched responses
Now, we can download the batched responses from the /files endpoint by appending /content to the file ID URL. For details, see the OpenAI API documentation.
curl http://localhost:8000/files/$OUTPUT_FILE_ID/content > batched-response.jsonl
This command saves the batched responses to the batched-response.jsonl file.
The batched response file contains one JSON object per line, each representing a single batched request’s response. Here is an example of the content of batched-response.jsonl, which contains the individual completion results for each request we submitted in the batch input file:
{"id": "batch_req_686271fdfdd88190afc7c1da9a67f59f", "custom_id": "prod1", "response": {"status_code": 200, "request_id": "31043970a729289021c4de02f4d9d4f4", "body": {"id": "chatcmpl-Bo6lqlrGydPEceKXlWmh0gYIGpA4o", "object": "chat.completion", "created": 1751282126, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Elevate Your Hydration Game: The Ultimate Stainless Steel Water Bottle**\n\nIntroducing the **AdventureHydrate Stainless Steel Water Bottle** — your perfect companion for all outdoor adventures! Whether you're hiking rugged trails, camping under the stars, or simply enjoying a day at the beach, this water bottle is designed", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 33, "completion_tokens": 60, "total_tokens": 93, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
{"id": "batch_req_686271fe13148190b00f0d8d4a237e0c", "custom_id": "prod2", "response": {"status_code": 200, "request_id": "75e72b39c1e25a076486ad0a56ef9040", "body": {"id": "chatcmpl-Bo6jypac8GcC4dEE91NiERhqbI68M", "object": "chat.completion", "created": 1751282010, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Product Description: NoiseBlock Pro Wireless Noise-Cancelling Headphones**\n\nExperience the ultimate in sound clarity and comfort with the NoiseBlock Pro Wireless Noise-Cancelling Headphones. Designed for audiophiles and casual listeners alike, these state-of-the-art headphones combine advanced noise-cancellation technology with an", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 36, "completion_tokens": 60, "total_tokens": 96, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
{"id": "batch_req_686271fe20d48190acc5b34cb9a3dca9", "custom_id": "prod3", "response": {"status_code": 200, "request_id": "4e27db53d730a1404b1f43953f6191e5", "body": {"id": "chatcmpl-Bo6k2pEvK0tTUmjvdQ3H1ysGnCn9d", "object": "chat.completion", "created": 1751282014, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "### Elevate Your Everyday with the Red Luxe Leather Wallet\n\nStep into sophistication with our stunning Red Luxe Leather Wallet, where style meets functionality in perfect harmony. Crafted from premium, supple leather, this wallet boasts a rich, vibrant hue that adds a bold statement to any ensemble. \n\n**Features:**\n", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 32, "completion_tokens": 60, "total_tokens": 92, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_62a23a81ef"}}, "error": null}
{"id": "batch_req_686271fe2f14819099e646c0c43c364c", "custom_id": "prod4", "response": {"status_code": 200, "request_id": "1c26a143c432ee43e36a7fb302d56a89", "body": {"id": "chatcmpl-Bo6k8mCzyUcgZNWEAEL6LzBdmuaIy", "object": "chat.completion", "created": 1751282020, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Product Description: Wireless Waterproof Bluetooth Speaker**\n\n**Elevate Your Sound Experience Anywhere!**\n\nIntroducing the Ultimate Wireless Waterproof Bluetooth Speaker, designed for the adventurer in you! Whether you're lounging by the pool, trekking in the mountains, or hosting a beach party, this speaker combines impressive audio quality with robust", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 31, "completion_tokens": 60, "total_tokens": 91, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
{"id": "batch_req_686271fe3c108190bdd6a64f7231191a", "custom_id": "prod5", "response": {"status_code": 200, "request_id": "3613bb32e5afef94cab0ad41c19ee2dc", "body": {"id": "chatcmpl-Bo6jwAbdiD35WsrppVDcIR15yJQNr", "object": "chat.completion", "created": 1751282008, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Discover the ultimate travel companion with our Compact and Durable Travel Backpack. Designed for the modern traveler, this sleek backpack features a padded laptop compartment that securely fits devices up to 15.6 inches, ensuring your tech stays safe on the go. Crafted from high-quality, water-resistant materials, it withstands", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 33, "completion_tokens": 60, "total_tokens": 93, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
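Because the output is standard JSONL in OpenAI’s batch response format, you can post-process it with jq. For example, the following prints each custom_id together with its generated completion text:
jq -r '.custom_id + ": " + .response.body.choices[0].message.content' batched-response.jsonl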
Cleanup
Clean up Konnect environment
If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.
Destroy the Kong Gateway container
curl -Ls https://get.konghq.com/quickstart | bash -s -- -d