Send asynchronous requests to LLMs

Uses: Kong Gateway, AI Gateway, decK
Minimum Version: Kong Gateway 3.11
TL;DR

Upload a batch file in JSONL format to the /files Route, then create a batch request via the /batches Route to process multiple LLM queries asynchronously, and finally retrieve the batched responses from the /files Route. Batching requests allows you to reduce LLM usage costs by:

  • Minimizing per-request overhead
  • Avoiding rate-limit penalties
  • Enabling efficient model usage
  • Reducing wasted retries

Prerequisites

This is a Konnect tutorial and requires a Konnect personal access token.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.

  2. Export your token to an environment variable:

     export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:

     curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
    

    This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:

     export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
     export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
     export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
     export KONNECT_PROXY_URL='http://localhost:8000'
    

    Copy and paste these into your terminal to configure your session.
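
    Before configuring any entities, you can confirm the local Data Plane is reachable by sending a request to the proxy URL. With no Routes configured yet, Kong Gateway responds with an HTTP 404 and a "no Route matched" message:

     curl -i $KONNECT_PROXY_URL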

If you prefer to run Kong Gateway on-prem instead of Konnect, this tutorial requires Kong Gateway Enterprise. If you don’t have Kong Gateway set up yet, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.

  1. Export your license to an environment variable:

     export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'
    
  2. Run the quickstart script:

    curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA 
    

    Once Kong Gateway is ready, you will see the following message:

     Kong Gateway Ready
    

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial you will first need to install decK.
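
Once decK is installed, you can verify that it can reach your Gateway or Konnect Control Plane (this uses the DECK_* environment variables exported in the prerequisites):

deck gateway ping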

For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:

  1. Run the following command:

    echo '
    _format_version: "3.0"
    services:
      - name: files-service
        url: http://httpbin.konghq.com/files
      - name: batches-service
        url: http://httpbin.konghq.com/batches
    routes:
      - name: files-route
        paths:
        - "/files"
        service:
          name: files-service
      - name: batches-route
        paths:
        - "/batches"
        service:
          name: batches-service
    ' | deck gateway apply -
    

To learn more about entities, you can read our entities documentation.

This tutorial uses OpenAI:

  1. Create an OpenAI account.
  2. Get an API key.
  3. Create a decK variable with the API key:
     export DECK_OPENAI_API_KEY="YOUR OPENAI API KEY"

To complete this tutorial, create a batch.jsonl file containing the requests you want to process asynchronously. We use /v1/chat/completions because it handles chat-based generation requests, letting the LLM produce conversational completions in batch mode.

Run the following command to create the file:

cat <<EOF > batch.jsonl
{"custom_id": "prod1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a compelling product description for a stainless steel water bottle suitable for outdoor activities."}], "max_tokens": 60}}
{"custom_id": "prod2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a product description for a pair of wireless noise-cancelling headphones with long battery life."}], "max_tokens": 60}}
{"custom_id": "prod3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write an engaging product description for a stylish red leather wallet with multiple compartments."}], "max_tokens": 60}}
{"custom_id": "prod4", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a detailed product description for a Bluetooth wireless speaker with waterproof features."}], "max_tokens": 60}}
{"custom_id": "prod5", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a concise product description for a compact and durable travel backpack with laptop compartment."}], "max_tokens": 60}}
EOF
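
Optionally, if you have jq installed, you can confirm that every line in the file is valid JSON before uploading it; jq exits with a non-zero status if any line fails to parse:

jq -c . batch.jsonl > /dev/null && echo "batch.jsonl is valid JSONL"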

Configure AI Proxy plugins

Configure two separate AI Proxy plugins: one for the llm/v1/files route type and another for the llm/v1/batches route type. Each route type requires its own dedicated Gateway Service and Route to function correctly. In this setup, all requests to the files Route are forwarded to the /files endpoint, while batch requests go to the /batches endpoint.

AI Proxy plugin for route_type: llm/v1/files:

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    service: files-service
    config:
      model_name_header: false
      route_type: llm/v1/files
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
' | deck gateway apply -

AI Proxy plugin for the route_type: llm/v1/batches:

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    service: batches-service
    config:
      model_name_header: false
      route_type: llm/v1/batches
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
' | deck gateway apply -
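
Optionally, you can confirm that both plugins were applied by dumping the current Gateway configuration with decK (a quick sanity check; assumes your DECK_* environment variables are still set, and -o - writes the dump to stdout):

deck gateway dump -o - | grep -n "ai-proxy"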

Upload a .jsonl file for batching

Use the following command to upload your batching file to the /files route:

curl localhost:8000/files -F purpose="batch" -F file="@batch.jsonl"

You will see a JSON response like this:

{
  "object": "file",
  "id": "file-abc123xyz456789lmn0pq",
  "purpose": "batch",
  "filename": "1.jsonl",
  "bytes": 1672,
  "created_at": 1751281528,
  "expires_at": null,
  "status": "processed",
  "status_details": null
}

Copy the file ID from the response; you will need it to create a batch. Export it as an environment variable:

export FILE_ID=YOUR_FILE_ID
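
Alternatively, if you have jq installed, you can upload the file and capture the ID in a single step instead of copying it by hand (run this in place of the curl command above, since it sends the upload request itself):

export FILE_ID=$(curl -s localhost:8000/files -F purpose="batch" -F file="@batch.jsonl" | jq -r '.id')
echo "$FILE_ID"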

Create a batching request

Send a POST request to the /batches Route to create a batch using your uploaded file:

The completion window must be set to 24h, as it’s the only value currently supported by the OpenAI /batches API.

In this example, we use the /v1/chat/completions endpoint for batching because we are sending multiple structured chat-style prompts in OpenAI’s chat completions format to be processed in bulk.

curl http://localhost:8000/batches \
  -H "Content-Type: application/json" \
  -d "{
    \"input_file_id\": \"$FILE_ID\",
    \"endpoint\": \"/v1/chat/completions\",
    \"completion_window\": \"24h\"
  }"

You will receive a response similar to:

{
  "id": "batch_d41d8cd98f00b204e9800998ecf8427e",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file-TgJnwX6nHPPvb5W4abcdef",
  "completion_window": "24h",
  "status": "validating",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1751281814,
  "in_progress_at": null,
  "expires_at": 1751368214,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "metadata": null
}

Copy the batch ID from this response and export it as an environment variable; you’ll need it to check the batch status:

export BATCH_ID=YOUR_BATCH_ID
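
As with the file ID, you can create the batch and capture its ID in one step (assumes jq; run this instead of, not in addition to, the curl command above, since every call creates a new batch):

export BATCH_ID=$(curl -s http://localhost:8000/batches \
  -H "Content-Type: application/json" \
  -d "{\"input_file_id\": \"$FILE_ID\", \"endpoint\": \"/v1/chat/completions\", \"completion_window\": \"24h\"}" \
  | jq -r '.id')
echo "$BATCH_ID"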

Check batching status

Wait a moment for the batch to be processed, then check its status by sending the following request:

curl "http://localhost:8000/batches/$BATCH_ID"
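
If you’d rather not re-run the request by hand, a small polling loop works too (a sketch, assumes jq; batch processing time varies from seconds to much longer depending on load):

# Poll the batch status every 10 seconds until it reaches a terminal state
while true; do
  STATUS=$(curl -s "http://localhost:8000/batches/$BATCH_ID" | jq -r '.status')
  echo "Batch status: $STATUS"
  case "$STATUS" in
    completed|failed|expired|cancelled) break ;;
  esac
  sleep 10
done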

A completed batch response looks like this:

{
  "id": "batch_a1b2c3d4e5f60789abcdef0123456789",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file-XyZ123abc456Def789Ghij",
  "completion_window": "24h",
  "status": "completed",
  "output_file_id": "file-Lmn987Qrs654Tuv321Wxyz",
  "error_file_id": null,
  "created_at": 1751281998,
  "in_progress_at": 1751281999,
  "expires_at": 1751368398,
  "finalizing_at": 1751282173,
  "completed_at": 1751282174,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 5,
    "completed": 5,
    "failed": 0
  },
  "metadata": null
}

The "request_counts" object shows that all five requests in the batch completed successfully ("completed": 5, "failed": 0).

Now, copy the output_file_id so you can retrieve your batched responses, and export it as an environment variable:

export OUTPUT_FILE_ID=YOUR_OUTPUT_FILE_ID

The output file ID will only be available once the batch request has completed. If the status is "in_progress", it won’t be set yet.
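
If you have jq installed, you can also pull the output file ID straight from the status response once the batch has completed:

export OUTPUT_FILE_ID=$(curl -s "http://localhost:8000/batches/$BATCH_ID" | jq -r '.output_file_id')
echo "$OUTPUT_FILE_ID"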

Retrieve batched responses

Now, we can download the batched responses from the /files endpoint by appending /content to the file ID URL. For details, see the OpenAI API documentation.

curl http://localhost:8000/files/$OUTPUT_FILE_ID/content > batched-response.jsonl

This command saves the batched responses to the batched-response.jsonl file.

The batched response file contains one JSON object per line, each representing the response to a single batched request. Here is example content from batched-response.jsonl, showing the individual completion results for each request submitted in the batch input file:

{"id": "batch_req_686271fdfdd88190afc7c1da9a67f59f", "custom_id": "prod1", "response": {"status_code": 200, "request_id": "31043970a729289021c4de02f4d9d4f4", "body": {"id": "chatcmpl-Bo6lqlrGydPEceKXlWmh0gYIGpA4o", "object": "chat.completion", "created": 1751282126, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Elevate Your Hydration Game: The Ultimate Stainless Steel Water Bottle**\n\nIntroducing the **AdventureHydrate Stainless Steel Water Bottle** — your perfect companion for all outdoor adventures! Whether you're hiking rugged trails, camping under the stars, or simply enjoying a day at the beach, this water bottle is designed", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 33, "completion_tokens": 60, "total_tokens": 93, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
{"id": "batch_req_686271fe13148190b00f0d8d4a237e0c", "custom_id": "prod2", "response": {"status_code": 200, "request_id": "75e72b39c1e25a076486ad0a56ef9040", "body": {"id": "chatcmpl-Bo6jypac8GcC4dEE91NiERhqbI68M", "object": "chat.completion", "created": 1751282010, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Product Description: NoiseBlock Pro Wireless Noise-Cancelling Headphones**\n\nExperience the ultimate in sound clarity and comfort with the NoiseBlock Pro Wireless Noise-Cancelling Headphones. Designed for audiophiles and casual listeners alike, these state-of-the-art headphones combine advanced noise-cancellation technology with an", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 36, "completion_tokens": 60, "total_tokens": 96, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
{"id": "batch_req_686271fe20d48190acc5b34cb9a3dca9", "custom_id": "prod3", "response": {"status_code": 200, "request_id": "4e27db53d730a1404b1f43953f6191e5", "body": {"id": "chatcmpl-Bo6k2pEvK0tTUmjvdQ3H1ysGnCn9d", "object": "chat.completion", "created": 1751282014, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "### Elevate Your Everyday with the Red Luxe Leather Wallet\n\nStep into sophistication with our stunning Red Luxe Leather Wallet, where style meets functionality in perfect harmony. Crafted from premium, supple leather, this wallet boasts a rich, vibrant hue that adds a bold statement to any ensemble. \n\n**Features:**\n", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 32, "completion_tokens": 60, "total_tokens": 92, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_62a23a81ef"}}, "error": null}
{"id": "batch_req_686271fe2f14819099e646c0c43c364c", "custom_id": "prod4", "response": {"status_code": 200, "request_id": "1c26a143c432ee43e36a7fb302d56a89", "body": {"id": "chatcmpl-Bo6k8mCzyUcgZNWEAEL6LzBdmuaIy", "object": "chat.completion", "created": 1751282020, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Product Description: Wireless Waterproof Bluetooth Speaker**\n\n**Elevate Your Sound Experience Anywhere!**\n\nIntroducing the Ultimate Wireless Waterproof Bluetooth Speaker, designed for the adventurer in you! Whether you're lounging by the pool, trekking in the mountains, or hosting a beach party, this speaker combines impressive audio quality with robust", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 31, "completion_tokens": 60, "total_tokens": 91, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
{"id": "batch_req_686271fe3c108190bdd6a64f7231191a", "custom_id": "prod5", "response": {"status_code": 200, "request_id": "3613bb32e5afef94cab0ad41c19ee2dc", "body": {"id": "chatcmpl-Bo6jwAbdiD35WsrppVDcIR15yJQNr", "object": "chat.completion", "created": 1751282008, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Discover the ultimate travel companion with our Compact and Durable Travel Backpack. Designed for the modern traveler, this sleek backpack features a padded laptop compartment that securely fits devices up to 15.6 inches, ensuring your tech stays safe on the go. Crafted from high-quality, water-resistant materials, it withstands", "refusal": null, "annotations": []}, "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 33, "completion_tokens": 60, "total_tokens": 93, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": "fp_34a54ae93c"}}, "error": null}
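
To skim just the generated text rather than the full JSON, you can extract the custom_id and message content from each line (assumes jq; the field paths match the example output above):

jq -r '"\(.custom_id): \(.response.body.choices[0].message.content)"' batched-response.jsonl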

Cleanup

If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.

curl -Ls https://get.konghq.com/quickstart | bash -s -- -d