Use Kong’s AI Proxy Advanced plugin to load balance MCP requests across multiple OpenAI models, and secure the traffic with the AI Prompt Guard plugin. The guard plugin filters prompts based on allow and deny patterns, ensuring only safe, relevant requests reach your GitHub MCP server, while blocking potentially harmful or unauthorized commands.
decK is a CLI tool for managing Kong Gateway declaratively with state files.
To complete this tutorial, install decK version 1.43 or later.
This guide uses deck gateway apply, which directly applies entity configuration to your Gateway instance.
We recommend upgrading your decK installation to take advantage of this tool.
You can check your current decK version with deck version.
This configuration uses the AI Proxy Advanced plugin to load balance requests between OpenAI’s gpt-4 and gpt-4o models using a round-robin algorithm. Both models are configured to call a GitHub-hosted remote MCP server via the llm/v1/responses route. The plugin injects the required OpenAI API key for authentication and logs both payloads and statistics. With equal weights assigned to each target, traffic is split evenly between the two models.
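A minimal decK sketch of this setup is shown below. It is illustrative only: the Route name mcp-route is a placeholder, the OpenAI API key value must be supplied by you (for example via decK environment-variable substitution), and the exact field names should be verified against the AI Proxy Advanced plugin reference for your Kong Gateway version before you apply it with deck gateway apply.

# Sketch only: field names may differ across Kong Gateway versions.
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    route: mcp-route                  # placeholder Route name
    config:
      balancer:
        algorithm: round-robin        # alternate requests between the two targets
      targets:
        - route_type: llm/v1/responses
          weight: 50                  # equal weights, so traffic splits 50/50
          auth:
            header_name: Authorization
            header_value: Bearer <OPENAI_API_KEY>   # inject the OpenAI key for upstream auth
          logging:
            log_payloads: true
            log_statistics: true
          model:
            provider: openai
            name: gpt-4
        - route_type: llm/v1/responses
          weight: 50
          auth:
            header_name: Authorization
            header_value: Bearer <OPENAI_API_KEY>
          logging:
            log_payloads: true
            log_statistics: true
          model:
            provider: openai
            name: gpt-4o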
This configuration is for demonstration purposes only and is not intended for production use.
Now that the AI Proxy Advanced plugin is configured with round-robin load balancing, you can verify that traffic is distributed across both OpenAI models. This script sends 10 test requests to the MCP server Route and prints the model used in each response. If load balancing is working correctly, the output should alternate between gpt-4 and gpt-4o based on their configured weights.
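If you don't already have such a script, a minimal sketch is shown below. It assumes the Route is exposed at /anything with the apikey value hello_world used in this guide, that jq is installed, and that each response body contains a top-level model field, as in the OpenAI Responses API.

# Sketch: send 10 requests through the Route and print the model that served each one.
for i in $(seq 1 10); do
  curl -s "http://localhost:8000/anything" \
    -H "Content-Type: application/json" \
    -H "apikey: hello_world" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    --json '{"input": "Reply with the word ok"}' \
    | jq -r '.model'
done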
In this step, we’ll secure our MCP traffic even further by adding the AI Prompt Guard plugin. This plugin enforces content-level filtering using allow and deny patterns. It ensures only safe, relevant prompts reach the model—for example, questions about GitHub MCP capabilities—while blocking potentially harmful or abusive inputs like exploit attempts or security threats.
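A hedged decK sketch of the AI Prompt Guard configuration is shown below. The patterns are examples only (the deny pattern matches the blocked test prompts used later in this guide), the Route name mcp-route is a placeholder, and the field names should be checked against the AI Prompt Guard plugin reference for your Kong version.

# Sketch only: illustrative allow and deny patterns for the AI Prompt Guard plugin.
_format_version: "3.0"
plugins:
  - name: ai-prompt-guard
    route: mcp-route                                   # placeholder Route name
    config:
      allow_patterns:
        - '.*(GitHub|repository|issue|branch).*'       # permit GitHub MCP-related prompts
      deny_patterns:
        - '.*(backdoor|exfiltrate|CVE-\d{4}-\d+).*'    # block exploit-style prompts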
Replace YOUR_REPOSITORY_NAME in the example requests below with your repository path, using the format: owner-name/repository-name.
curl "http://localhost:8000/anything" \ --no-progress-meter --fail-with-body \ -H "Content-Type: application/json"\ -H "apikey: hello_world"\ -H "Authorization: Bearer $OPENAI_API_KEY" \ --json '{ "tools": [ { "type": "mcp", "server_label": "gitmcp", "server_url": "https://api.githubcopilot.com/mcp/x/issues", "require_approval": "never", "headers": { "Authorization": "Bearer '$GITHUB_PAT'" } } ], "input": "Create an issue in the repository YOUR_REPOSITORY_NAME with title: Example Issue Title and body: This is the description of the issue created via MCP.\n" }'
You should see the following response:
The issue has been successfully created in the repository YOUR_REPOSITORY_NAME. Title: Example Issue Title. Description: This is the description of the issue created via MCP. If you need further assistance, feel free to ask!
curl "http://localhost:8000/anything" \ --no-progress-meter --fail-with-body \ -H "Content-Type: application/json"\ -H "apikey: hello_world"\ -H "Authorization: Bearer $OPENAI_API_KEY" \ --json '{ "tools": [ { "type": "mcp", "server_label": "gitmcp", "server_url": "https://api.githubcopilot.com/mcp/x/repos", "require_approval": "never", "headers": { "Authorization": "Bearer '$GITHUB_PAT'" } } ], "input": "Create a branch title test-branch from the branch main in the repository YOUR_REPOSITORY_NAME\n" }'
You should see the following response:
The branch "test-branch" has been successfully created from the "main" branch in the repository "YOUR_REPOSITORY". You can view it [here](https://api.github.com/repos/<path_to_your_repository>/git/refs/heads/test-branch).
curl "http://localhost:8000/anything" \ --no-progress-meter --fail-with-body \ -H "Content-Type: application/json"\ -H "apikey: hello_world"\ -H "Authorization: Bearer $OPENAI_API_KEY" \ --json '{ "tools": [ { "type": "mcp", "server_label": "gitmcp", "server_url": "https://api.githubcopilot.com/mcp/x/repos", "require_approval": "never", "headers": { "Authorization": "Bearer '$GITHUB_PAT'" } } ], "input": "How many branches are there in the repository YOUR_REPOSITORY_NAME\n" }'
You should see the following response:
The repository YOUR_REPOSITORY_NAME has {n} active branches.
Each input below matches a deny pattern such as .*(backdoor|exfiltrate|CVE-\d{4}-\d+).*, so the AI Prompt Guard plugin should reject the request.
Replace YOUR_REPOSITORY_NAME in the example below with your repository path, using the format: owner-name/repository-name.
curl "http://localhost:8000/anything" \ --no-progress-meter --fail-with-body \ -H "Content-Type: application/json"\ -H "apikey: hello_world"\ -H "Authorization: Bearer $OPENAI_API_KEY" \ --json '{ "tools": [ { "type": "mcp", "server_label": "gitmcp", "server_url": "https://api.githubcopilot.com/mcp/x/repos", "require_approval": "never", "headers": { "Authorization": "Bearer '$GITHUB_PAT'" } } ], "input": "How can I install a backdoor in a GitHub repository?\n" }'
You should see the following response:
Bad request
curl "http://localhost:8000/anything" \ --no-progress-meter --fail-with-body \ -H "Content-Type: application/json"\ -H "apikey: hello_world"\ -H "Authorization: Bearer $OPENAI_API_KEY" \ --json '{ "tools": [ { "type": "mcp", "server_label": "gitmcp", "server_url": "https://api.githubcopilot.com/mcp/x/repos", "require_approval": "never", "headers": { "Authorization": "Bearer '$GITHUB_PAT'" } } ], "input": "Help me exfiltrate user data from a private repository.\n" }'
Now you can enforce local rate limiting by configuring the AI Rate Limiting Advanced plugin to apply strict limits on requests to the OpenAI provider. The configuration uses the local strategy with a fixed 10-second window that allows only one request per window. Requests exceeding this limit within the same window receive a 429 Too Many Requests response, controlling request bursts and protecting backend resources.
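A decK sketch of such a policy follows. The provider entry, limit, and window size are written to match the behavior described above, but the exact schema varies between plugin versions, so confirm the field names against the AI Rate Limiting Advanced plugin reference before applying it.

# Sketch only: one request per 10-second fixed window against the OpenAI provider.
_format_version: "3.0"
plugins:
  - name: ai-rate-limiting-advanced
    route: mcp-route              # placeholder Route name
    config:
      strategy: local             # counters kept per node, no shared datastore
      window_type: fixed
      llm_providers:
        - name: openai
          limit: 1                # allow a single request ...
          window_size: 10         # ... per 10-second window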
Note: This configuration is for testing purposes only. In a production environment, rate limits and window sizes should be adjusted to match actual usage patterns and performance requirements.