- The plugin sends the user prompt and the model's response to the configured LLM, which acts as a judge.
- The LLM evaluates the response and returns a numeric score between 1 (ideal) and 100 (wrong or irrelevant).
- This score can be used in downstream workflows, such as automated grading, feedback systems, or learning pipelines.
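As a sketch of such a downstream workflow, the snippet below gates responses on the judge's score (1 = ideal, 100 = wrong or irrelevant). The function name and the threshold value are illustrative assumptions, not part of the plugin:

```python
# Hypothetical downstream gate on the judge score, where lower is better
# (1 = ideal, 100 = wrong or irrelevant). The threshold is an assumption.
def accept(score: int, threshold: int = 30) -> bool:
    """Accept a response only if the judge scored it at or below `threshold`."""
    return 1 <= score <= threshold

print(accept(5))   # → True  (near-ideal response)
print(accept(80))  # → False (judged wrong or irrelevant)
```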
The following sequence diagram illustrates this simplified flow:
```mermaid
sequenceDiagram
    actor Client
    participant AIP as AI Proxy Advanced
    participant LLM as LLM Model (A or B)
    participant Judge as AI LLM as Judge
    participant JudgeLLM as Judge LLM

    Client->>AIP: Send prompt
    AIP->>LLM: Forward prompt (balancer selects model)
    LLM-->>AIP: Response
    AIP->>Judge: Prompt + response
    Judge->>JudgeLLM: Evaluate response
    JudgeLLM-->>Judge: Score (1–100)
    Judge-->>AIP: Evaluation result
    AIP-->>Client: Response
```
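The judge step in the diagram can be sketched as follows. This is not Kong's implementation: the judge prompt template, the `call_llm` client hook, and the score parsing are all assumptions for illustration.

```python
# Minimal sketch of the judge step: build a judging prompt, call the judge
# LLM, and parse the numeric score (1 = ideal, 100 = wrong or irrelevant).
import re

# Hypothetical judge prompt template; the plugin's actual template may differ.
JUDGE_PROMPT = (
    "Rate the following response to the user prompt on a scale of 1 to 100, "
    "where 1 is ideal and 100 is wrong or irrelevant. "
    "Reply with the number only.\n\n"
    "Prompt: {prompt}\nResponse: {response}"
)

def parse_score(judge_reply: str) -> int:
    """Extract the 1-100 score from the judge LLM's raw reply."""
    match = re.search(r"\b(\d{1,3})\b", judge_reply)
    if not match:
        raise ValueError(f"no score found in judge reply: {judge_reply!r}")
    score = int(match.group(1))
    if not 1 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return score

def judge(prompt: str, response: str, call_llm) -> int:
    """Send prompt + response to the judge LLM and return its numeric score.

    `call_llm` is a stand-in for the configured judge model client."""
    reply = call_llm(JUDGE_PROMPT.format(prompt=prompt, response=response))
    return parse_score(reply)

# Stubbed judge model for illustration; a real deployment would call the
# configured LLM endpoint here.
score = judge("What is 2 + 2?", "4", call_llm=lambda _: "The score is 3.")
print(score)  # → 3
```

In practice the judge's reply may contain surrounding text, so parsing defensively (and rejecting out-of-range values) keeps the score usable by downstream workflows.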