llm-as-judge

Browse 2 articles tagged with “llm-as-judge”.

Testing & Evaluation · 11 min read

GPT-5, Claude 4.5, Gemini Score the Same Calls. Their Kappa Is 0.52

Run the same calls through GPT-5, Claude 4.5, and Gemini, and Cohen's kappa lands at 0.52. Here is how to measure judge agreement on your own corpus.

Testing & Evaluation · 14 min read

How to Eval Agents When There's No Right Answer

Most eval methods assume you know the correct response. CX agents rarely have one. Here's how to score agent quality with criteria-based rubrics and LLM-as-judge, no labeled ground truth required.
