quality

Browse 6 articles tagged with “quality”.

Testing & Evaluation · 13 min read

Trajectory Eval: Catch Agent Bugs Output Scoring Misses

Final-output scoring misses 20-40% of agent regressions. Trajectory evaluation scores every step an agent takes (tool calls, reasoning decisions, order of operations) and catches the bugs that output-only evals can't see.

Testing & Evaluation · 8 min read

Is AI Better Than Your Humans? Score Both on One Rubric

Most teams can't say whether AI beats humans because they score them differently. One rubric, run on both, sliced by segment, gives you an honest answer.

Testing & Evaluation · 13 min read

How Much Testing Is Enough for Your AI Agent?

Code coverage doesn't apply to AI agents. Here's a framework for thinking about evaluation coverage: how many scenarios you need, what distribution to target, and how to know when you've tested enough.

Industry & Strategy · 15 min read

Your Call Center Handles 10,000 Calls a Day. Who's Grading Them?

AI agents handle 40% of your calls. Your QA team samples 2%. The monitoring gap between deployment and quality is where enterprise reputations break.

Knowledge & Memory · 7 min read

Your RAG Returns Wrong Answers. Upgrading the Model Won't Help

Most RAG quality problems are retrieval problems, not model problems. Bad chunking, wrong embeddings, and missing re-ranking cause more hallucinations than model capability gaps.

Testing & Evaluation · 17 min read

The Voice AI Quality Crisis: Why Most Deployments Fail in Production

Most voice AI deployments fail in production despite passing lab tests. Real data on why the gap exists, what it costs, and how to close it.
