Articles tagged “agent-testing”
2 articles

Testing & Evaluation · 12 min read
How to Build a Trajectory Eval for Your AI Agent
Outcome evals check the final answer. Trajectory evals check the path: tools called, data touched, steps taken. Here's how to build one for a CX agent.

Testing & Evaluation · 13 min read
Trajectory Eval: Catch Agent Bugs Output Scoring Misses
Final-output scoring misses 20-40% of agent regressions. Trajectory evaluation scores every step an agent takes (tool calls, reasoning decisions, order of operations) and catches the bugs that output-only evals can't see.