The Chanl Blog
Insights on building, connecting, and monitoring AI agents for customer experience — from the teams shipping them.
All Articles

Your Agent Has Observability. It Doesn't Have Measurement.
89% of AI teams added observability. 52% added evals. But only 31% can say whether their agent is getting better or worse. Here's the difference between watching your agent and actually measuring it.

Why CX Agents Fail Between Conversations
Your AI agent handles the call perfectly and still fails your customer. The problem isn't the conversation; it's everything that happens after it. Here's how async task queues fix the gap.

AI Agent KPIs: What to Measure Before You Ship
Only 31% of teams have a measurement framework for their AI agents. Here's how to define task completion rate, escalation rate, cost per outcome, and CSAT delta before your first production interaction.

MCP Auth in Production: Scopes, Tokens, and Tenant Isolation
Most MCP servers ship with no auth. Here's how to add OAuth 2.0 scopes, per-tenant tool sets, and client isolation before your MCP server becomes load-bearing production infrastructure.

Circuit Breakers for AI Agents: Stop the 3 AM Meltdown
One retry loop at 11 PM becomes $437 by 7 AM. Here's how to implement circuit breakers for AI agent tool calls, LLM calls, and external APIs, with TypeScript patterns that stop cascading failures before they start.

Past 50 tools, function-calling accuracy falls off a cliff
Measure the curve on your own agent and recover accuracy with per-turn toolset scoping.

GPT-5, Claude 4.5, Gemini Score the Same Calls. Their Kappa Is 0.52
Run the same calls through GPT-5, Claude 4.5, and Gemini and Cohen's kappa lands at 0.52. Here is how to measure judge agreement on your own corpus.

1M-Token Context or RAG? How to Pick for Your CX Agent
Gemini's 1M-token window is real but not free. A practical decision framework for choosing between long-context and RAG for customer experience agents, with cost numbers, code, and the hybrid pattern most production teams land on.

MCP tool description drift: the silent failure nobody alerts on
Edit an MCP tool description for clarity, lose 8% routing accuracy, and the eval suite stays green. How to detect, gate, and roll back the drift.

Your voice agent's P95 is lying. The real problem is P99.9
Per-stage P95 hides the tail customers feel. How variance compounds across STT, LLM, and TTS, and how to SLO the joint distribution.

How to Eval Agents When There's No Right Answer
Most eval methods assume you know the correct response. CX agents rarely have one. Here's how to score agent quality with criteria-based rubrics and LLM-as-judge, no labeled ground truth required.

Stop Loading All Your MCP Tools at Once
Loading 50 MCP tools burns 72K tokens before your agent says a word. Progressive tool discovery fixes that: smaller context, sharper decisions, real code patterns.