Chanl

The Chanl Blog

Insights on building, connecting, and monitoring AI agents for customer experience — from the teams shipping them.

All Articles


A dashboard showing rich telemetry data on one side and a blank trend chart on the other, representing observability without measurement
Testing & Evaluation·11 min read

Your Agent Has Observability. It Doesn't Have Measurement.

89% of AI teams added observability. 52% added evals. But only 31% can say whether their agent is getting better or worse. Here's the difference between watching your agent and actually measuring it.

A timeline showing a completed conversation on the left and failed downstream tasks on the right, with a gap between them
Agent Architecture·13 min read

Why CX Agents Fail Between Conversations

Your AI agent handles the call perfectly and still fails your customer. The problem isn't the conversation; it's everything that happens after it. Here's how async task queues fix the gap.

Dashboard showing AI agent KPI tiles for task completion rate, escalation rate, cost per successful outcome, and CSAT delta
Testing & Evaluation·13 min read

AI Agent KPIs: What to Measure Before You Ship

Only 31% of teams have a measurement framework for their AI agents. Here's how to define task completion rate, escalation rate, cost per outcome, and CSAT delta before your first production interaction.

Diagram showing an MCP server with OAuth 2.0 token validation, per-tenant tool scoping, and multi-tenant isolation layers
Tools & MCP·15 min read

MCP Auth in Production: Scopes, Tokens, and Tenant Isolation

Most MCP servers ship with no auth. Here's how to add OAuth 2.0 scopes, per-tenant tool sets, and client isolation before your MCP server becomes load-bearing production infrastructure.

Best Practices·15 min read

Circuit Breakers for AI Agents: Stop the 3 AM Meltdown

One retry loop at 11 PM becomes $437 by 7 AM. Here's how to implement circuit breakers for AI agent tool calls, LLM calls, and external APIs, with TypeScript patterns that stop cascading failures before they start.

Developer console with a grid of tool tiles fading out as a routing accuracy curve declines past tool 50
Tools & MCP·10 min read

Past 50 tools, function-calling accuracy falls off a cliff

Measure the curve on your own agent and recover accuracy with per-turn toolset scoping.

Three glowing rubric cards floating in misted air, each marking the same transcript with subtly different ink colors, with a faint kappa heatmap projected on the wall behind them
Testing & Evaluation·11 min read

GPT-5, Claude 4.5, Gemini Score the Same Calls. Their Kappa Is 0.52

Run the same calls through GPT-5, Claude 4.5, and Gemini and Cohen's kappa lands at 0.52. Here is how to measure judge agreement on your own corpus.

Technical Guide·17 min read

1M-Token Context or RAG? How to Pick for Your CX Agent

Gemini's 1M-token window is real but not free. A practical decision framework for choosing between long-context and RAG for customer experience agents, with cost numbers, code, and the hybrid pattern most production teams land on.

A glass card hovers in warm plum light with a faint duplicate offset behind it, an agent's pointer landing a few degrees off the intended target
Tools & MCP·11 min read

MCP tool description drift: the silent failure nobody alerts on

Edit an MCP tool description for clarity, lose 8% routing accuracy, and the eval suite stays green. How to detect, gate, and roll back the drift.

Layered audio waveform splitting into three colored tracks with one outlier spike trailing into fog, teal-copper engineering palette
Voice & Conversation·12 min read

Your voice agent's P95 is lying. The real problem is P99.9

Per-stage P95 hides the tail customers feel. How variance compounds across STT, LLM, and TTS, and how to SLO the joint distribution.

Testing & Evaluation·14 min read

How to Eval Agents When There's No Right Answer

Most eval methods assume you know the correct response. CX agents rarely have one. Here's how to score agent quality with criteria-based rubrics and LLM-as-judge, no labeled ground truth required.

Tools & MCP·13 min read

Stop Loading All Your MCP Tools at Once

Loading 50 MCP tools burns 72K tokens before your agent says a word. Progressive tool discovery fixes that: smaller context, sharper decisions, real code patterns.

