The Chanl Blog
Insights on building, connecting, and monitoring AI agents for customer experience — from the teams shipping them.
All Articles

When Your Customer's Browser Agent Resolves the Ticket Before Yours Picks Up
Browser Use, Operator, Mariner, Computer Use. Here's what happens to CX volumes when the customer's browser agent resolves the ticket before yours picks up.

Your CX Agent Doesn't Care Who Won SWE-Bench. Here's Who Actually Wins.
SWE-bench crowns a coding king. Customer experience agents answer to a different benchmark, tau-bench, and the rankings flip. The head-to-head that actually predicts production reliability.

Multi-Agent Systems Don't Fail at Reasoning. They Fail at Handoff.
Command objects, memory transfer, and the 8-10 handoff cliff, plus the telemetry that catches drift.

Everyone Benchmarks Opus. Your Chatbot Runs on Haiku.
Haiku 4.5, GPT-5 Mini, Gemini Flash at the $1/MTok tier that powers CX. Tool-call accuracy, first-token latency, structured-output reliability, blended cost math.

Your Agent Should Use Three Models, Not One
Production CX agents route tasks by difficulty, not brand loyalty. The planner/router/summarizer pattern, a concrete rubric, support-deflection cost math, and the failure modes nobody warns you about.

Reasoning Tokens Are Showing Up on the Bill
GPT-5 and Claude thinking tokens bill as output and stay invisible. A 200-token reply can hide 8,000 billable ones. How to measure, cap, and budget.

When to Use a Supervisor, When to Let Agents Swarm
Supervisor burns 20-40% more tokens per run. Swarm hits a quality cliff past 8-10 handoffs. Start supervisor, graduate to swarm when latency bites.

Stop Using SWE-Bench to Pick Your CX Model
SWE-bench scores come out at 85% or 23% depending on the harness, and neither number measures customer experience. Why tau-bench, tau2-bench, and pass^k matter for CX agents.

The Modern Data Stack Wasn't Built for Agents
Snowflake, dbt, and Fivetran were built for humans asking batch questions. Agents need streaming signals, per-entity memory in under 100ms, and write-back.

Correlation Killed Your Retention Model. Causal AI Fixes It.
Your churn model says support calls cause retention. They don't. Build a causal pipeline with DoWhy, EconML, and propensity matching in Python.

Stop Storing Transcripts. Start Modeling Signals.
A JSON blob of transcripts works at 1k calls and collapses at 50k. Design a Signal schema with entity/event split, confidence, provenance, and versioning.

Every Conversation Is an Experiment You Didn't Run
Your agent already ran the A/B test you're scoping. Here's how to read the results in your logs with propensity matching, synthetic control, and diff-in-diff.