The Chanl Blog
Insights on building, connecting, and monitoring AI agents for customer experience — from the teams shipping them.

MCP vs A2A: Tools Protocol, Agents Protocol, and Why You Need Both
MCP connects agents to tools. A2A connects agents to each other. Most developers confuse them. This guide breaks down both protocols with architecture diagrams, real code, and a decision framework for production systems.

Production Agent Evals: Catch Score Drift, Ship Confidently
Your evals pass in staging but miss production failures. Build three eval pipelines with the Chanl SDK: automated scorecards, scenario regression, and drift detection that catches quality degradation before customers do.

Agent Drift: Why Your AI Gets Worse the Longer It Runs
AI agents silently degrade over long conversations. Research quantifies three types of drift and shows why point-in-time evals miss them entirely.

How to enforce the orchestrator pattern in Claude Code
The main Claude Code thread plans and reviews. Subagents implement. Three enforcement layers make this mandatory: CLAUDE.md, skills, and hooks. Includes a starter kit you can copy.

Banks Trust AI With Transactions. Why Not Customer Calls?
How a mid-size bank deploys AI agents for customer service with identity verification, PCI compliance, fraud detection, and regulatory scorecards.

Your Call Center Handles 10,000 Calls a Day. Who's Grading Them?
AI agents handle 40% of your calls. Your QA team samples 2%. The monitoring gap between deployment and quality is where enterprise reputations break.

The Shopping Assistant That Outsells Your Best Sales Rep
How a $50M fashion retailer turned 15,000 SKUs and customer purchase history into an AI shopping assistant that outsells human sales reps.

The Insurance Agent That Never Misquotes a Policy
How regional insurers deploy AI agents that answer policy questions accurately, intake claims end-to-end, and produce the audit trail regulators demand.

12 Ways Your LLM Judge Is Lying to You
Research identifies 12 systematic biases in LLM-as-a-judge systems. Learn to detect and mitigate each one before it corrupts your eval pipeline.

Your Agent Completed the Task. It Also Forgot 87% of What It Knew.
Task completion hides a silent failure: agents forget 87% of stored knowledge under complexity. New research reveals why standard evals miss this entirely.

74% of Production Agents Still Rely on Human Evaluation
A survey of 306 practitioners reveals most production agents are far simpler than expected. The eval gap isn't a tooling problem. It's a trust problem.

NIST Red-Teamed 13 Frontier Models. All of Them Failed.
NIST ran 250K+ attacks against 13 frontier models. None survived. Here's what the results mean for teams shipping AI agents to production today.