The Chanl Blog

Insights on building, connecting, and monitoring AI agents for customer experience — from the teams shipping them.

All Articles

235 articles · Page 7 of 20

Grid of test scenario cards with pass and fail indicators showing evaluation coverage distribution

How Much Testing Is Enough for Your AI Agent?

Code coverage doesn't apply to AI agents. Here's a framework for thinking about evaluation coverage: how many scenarios you need, what distribution to target, and how to know when you've tested enough.

Network diagram showing HTTP transport routes consolidating from two paths to one streamlined endpoint

Tools & MCP·12 min read

MCP SSE Is Deprecated. Here's How to Migrate

SSE transport is being deprecated across major MCP platforms in 2026. Here's a practical migration guide from HTTP+SSE to Streamable HTTP, with TypeScript examples and a phased rollout strategy.

A person standing before multiple transparent evaluation panels in a semicircle, each showing a different lens on the same conversation

Testing & Evaluation·16 min read

Your LLM-as-judge may be highly biased

LLM-as-Judge has 12 documented biases. Here are 6 evaluation methods production teams actually use instead, with code examples and patterns.

Developer at a desk surrounded by sticky notes with warning symbols, red warning lights on a server rack nearby

Tools & MCP·14 min read

7 FastMCP mistakes that break your agent in production

FastMCP servers that work locally often fail at scale. Seven common mistakes, from missing annotations to monolithic tool sets, and how to fix each one.

An archivist standing in a long corridor between shelves of documents, deciding whether to file or shred

Security & Compliance·14 min read

GDPR says delete. EU AI Act says keep. Now what?

GDPR requires deletion on request. The EU AI Act requires 10-year audit trails. Here's how to architect agent memory that satisfies both simultaneously.

Control room with green monitoring screens, one cracked display unnoticed in the center, Minority Report style

Testing & Evaluation·14 min read

Is monitoring your AI agent actually enough?

Research shows 83% of agent teams track capability metrics but only 30% evaluate real outcomes. Here's how to close the gap with multi-turn scenario testing.

Diagram showing MCP as a foundational protocol layer with agent configuration, memory, testing, and observability stacked above it

Tools & MCP·16 min read

MCP Is Now Open Infrastructure: Build for What's Next

MCP was donated to the Linux Foundation and the AAIF just held its first summit. What does the protocol becoming open infrastructure mean for what you build on top of it?

A massive warehouse of filing cabinets stretching into fog, with one person sitting at a clean desk with three folders under warm lamplight

Agent Architecture·14 min read

Your MCP server is a monolith. Here's how to fix it

MCP servers dump every tool into the context window, burning tokens before your agent reasons. Four patterns to fix it: decompose, filter, gateway, facade.

Person examining a translucent board with connected note cards, verifying links between them

Testing & Evaluation·16 min read

Memory bugs don't crash. They just give wrong answers.

Memory bugs don't crash your agent. They just give subtly wrong answers using stale context. Here are 5 test patterns to catch them before customers do.

Overhead view of translucent screens on a conference table, their overlapping symbols blurring into noise

Agent Architecture·14 min read

The 17x error trap in multi-agent systems

Multi-agent systems amplify errors 17x rather than reducing them. We compare CrewAI, LangGraph, and AutoGen failure modes with concrete fixes and a decision tree.

A clean desk with colorful building blocks arranged into a fragile tower on one side and a sturdy steel structure with monitoring instruments on the other

Industry & Strategy·14 min read

The no-code ceiling: when agent builders hit production

Visual agent builders get you to 80% fast. The last 20% (telephony, monitoring, testing, and memory) requires infrastructure they were never designed to provide.

Dashboard showing split-screen comparison of offline test results versus live production scorecard trends for an AI agent

Testing & Evaluation·18 min read

Online vs. Offline Evals: Close the Production Gap

89% of teams have observability but only 37% run online evals. Here's why that gap is where production failures hide, and how to close it with a practical online eval pipeline.

