The Chanl Blog
Insights on building, connecting, and monitoring AI agents for customer experience — from the teams shipping them.
Latest Articles
Testing & Evaluation
Your LLM-as-judge may be highly biased
LLM-as-Judge has 12 documented biases. Here are 6 evaluation methods production teams actually use instead, with code examples and patterns.
Tools & MCP
7 FastMCP mistakes that break your agent in production
FastMCP servers that work locally often fail at scale. Seven common mistakes, from missing annotations to monolithic tool sets, and how to fix each one.
All Articles
157 articles · Page 1 of 14

GDPR says delete. EU AI Act says keep. Now what?
GDPR requires deletion on request. The EU AI Act requires 10-year audit trails. Here's how to architect agent memory that satisfies both simultaneously.

Is monitoring your AI agent actually enough?
Research shows 83% of agent teams track capability metrics but only 30% evaluate real outcomes. Here's how to close the gap with multi-turn scenario testing.

Your MCP server is a monolith. Here's how to fix it
MCP servers dump every tool into the context window, burning tokens before your agent reasons. Four patterns to fix it: decompose, filter, gateway, facade.

Memory bugs don't crash. They just give wrong answers.
Memory bugs don't crash your agent. They just give subtly wrong answers using stale context. Here are 5 test patterns to catch them before customers do.

The 17x error trap in multi-agent systems
Multi-agent systems can amplify errors 17x rather than reduce them. We compare CrewAI, LangGraph, and AutoGen failure modes with concrete fixes and a decision tree.

The no-code ceiling: when agent builders hit production
Visual agent builders get you to 80% fast. The last 20% (telephony, monitoring, testing, and memory) requires infrastructure they were never designed to provide.

Pipecat vs LiveKit: the trade-offs that lock you in
An opinionated comparison of Pipecat and LiveKit for production voice agents, covering architecture, deployment, cost, and the trade-offs that lock you in.

Build the MCP + A2A agent protocol stack from scratch
Wire an MCP server to an A2A agent that delegates tasks and calls tools. TypeScript and Python examples, Streamable HTTP transport, Agent Cards, and auth.

Agentic RAG: from dumb retrieval to self-correcting agents
Your RAG pipeline retrieves wrong documents and nobody catches it. Build a self-correcting agent that grades results, rewrites queries, and knows when to stop.

We open-sourced our AI agent testing engine
chanl-eval is an open-source engine for stress-testing AI agents with simulated conversations, adaptive personas, and per-criteria scorecards. MIT licensed.