Chanl

The Chanl Blog

Insights on building, connecting, and monitoring AI agents for customer experience — from the teams shipping them.

All Articles

157 articles · Page 1 of 14

A person standing before multiple transparent evaluation panels in a semicircle, each showing a different lens on the same conversation
Testing & Evaluation·16 min read

Your LLM-as-judge may be highly biased

LLM-as-judge has 12 documented biases. Here are 6 evaluation methods production teams actually use instead, with code examples and patterns.

Read More
Developer at a desk surrounded by sticky notes with warning symbols, red warning lights on a server rack nearby
Tools & MCP·14 min read

7 FastMCP mistakes that break your agent in production

FastMCP servers that work locally often fail at scale. Seven common mistakes, from missing annotations to monolithic tool sets, with a fix for each one.

Read More
An archivist standing in a long corridor between shelves of documents, deciding whether to file or shred
Security & Compliance·14 min read

GDPR says delete. EU AI Act says keep. Now what?

GDPR requires deletion on request. The EU AI Act requires 10-year audit trails. Here's how to architect agent memory that satisfies both simultaneously.

Read More
Control room with green monitoring screens, one cracked display unnoticed in the center, Minority Report style
Testing & Evaluation·14 min read

Is monitoring your AI agent actually enough?

Research shows 83% of agent teams track capability metrics but only 30% evaluate real outcomes. Here's how to close the gap with multi-turn scenario testing.

Read More
A massive warehouse of filing cabinets stretching into fog, with one person sitting at a clean desk with three folders under warm lamplight
Agent Architecture·14 min read

Your MCP server is a monolith. Here's how to fix it

MCP servers dump every tool into the context window, burning tokens before your agent reasons. Four patterns to fix it: decompose, filter, gateway, facade.

Read More
Person examining a translucent board with connected note cards, verifying links between them
Testing & Evaluation·16 min read

Memory bugs don't crash. They just give wrong answers.

Memory bugs don't crash your agent. They just give subtly wrong answers using stale context. Here are 5 test patterns to catch them before customers do.

Read More
Overhead view of translucent screens on a conference table, their overlapping symbols blurring into noise
Agent Architecture·14 min read

The 17x error trap in multi-agent systems

Multi-agent systems amplify errors 17x, not reduce them. We compare CrewAI, LangGraph, and AutoGen failure modes with concrete fixes and a decision tree.

Read More
A clean desk with colorful building blocks arranged into a fragile tower on one side and a sturdy steel structure with monitoring instruments on the other
Industry & Strategy·14 min read

The no-code ceiling: when agent builders hit production

Visual agent builders get you to 80% fast. The last 20% (telephony, monitoring, testing, and memory) requires infrastructure they never intended to provide.

Read More
An engineer at a wide desk with two monitors showing warm and cool waveform visualizations, a headset between the screens, amber cityscape through floor-to-ceiling windows
Voice & Conversation·14 min read

Pipecat vs LiveKit: the trade-offs that lock you in

An opinionated comparison of Pipecat and LiveKit for production voice agents, covering architecture, deployment, cost, and the trade-offs that lock you in.

Read More
Person connecting protocol cables between two glowing devices with diagrams on a whiteboard
Learning AI·22 min read

Build the MCP + A2A agent protocol stack from scratch

Wire an MCP server to an A2A agent that delegates tasks and calls tools. TypeScript and Python examples, Streamable HTTP transport, Agent Cards, and auth.

Read More
Person sorting through stacks of documents, crossing out wrong ones, with a magnifying glass on the desk
Learning AI·22 min read

Agentic RAG: from dumb retrieval to self-correcting agents

Your RAG pipeline retrieves wrong documents and nobody catches it. Build a self-correcting agent that grades results, rewrites queries, and knows when to stop.

Read More
Open-source AI agent testing engine with conversation simulation and scorecard evaluation
Testing & Evaluation·14 min read

We open-sourced our AI agent testing engine

chanl-eval is an open-source engine for stress-testing AI agents with simulated conversations, adaptive personas, and per-criteria scorecards. MIT licensed.

Read More

Learn Agentic AI

One lesson a week — practical techniques for building, testing, and shipping AI agents. From prompt engineering to production monitoring. Learn by doing.

500+ engineers subscribed