production

Browse 30 articles tagged with “production”.

Three Diverging Paths Representing the Google, Anthropic, and OpenAI Managed Agent Runtime Architectures

Agent Architecture·13 min read

Managed Agents in 2026: Three Runtimes, Three Trade-Offs

Google, Anthropic, and OpenAI all shipped 'managed agents' in May 2026, and they mean completely different things. Here's what each runtime trades away for CX teams.

Stateless MCP Servers Behind a Load Balancer With Task Handles Flowing Between Agent and Server

Tools & MCP·16 min read

How to Migrate Your MCP Server to Stateless Mode

The MCP 2026 release candidate makes stateless the recommended default. Your MCP server can now scale behind any load balancer without sticky routing. Here's how to migrate and use the new Tasks extension for async CX work.

Watercolor Illustration of a CI Pipeline With a Behavioral Testing Gate Between Staging and Production

Testing & Evaluation·15 min read

How to Build a Regression Test Suite for AI Agents

Your CI/CD pipeline catches code regressions. But who catches it when a prompt change breaks your agent's compliance behavior? Here's how to build behavioral regression testing for non-deterministic AI agents.

A graph diagram showing agent state transitions with named nodes and typed edges

Agent Architecture·14 min read

Your Agent Is Already a State Machine. Make It Explicit.

Every production AI agent is secretly a state machine. Making it explicit gives you checkpointing, testable paths, and observable state transitions -- without rewriting your agent logic.

Dashboard showing AI agent KPI tiles for task completion rate, escalation rate, cost per successful outcome, and CSAT delta

Testing & Evaluation·13 min read

AI Agent KPIs: What to Measure Before You Ship

Only 31% of teams have a measurement framework for their AI agents. Here's how to define task completion rate, escalation rate, cost per outcome, and CSAT delta before your first production interaction.

Diagram showing an MCP server with OAuth 2.0 token validation, per-tenant tool scoping, and multi-tenant isolation layers

Tools & MCP·15 min read

MCP Auth in Production: Scopes, Tokens, and Tenant Isolation

Most MCP servers ship with no auth. Here's how to add OAuth 2.0 scopes, per-tenant tool sets, and client isolation before your MCP server becomes load-bearing production infrastructure.

Best Practices·15 min read

Circuit Breakers for AI Agents: Stop the 3 AM Meltdown

One retry loop at 11 PM becomes $437 by 7 AM. Here's how to implement circuit breakers for AI agent tool calls, LLM calls, and external APIs, with TypeScript patterns that stop cascading failures before they start.

Grid of test scenario cards with pass and fail indicators showing evaluation coverage distribution

Testing & Evaluation·13 min read

How Much Testing Is Enough for Your AI Agent?

Code coverage doesn't apply to AI agents. Here's a framework for thinking about evaluation coverage: how many scenarios you need, what distribution to target, and how to know when you've tested enough.

Network diagram showing HTTP transport routes consolidating from two paths to one streamlined endpoint

Tools & MCP·12 min read

MCP SSE Is Deprecated. Here's How to Migrate

SSE transport is being deprecated across major MCP platforms in 2026. Here's a practical migration guide from HTTP+SSE to Streamable HTTP, with TypeScript examples and a phased rollout strategy.

Developer at a desk surrounded by sticky notes with warning symbols, red warning lights on a server rack nearby

Tools & MCP·14 min read

7 FastMCP mistakes that break your agent in production

FastMCP servers that work locally often fail at scale. Seven common mistakes, from missing annotations to monolithic tool sets, and how to fix each one.

Diagram showing MCP as a foundational protocol layer with agent configuration, memory, testing, and observability stacked above it

Tools & MCP·16 min read

MCP Is Now Open Infrastructure: Build for What's Next

MCP was donated to the Linux Foundation and the AAIF just held its first summit. What does the protocol becoming open infrastructure mean for what you build on top of it?

Overhead view of translucent screens on a conference table, their overlapping symbols blurring into noise

Agent Architecture·14 min read

The 17x error trap in multi-agent systems

Multi-agent systems amplify errors 17x rather than reducing them. We compare CrewAI, LangGraph, and AutoGen failure modes with concrete fixes and a decision tree.

A clean desk with colorful building blocks arranged into a fragile tower on one side and a sturdy steel structure with monitoring instruments on the other

Industry & Strategy·14 min read

The no-code ceiling: when agent builders hit production

Visual agent builders get you to 80% fast. The last 20% (telephony, monitoring, testing, and memory) requires infrastructure they were never intended to provide.

Dashboard showing split-screen comparison of offline test results versus live production scorecard trends for an AI agent

Testing & Evaluation·18 min read

Online vs. Offline Evals: Close the Production Gap

89% of teams have observability but only 37% run online evals. Here's why that gap is where production failures hide, and how to close it with a practical online eval pipeline.

Illustration of an AI judge holding a checklist while reviewing a conversation transcript on a monitor

Technical Guide·22 min read

LLM-as-a-Judge: Build a Production Eval Pipeline

Build a production LLM-as-a-judge eval pipeline step by step. Covers judge selection, rubric design, CI integration, and sampling strategies that scale.

Illustration of distributed trace spans connecting an AI agent to MCP tool servers with observability signals flowing through

Technical Guide·20 min read

MCP Servers in Production: Observability from Day One

Instrument your MCP servers with OpenTelemetry for production-grade observability. Covers tracing tool calls, detecting loops, cost attribution, and alerting.

Engineering team reviewing real-time AI agent monitoring dashboards with metrics and conversation traces

Learning AI·22 min read

Build an AI Agent Observability Pipeline from Scratch

Build a production observability pipeline for AI agents using TypeScript and the Chanl SDK. Covers metrics, traces, quality scoring, drift detection, and alerting.

Illustration of a quality monitoring dashboard showing score trends and alert thresholds across production AI agent conversations

Learning AI·20 min read

Production Agent Evals: Catch Score Drift, Ship Confidently

Your evals pass in staging but miss production failures. Build three eval pipelines with the Chanl SDK: automated scorecards, scenario regression, and drift detection that catches quality degradation before customers do.

Watercolor illustration of a split dashboard showing human reviewers on one side and automated scoring metrics on the other

Operations·15 min read

74% of Production Agents Still Rely on Human Evaluation

A survey of 306 practitioners reveals most production agents are far simpler than expected. The eval gap isn't a tooling problem. It's a trust problem.

Visualization of the widening gap between AI agent capability scores and reliability metrics across model generations

Learning AI·15 min read

Your Agent Is Getting Smarter. It's Not Getting More Reliable.

Reliability improves at half the rate of accuracy. Three 85%+ tools combine to just 74%. Here's the math, the research, and the testing protocols that close the gap.

Data visualization showing the gap between AI agent benchmark scores and production performance metrics

Testing & Evaluation·13 min read

Your Agent Aced the Benchmark. Production Disagreed.

We scored 92% on GAIA. Production CSAT: 64%. Here's which AI agent benchmarks actually predict deployed performance, why most don't, and what to measure instead.

Watercolor illustration of distributed trace spans flowing through an AI agent pipeline with OpenTelemetry instrumentation

Operations·18 min read

What to Trace When Your AI Agent Hits Production

OpenTelemetry GenAI conventions are the production standard for agent tracing. What to instrument, what to skip, and what breaks — from a 2 AM debugging war story.

Developer comparing small and large AI model outputs on a monitor

Learning AI·18 min read

A 7B Domain Model Beat Everything We Tried

Domain-specific language models are beating trillion-parameter generalists on vertical tasks. Here's when a 7B model is the right call, how the training pipeline works, and what production teams are shipping today.

Diagram showing interconnected AI agents coordinating a complex customer service workflow

Agent Architecture·14 min read

The Multi-Agent Pattern That Actually Works in Production

Gartner reports a 1,445% surge in multi-agent system inquiries. Here are the orchestration patterns that actually work when real customers call -- and why most teams pick the wrong one.

Layered shield diagram representing defense-in-depth security architecture for AI agents

Security & Compliance·18 min read

Your AI Agent Has No Guardrails

Air Canada honored a refund its chatbot hallucinated. DPD's bot cursed at customers on camera. One e-commerce agent approved $2.3M in unauthorized refunds at 2:47 AM. Here is the five-layer guardrail architecture that prevents all three.

Watercolor illustration of descending cost bars alongside token streams flowing through an optimization pipeline

Operations·16 min read

Your AI Agent Costs $13K/Month. Here's the Fix.

A production customer-service agent burned $13,247 in one month. Prompt caching, model routing, batch processing, and plan-and-execute architecture cut it to $1,100. Real pricing math for every technique.

Watercolor illustration of developers at a cafe terrace with a rocket deployment diagram on screen

Learning AI·20 min read

Part 4: All 7 Extension Points in One Production Codebase

50+ skills, multiple MCP servers, scoped rules, safety hooks — here's how all 7 Claude extension points compose in a real NestJS monorepo with 17 projects. What works, what fights, and what we'd do differently.

Developer reviewing AI agent test results on a laptop

Testing & Evaluation·14 min read

Your Agent Passed Every Dev Test. Here's Why It'll Fail in Production

A 4-layer testing framework for AI agents (unit, integration, performance, and chaos testing) so your agent survives real customers, not just controlled demos.

Watercolor illustration of an engineer monitoring a production AI agent dashboard with reliability metrics

Agent Architecture·24 min read

Agentic AI in Production: From Prototype to Reliable Service

Ship agentic AI that doesn't break at 2 AM. Covers orchestration patterns (ReAct, planning loops), error handling, circuit breakers, graceful degradation, observability, and scaling — with TypeScript implementations you can steal.

Watercolor illustration of an engineering team monitoring AI agent dashboards with data flowing across screens

Operations·28 min read

AI Agent Observability: What to Monitor When Your Agent Goes Live

Build a production observability pipeline for AI agents. Covers latency, token usage, tool success rates, conversation quality, drift detection, structured logging, alerting strategies, and the critical difference between LLM and agent observability.
