observability

Browse 17 articles tagged with “observability”.

Three Diverging Paths Representing the Google, Anthropic, and OpenAI Managed Agent Runtime Architectures

Agent Architecture·13 min read

Managed Agents in 2026: Three Runtimes, Three Trade-Offs

Google, Anthropic, and OpenAI all shipped 'managed agents' in May 2026, and they mean completely different things. Here's what each runtime trades away for CX teams.

Dashboard showing agent resolution costs alongside quality scores and task success rates

Testing & Evaluation·18 min read

Cost Per Successful Outcome: The AI Agent Metric Teams Miss

Most teams measure AI agent quality by pass rate. The metric that actually predicts ROI is cost per successful outcome: what each resolution costs paired against whether it actually resolved. Here's how to build it.

Circular diagram showing the five phases of the agent development lifecycle with arrows connecting each phase

Operations·14 min read

The Agent Development Lifecycle: Ship, Observe, Improve

Shipping an AI agent is easy. Keeping it reliable after launch is where most teams struggle. The ADLC gives you a structured approach: Intent, Build, Evaluate, Deploy, Observe -- and then do it again.

A warm-lit dashboard showing token usage breakdown with a large orange bar labeled 'System Prompt' dominating the chart

Operations·13 min read

Your agent re-reads its own manual on every call

Datadog's 2026 State of AI Engineering report found that 69% of input tokens go to system prompts, yet only 28% of LLM calls use prompt caching. Here's how to diagnose the problem and fix it without rewriting your agent.

A dashboard showing rich telemetry data on one side and a blank trend chart on the other, representing observability without measurement

Testing & Evaluation·11 min read

Your Agent Has Observability. It Doesn't Have Measurement.

89% of AI teams added observability. 52% added evals. But only 31% can say whether their agent is getting better or worse. Here's the difference between watching your agent and actually measuring it.

Layered audio waveform splitting into three colored tracks with one outlier spike trailing into fog, teal-copper engineering palette

Voice & Conversation·12 min read

Your voice agent's P95 is lying. The real problem is P99.9

Per-stage P95 hides the tail customers feel. How variance compounds across STT, LLM, and TTS, and how to SLO the joint distribution.

AI-Generated Illustration for Handoff Is the New Prompt -- Soul (2020) Style, Terra Cotta Palette

Agent Architecture·11 min read

Multi-Agent Systems Don't Fail at Reasoning. They Fail at Handoff.

Command objects, memory transfer, and the 8-10 handoff cliff, plus the telemetry that catches drift.

Iceberg at Sea With Small Visible Tip Above Dark Water and Enormous Submerged Mass Glowing Amber — Visual Metaphor for Reasoning Tokens Hidden Below the Surface of Agent Responses

Operations·14 min read

Reasoning Tokens Are Showing Up on the Bill

GPT-5 and Claude thinking tokens bill as output and stay invisible. A 200-token reply can hide 8,000 billable ones. How to measure, cap, and budget.

Architecture diagram of an agentic data layer with event log, signal extraction, entity store, and improvement loop

Agent Architecture·14 min read

The Modern Data Stack Wasn't Built for Agents

Snowflake, dbt, and Fivetran were built for humans asking batch questions. Agents need streaming signals, per-entity memory in under 100ms, and write-back.

Watercolor illustration of two figures walking through a warm corridor of looping paths, Her style in warm plum tones

Testing & Evaluation·9 min read

Every Failed Call Is a Test Case You Haven't Written Yet

The gap between staging and production for AI agents is measured in surprise. Here's how to close the loop from live failure to regression gate.

Control room with green monitoring screens, one cracked display unnoticed in the center, Minority Report style

Testing & Evaluation·14 min read

Is monitoring your AI agent actually enough?

Research shows 83% of agent teams track capability metrics but only 30% evaluate real outcomes. Here's how to close the gap with multi-turn scenario testing.

Dashboard showing split-screen comparison of offline test results versus live production scorecard trends for an AI agent

Testing & Evaluation·18 min read

Online vs. Offline Evals: Close the Production Gap

89% of teams have observability but only 37% run online evals. Here's why that gap is where production failures hide, and how to close it with a practical online eval pipeline.

Illustration of distributed trace spans connecting an AI agent to MCP tool servers with observability signals flowing through

Technical Guide·20 min read

MCP Servers in Production: Observability from Day One

Instrument your MCP servers with OpenTelemetry for production-grade observability. Covers tracing tool calls, detecting loops, cost attribution, and alerting.

Engineering team reviewing real-time AI agent monitoring dashboards with metrics and conversation traces

Learning AI·22 min read

Build an AI Agent Observability Pipeline from Scratch

Build a production observability pipeline for AI agents using TypeScript and the Chanl SDK. Covers metrics, traces, quality scoring, drift detection, and alerting.

Watercolor illustration of distributed trace spans flowing through an AI agent pipeline with OpenTelemetry instrumentation

Operations·18 min read

What to Trace When Your AI Agent Hits Production

OpenTelemetry GenAI conventions are the production standard for agent tracing. What to instrument, what to skip, and what breaks — from a 2 AM debugging war story.

Watercolor illustration of an engineer monitoring a dashboard of AI agents in production with reliability metrics

Agent Architecture·24 min read

Agentic AI in Production: From Prototype to Reliable Service

Take agentic AI to production without it breaking at 2 AM. Covers orchestration patterns (ReAct, planning loops), error handling, circuit breakers, graceful degradation, observability, and scaling, with TypeScript implementations you can reuse.

Watercolor illustration of an engineering team monitoring AI agent dashboards with data flowing across screens

Operations·28 min read

AI Agent Observability: What to Monitor When Your Agent Goes Live

Build a production observability pipeline for AI agents. Covers latency, token usage, tool success rates, conversation quality, drift detection, structured logging, alerting strategies, and the critical difference between LLM and agent observability.
