Operations

Browse 22 articles in operations.

Operations Articles

22 articles · Page 1 of 2

Operations·16 min read

SRE for AI Agents: SLOs, Error Budgets, and Reliability

Traditional SRE doesn't catch AI agent failures. Here's a practical SRE playbook for agents: the five SLIs that matter, how to set SLOs that are actually useful, and how error budgets control agent autonomy before problems escalate.

Operations·14 min read

The Agent Development Lifecycle: Ship, Observe, Improve

Shipping an AI agent is easy. Keeping it reliable after launch is where most teams struggle. The ADLC gives you a structured approach: Intent, Build, Evaluate, Deploy, Observe. Then do it again.

Operations·13 min read

Your agent re-reads its own manual on every call

Datadog's 2026 State of AI Engineering report found that 69% of input tokens go to system prompts, yet only 28% of LLM calls use prompt caching. Here's how to diagnose the problem and fix it without rewriting your agent.

Operations·13 min read

Shadow Mode: Deploy AI Agent Updates Without Risk

Shadow mode runs your new agent version in parallel with production, comparing behavior before customers ever see it. Here's how to build the full deployment pipeline from shadow to canary to 100%.

Operations·10 min read

How to Build a Cart Recovery Agent (and When to Send Nothing)

Most cart-recovery flows are three discount emails. A real recovery agent decides why the customer left, picks the right channel, caps the discount to protect margin, and sometimes sends nothing at all. Here is how to build it.

Operations·12 min read

Build a Returns Voice Agent That Can't Refund Itself Broke

Returns are 60% of peak ecommerce contact volume. Most voice agents will refund $4,000 on a prompt injection. Here's how to build one that physically can't.

Operations·11 min read

Build a Nurture Agent That Decides Not to Send

Most nurture sequences are 14 emails on a calendar. The fix is an event-triggered agent whose most valuable action is to wait. Here's how to build the worker.

Operations·12 min read

How to Build a Real Estate Showing Voice Agent (MLS, Lockboxes, TCPA)

Build a real estate voice agent that pulls live MLS data, parses showing instructions, books tours, and sends lockbox codes at the right time.

Operations·14 min read

Reasoning Tokens Are Showing Up on the Bill

GPT-5 and Claude thinking tokens are billed as output but stay invisible in the response. A 200-token reply can hide 8,000 billable ones. Here's how to measure, cap, and budget them.

Operations·15 min read

74% of Production Agents Still Rely on Human Evaluation

A survey of 306 practitioners reveals most production agents are far simpler than expected. The eval gap isn't a tooling problem. It's a trust problem.

Operations·18 min read

What to Trace When Your AI Agent Hits Production

OpenTelemetry GenAI conventions are the production standard for agent tracing. What to instrument, what to skip, and what breaks, drawn from a 2 AM debugging war story.

Operations·12 min read

The AI Agent Dashboard of 2026: What Teams Actually Need to See

Traditional dashboards tell you what went wrong yesterday. The AI agent dashboards teams actually need deliver feedback in the moment, during the call rather than after it. Here's what that looks like in practice.
