Articles tagged “latency”
12 articles

Pre-Execute Tool Calls to Cut Agent Latency 48%
Sequential tool calls quietly kill your agent's response time. PASTE shows you can pre-execute likely tool calls during LLM thinking time and cut latency 48% without touching your model.

Your voice agent's P95 is lying. The real problem is P99.9
Per-stage P95 hides the tail customers feel. How variance compounds across STT, LLM, and TTS, and how to SLO the joint distribution.

Everyone Benchmarks Opus. Your Chatbot Runs on Haiku.
Haiku 4.5, GPT-5 Mini, Gemini Flash at the $1/MTok tier that powers CX. Tool-call accuracy, first-token latency, structured-output reliability, blended cost math.

Pipecat vs LiveKit: the trade-offs that lock you in
An opinionated comparison of Pipecat and LiveKit for production voice agents, covering architecture, deployment, cost, and the trade-offs that lock you in.

Voice AI pipeline: STT, LLM, TTS and the 300ms budget
Build a real-time voice pipeline with Pipecat. How STT, LLM, and TTS stream concurrently under a 300ms latency budget, with turn detection and interruptions.

The Buffering Bug That Quietly Breaks Voice Agent Latency
SSE streams fine locally, then tokens batch into 500ms bursts in production. Here's why, how to fix it, and why pipeline parallelism matters more than model speed.

Voice Agent Platform Architecture: The Stack Behind Sub-300ms Responses
Deep dive into voice agent architecture — the STT→LLM→TTS pipeline, latency budgets, interruption handling, WebRTC vs WebSocket transport, and what orchestration platforms leave on the table.

Real-Time Monitoring for AI Agents: What to Watch and When to Panic
Which dashboards actually matter for production AI agents: alert fatigue, anomaly detection, and the metrics that predict failures before customers notice.

Edge AI for Voice Agents: Fix Latency and Privacy at the Source
How edge AI eliminates 50-200ms of latency and entire classes of privacy risk for voice agents, with hybrid architecture patterns and TypeScript examples.

Sub-300ms Voice AI: The New Standard That's Redefining Customer Expectations
Discover why sub-300ms response times have become the new standard in voice AI, backed by cognitive science research and real-world deployment data.

Performance Benchmarks for AI Agents: What Actually Matters Beyond Word Error Rate
Most enterprises obsess over Word Error Rate while missing the metrics that actually predict success. Here's what to measure instead.

Why Voice AI Latency Past One Second Tanks Satisfaction
Each second of voice AI latency measurably erodes customer satisfaction. Here's how to measure, budget, and cut delay across the ASR, LLM, and TTS pipeline.