cost-optimization

Browse 6 articles tagged with “cost-optimization”.

Watercolor Still-Life of a Steel Coin, Silver Disc, and Gold Token Spilling From a Velvet Pouch Onto Dark Wood — Three Cheap-Tier Models on the Table

Agent Architecture·14 min read

Everyone Benchmarks Opus. Your Chatbot Runs on Haiku.

Haiku 4.5, GPT-5 Mini, Gemini Flash at the $1/MTok tier that powers CX. Tool-call accuracy, first-token latency, structured-output reliability, blended cost math.

Three Routed Paths Splitting From a Single Customer Message, Each Labeled With a Different AI Model Tier

Agent Architecture·13 min read

Your Agent Should Use Three Models, Not One

Production CX agents route tasks by difficulty, not brand loyalty. The planner/router/summarizer pattern, a concrete rubric, support-deflection cost math, and the failure modes nobody warns you about.

Iceberg at Sea With Small Visible Tip Above Dark Water and Enormous Submerged Mass Glowing Amber — Visual Metaphor for Reasoning Tokens Hidden Below the Surface of Agent Responses

Operations·14 min read

Reasoning Tokens Are Showing Up on the Bill

GPT-5 and Claude thinking tokens bill as output and stay invisible. A 200-token reply can hide 8,000 billable ones. How to measure, cap, and budget.

Neural network distillation visualization showing a large teacher model transferring knowledge to a compact student model

Learning AI·16 min read

A 1B Model Just Matched the 70B. Here's How.

How to distill frontier LLMs into small, cheap models that retain 98% accuracy on agent tasks. The teacher-student pattern, NVIDIA's data flywheel, and the Plan-and-Execute architecture that cuts agent costs by 90%.

Small chip outperforming a rack of servers

Learning AI·14 min read

Why Your AI Bill Is 30x Too High

Small language models match GPT-3.5 at 2% of the size and 95% less cost. Benchmarks, code, and a migration story from $13K/month to $400.

Watercolor illustration of descending cost bars alongside token streams flowing through an optimization pipeline

Operations·16 min read

Your AI Agent Costs $13K/Month. Here's the Fix.

A production customer-service agent burned $13,247 in one month. Prompt caching, model routing, batch processing, and plan-and-execute architecture cut it to $1,100. Real pricing math for every technique.
