llm

Browse 17 articles tagged with “llm”.

Watercolor Still-Life of a Steel Coin, Silver Disc, and Gold Token Spilling From a Velvet Pouch Onto Dark Wood — Three Cheap-Tier Models on the Table

Agent Architecture·14 min read

Everyone Benchmarks Opus. Your Chatbot Runs on Haiku.

Haiku 4.5, GPT-5 Mini, Gemini Flash at the $1/MTok tier that powers CX. Tool-call accuracy, first-token latency, structured-output reliability, blended cost math.

Illustration of a balance scale tilted by invisible weights, representing hidden biases in AI evaluation systems

Learning AI·18 min read

12 Ways Your LLM Judge Is Lying to You

Research identifies 12 systematic biases in LLM-as-a-judge systems. Learn to detect and mitigate each one before they corrupt your eval pipeline.

Developer comparing small and large AI model outputs on a monitor

Learning AI·18 min read

A 7B Domain Model Beat Everything We Tried

Domain-specific language models are beating trillion-parameter generalists on vertical tasks. Here's when a 7B model is the right call, how the training pipeline works, and what production teams are shipping today.

Illustration of a neural network with low-rank adapter matrices injected between layers, showing only a small percentage of parameters highlighted for training

Learning AI·19 min read

Fine-Tune a 7B Model for $1,500 (Not $50,000)

Full fine-tuning costs $50K in H100s. QLoRA on an RTX 4090 costs $1,500. Learn how LoRA and QLoRA let you train only 0.1-1% of parameters with nearly identical results, with working code for fine-tuning models that understand your agent's tool schemas.

Neural network distillation visualization showing a large teacher model transferring knowledge to a compact student model

Learning AI·16 min read

A 1B Model Just Matched the 70B. Here's How.

How to distill frontier LLMs into small, cheap models that retain 98% accuracy on agent tasks. The teacher-student pattern, NVIDIA's data flywheel, and the Plan-and-Execute architecture that cuts agent costs by 90%.

Small chip outperforming a rack of servers

Learning AI·14 min read

Why Your AI Bill Is 30x Too High

Small language models match GPT-3.5 at 2% of the size and 95% lower cost. Benchmarks, code, and a migration story from $13K/month to $400.

Watercolor illustration of descending cost bars alongside token streams flowing through an optimization pipeline

Operations·16 min read

Your AI Agent Costs $13K/Month. Here's the Fix.

A production customer-service agent burned $13,247 in one month. Prompt caching, model routing, batch processing, and plan-and-execute architecture cut it to $1,100. Real pricing math for every technique.

Two men filming a scene outdoors with artwork. - Photo by Luke Thornton on Unsplash

Testing & Evaluation·12 min read

Zero-Shot or No Chance? How AI Agents Handle Calls They Have Never Seen

When a customer calls with a request your AI agent has never encountered, what actually happens? We break down the mechanics of zero-shot handling and how to test it before it fails in production.

Claude AI agent development tools with code on a developer workspace

Agent Architecture·20 min read

Claude 4.6 broke our production agent in two hours — here's what's worth the migration

A practical developer guide to Claude 4.6 — adaptive thinking, 1M context, compaction API, tool search, and structured outputs. Real code examples in TypeScript and Python for building production AI agents.

Selective focus of a black and white quadcopter - Photo by Kenny Eliason on Unsplash

Agent Architecture·7 min read

Conversational AI vs. Agentic AI: What's the Difference and Why It Matters for CX Teams

Conversational AI follows scripts. Agentic AI pursues goals. Here is the exact difference, with a side-by-side comparison and a practical guide to choosing the right approach for customer experience.

Watercolor illustration of converging streams representing voice, vision, and text flowing into an AI agent system

Agent Architecture·28 min read

Multimodal AI Agents: Voice, Vision, and Text in Production

How to architect multimodal AI agents that process voice, vision, and text simultaneously — from STT→LLM→TTS pipelines to vision integration, latency budgets, and production fusion strategies.

Developer comparing two approaches on a whiteboard

Knowledge & Memory·20 min read

Fine-tuning vs RAG: why most teams pick wrong and how to decide

When to fine-tune, when to use RAG, and when you need both — with hands-on LoRA fine-tuning and RAG implementation on the same task to show the difference.

Illustration of two people reviewing an improvement chart together at a standing desk

Learning AI·20 min read

How to Evaluate AI Agents: Build an Evaluation Framework from Scratch

Build a working AI agent evaluation framework in TypeScript and Python. Covers LLM-as-judge, rubric scoring, regression testing, and CI integration.

Illustration of a person writing thoughtfully at a desk with sticky notes and a warm lamp

Learning AI·25 min read

Prompt Engineering from First Principles: 12 Techniques Every AI Developer Needs

Master 12 essential prompt engineering techniques with real examples in TypeScript. From zero-shot to ReAct, build better AI agents from first principles.

man in blue dress shirt sitting on black office rolling chair - Photo by David Schultz on Unsplash

Agent Architecture·22 min read

How Multimodal Voice AI Works: From Audio-Only to Vision-Aware Agents

How multimodal voice AI combines speech, vision, and text into a single agent — architecture patterns, latency tradeoffs, and TypeScript code you can run.

text - Photo by Artur Shamsutdinov on Unsplash

Agent Architecture·16 min read

How LLMs Changed Agent Training Forever: From Writing Rules to Writing Prompts

LLMs didn't just improve agent training. They changed the entire discipline. Here's what actually shifted, what works in production, and what the industry still gets wrong.

a man and a woman standing in front of a whiteboard - Photo by Walls.io on Unsplash

Knowledge & Memory·16 min read

Prompt engineering vs. context engineering: What's the next step for voice AI?

While prompt engineering focuses on perfecting inputs, context engineering optimizes the entire conversation environment. Discover why context engineering is becoming the key differentiator in voice AI.
