learning-ai

Browse 45 articles tagged with “learning-ai”.

Three glowing rubric cards floating in misted air, each marking the same transcript with subtly different ink colors, with a faint kappa heatmap projected on the wall behind them

Testing & Evaluation·11 min read

GPT-5, Claude 4.5, Gemini Score the Same Calls. Their Kappa Is 0.52

Run the same calls through GPT-5, Claude 4.5, and Gemini and Cohen's kappa lands at 0.52. Here is how to measure judge agreement on your own corpus.

Illustration of a person drawing a causal graph on a whiteboard while teammates watch

Learning AI·22 min read

Correlation Killed Your Retention Model. Causal AI Fixes It.

Your churn model says support calls cause retention. They don't. Build a causal pipeline with DoWhy, EconML, and propensity matching in Python.

Person connecting protocol cables between two glowing devices with diagrams on a whiteboard

Learning AI·22 min read

Build the MCP + A2A agent protocol stack from scratch

Wire an MCP server to an A2A agent that delegates tasks and calls tools. TypeScript and Python examples, Streamable HTTP transport, Agent Cards, and auth.

Person sorting through stacks of documents, crossing out wrong ones, with a magnifying glass on the desk

Learning AI·22 min read

Agentic RAG: from dumb retrieval to self-correcting agents

Your RAG pipeline retrieves wrong documents and nobody catches it. Build a self-correcting agent that grades results, rewrites queries, and knows when to stop.

Watercolor illustration of a colorful workspace with a main monitor surrounded by floating screens showing different code, warm plum tones

Learning AI·18 min read

Claude Code subagents and the orchestrator pattern

How to structure Claude Code subagents, write dispatch prompts, and coordinate parallel work across services, SDKs, and frontends in a monorepo.

Person drawing a web of connected nodes on a glass wall with colorful sticky notes around the edges

Learning AI·22 min read

Graph memory for AI agents: when vector search isn't enough

Build graph memory for AI agents in TypeScript and Python. Extract entities, track relationships over time, and compare Mem0, Zep, and Letta in production.

Person wearing a headset at a desk with sound waveforms visible on screen, golden amber atmosphere

Learning AI·22 min read

Voice AI pipeline: STT, LLM, TTS and the 300ms budget

Build a real-time voice pipeline with Pipecat. How STT, LLM, and TTS stream concurrently under a 300ms latency budget, with turn detection and interruptions.

Engineering team reviewing real-time AI agent monitoring dashboards with metrics and conversation traces

Learning AI·22 min read

Build an AI Agent Observability Pipeline from Scratch

Build a production observability pipeline for AI agents using TypeScript and the Chanl SDK. Covers metrics, traces, quality scoring, drift detection, and alerting.

Visualization of an AI agent context window filling up with system prompts, tool definitions, and conversation history

Learning AI·20 min read

Your AI Agent's Context Window Is Already Half Full

System prompts, tool schemas, MCP descriptions, memory injection, conversation history. They all eat tokens before the user says a word. Learn where your context budget goes and how to manage it.

Illustration of a quality monitoring dashboard showing score trends and alert thresholds across production AI agent conversations

Learning AI·20 min read

Production Agent Evals: Catch Score Drift, Ship Confidently

Your evals pass in staging but miss production failures. Build three eval pipelines with the Chanl SDK: automated scorecards, scenario regression, and drift detection that catches quality degradation before customers do.

Watercolor illustration of a traffic control tower overlooking a busy intersection of code agents, warm amber and teal tones

Learning AI·14 min read

How to enforce the orchestrator pattern in Claude Code

The main Claude Code thread plans and reviews. Subagents implement. Three enforcement layers make this mandatory: CLAUDE.md, skills, and hooks. Includes a starter kit you can copy.

Illustration of a balance scale tilted by invisible weights, representing hidden biases in AI evaluation systems

Learning AI·18 min read

12 Ways Your LLM Judge Is Lying to You

Research identifies 12 systematic biases in LLM-as-a-judge systems. Learn to detect and mitigate each one before they corrupt your eval pipeline.

Visualization of the widening gap between AI agent capability scores and reliability metrics across model generations

Learning AI·15 min read

Your Agent Is Getting Smarter. It's Not Getting More Reliable.

Reliability improves at half the rate of accuracy. Three 85%+ tools combine to just 74%. Here's the math, the research, and the testing protocols that close the gap.

Person exploring geometric shapes representing vector space

Learning AI·20 min read

Embeddings Turn Text Into Meaning. Here's the Math and the Code

What embeddings are, how similarity search works under the hood, and how to build a semantic search engine, from cosine similarity math to production vector databases.

Person building with tool components at a desk

Learning AI·20 min read

Function Calling: Build a Multi-Tool AI Agent from Scratch

Build a multi-tool AI agent from scratch using function calling across OpenAI, Anthropic, and Google. Runnable TypeScript and Python code, validation with Zod and Pydantic, and production hardening patterns.

Illustration of an AI agent navigating branching knowledge paths across interconnected document nodes

Learning AI·18 min read

Your RAG Pipeline Is Answering the Wrong Question

Naive RAG scores 42% on multi-hop questions. Agentic RAG hits 94.5%. The difference: letting the agent decide what to retrieve, when, and whether the results are good enough. Build both in TypeScript and Python.

Illustration of an engineer assembling context layers for an AI agent, with memory, tools, and knowledge sources flowing into a central pipeline

Learning AI·21 min read

Context Engineering Is What Your Agent Actually Needs

Prompt engineering hits a wall with production AI agents. Context engineering fixes it. Build a full context pipeline with memory, RAG, history compression, and tool resolution.

Developer comparing small and large AI model outputs on a monitor

Learning AI·18 min read

A 7B Domain Model Beat Everything We Tried

Domain-specific language models are beating trillion-parameter generalists on vertical tasks. Here's when a 7B model is the right call, how the training pipeline works, and what production teams are shipping today.

Illustration of a neural network with low-rank adapter matrices injected between layers, showing only a small percentage of parameters highlighted for training

Learning AI·19 min read

Fine-Tune a 7B Model for $1,500 (Not $50,000)

Full fine-tuning costs $50K in H100s. QLoRA on an RTX 4090 costs $1,500. Learn how LoRA and QLoRA let you train only 0.1-1% of parameters with nearly identical results, with working code for fine-tuning models that understand your agent's tool schemas.

Neural network distillation visualization showing a large teacher model transferring knowledge to a compact student model

Learning AI·16 min read

A 1B Model Just Matched the 70B. Here's How.

How to distill frontier LLMs into small, cheap models that retain 98% accuracy on agent tasks. The teacher-student pattern, NVIDIA's data flywheel, and the Plan-and-Execute architecture that cuts agent costs by 90%.

Small chip outperforming a rack of servers

Learning AI·14 min read

Why Your AI Bill Is 30x Too High

Small language models match GPT-3.5 at 2% of the size and 95% less cost. Benchmarks, code, and a migration story from $13K/month to $400.

Watercolor illustration of developers on a café terrace with an MCP diagram on a whiteboard, Teal & Copper style

Learning AI·15 min read

Part 1: The Mental Model for Claude's 7 Extension Points

CLAUDE.md, Skills, Hooks, MCP Servers, Connectors, Claude Apps, Plugins: Claude's extension ecosystem is powerful but confusing. Here is the mental model that makes sense of all 7.

Watercolor illustration of developers on a café terrace with an LLM layer diagram on a whiteboard, Terra Cotta style

Learning AI·17 min read

Part 2: CLAUDE.md, Hooks, and Skills as Three Layers

CLAUDE.md sets conventions. Hooks enforce them. Skills teach workflows. Understanding these three layers, and their reliability spectrum, is the key to a Claude Code setup that actually works.

Watercolor illustration of developers on a café terrace with an MCP connection diagram on a whiteboard, Sage & Olive style

Learning AI·17 min read

Part 3: MCP Servers vs. Connectors vs. Apps

All Claude Apps are Connectors. All Connectors are MCP Servers. Understanding this hierarchy, and when to build versus use managed integrations, saves weeks of unnecessary engineering.

Watercolor illustration of developers on a café terrace with a rocket deployment diagram on a screen, Dusty Blue style

Learning AI·20 min read

Part 4: The 7 Extension Points in a Production Codebase

More than 50 skills, multiple MCP servers, scoped rules, security hooks: this is how Claude's 7 extension points compose in a real NestJS monorepo with 17 projects. What works, what conflicts, and what we would do differently.

Claude AI agent development tools with code on a developer workspace

Agent Architecture·20 min read

Claude 4.6 broke our production agent in two hours — here's what's worth the migration

A practical developer guide to Claude 4.6 — adaptive thinking, 1M context, compaction API, tool search, and structured outputs. Real code examples in TypeScript and Python for building production AI agents.

Watercolor illustration of two interlocking systems — tools and behavioral instructions — powering an AI agent

Tools & MCP·14 min read

Your agent has 30 tools and no idea when to use them

MCP tools give agents external capabilities. Skills give agents behavioral expertise. Learn the architecture of both, build them in TypeScript, and understand when to use each — and when you need both.

Watercolor illustration of an engineer monitoring a dashboard of production AI agents with reliability metrics

Agent Architecture·24 min read

Agentic AI in Production: From Prototype to Reliable Service

Take agentic AI to production without it breaking at 2 AM. Covers orchestration patterns (ReAct, planning loops), error handling, circuit breakers, graceful degradation, observability, and scaling, with TypeScript implementations you can reuse.

Watercolor illustration of interconnected memory nodes forming a knowledge network in sage green and olive tones

Knowledge & Memory·25 min read

AI Agent Memory: From Session Context to Long-Term Knowledge

Build memory systems for AI agents from scratch in TypeScript. Covers memory types (session, episodic, semantic, procedural), architectures (buffer, summary, vector retrieval), the intersection with RAG, and privacy-aware design.

Watercolor illustration of an engineering team monitoring AI agent dashboards with data flowing across screens

Operations·28 min read

AI Agent Observability: What to Monitor When Your Agent Goes Live

Build a production observability pipeline for AI agents. Covers latency, token usage, tool success rates, conversation quality, drift detection, structured logging, alerting strategies, and the critical difference between LLM and agent observability.

Illustration of a team evaluating AI agent quality through structured testing scenarios

Testing & Evaluation·24 min read

AI Agent Testing: How to Evaluate Agents Before They Talk to Customers

A practical guide to testing AI agents before production — scenario-based testing with AI personas, scorecard evaluation, regression suites, edge case generation, and CI/CD integration.

Watercolor illustration of developers collaborating around a whiteboard with tool integration diagrams

Tools & MCP·26 min read

AI Agent Tools: MCP, OpenAPI, and Tool Management That Actually Scales

How production AI agents discover, execute, and manage tools: from the MCP protocol to automatic OpenAPI import, security sandboxing, and multi-tenant tool infrastructure.

AI agent memory architecture with semantic search vectors

Learning AI·20 min read

Build your own AI agent memory system — what breaks when real users show up?

Build a complete memory system for customer-facing AI agents — session context, persistent recall, semantic search. Then learn what breaks when real customers start returning.

Developer building AI agent tools on a whiteboard

Learning AI·20 min read

Build your own AI agent tool system: what breaks when you add tool number 20?

Build a complete tool system for customer-facing AI agents from scratch: registration, execution, authentication, and monitoring. Then learn what breaks when real customers start calling.

Developer working through advanced MCP protocol integration patterns on a screen

Tools & MCP·25 min read

MCP Deep Dive: Advanced Patterns for Agent Tool Integration

Production MCP patterns for teams who've built their first server and need to scale it — OAuth 2.1 with PKCE, Streamable HTTP transport, gateways, sampling, dynamic tool registration, and multi-tenant security.

Watercolor illustration of converging streams representing voice, vision, and text flowing into an AI agent system

Agent Architecture·28 min read

Multimodal AI Agents: Voice, Vision, and Text in Production

How to architect multimodal AI agents that process voice, vision, and text simultaneously — from STT→LLM→TTS pipelines to vision integration, latency budgets, and production fusion strategies.

Watercolor illustration of voice AI waveforms flowing through a technical architecture diagram with golden amber tones

Agent Architecture·19 min read

Voice Agent Platform Architecture: The Stack Behind Sub-300ms Responses

Deep dive into voice agent architecture — the STT→LLM→TTS pipeline, latency budgets, interruption handling, WebRTC vs WebSocket transport, and what orchestration platforms leave on the table.

Developer comparing two approaches on a whiteboard

Knowledge & Memory·20 min read

Fine-tuning vs RAG: why most teams pick wrong and how to decide

When to fine-tune, when to use RAG, and when you need both — with hands-on LoRA fine-tuning and RAG implementation on the same task to show the difference.

Team of developers collaborating on multi-agent AI architecture

Learning AI·20 min read

Multi-Agent AI Systems: Build an Agent Orchestrator Without a Framework

Build a multi-agent system from scratch — delegation, planning loops, and inter-agent communication — before reaching for LangGraph or CrewAI.

Engineer debugging a real-time streaming architecture on a monitor

Learning AI·20 min read

Streaming AI Responses: SSE, WebSockets, and the Architecture Behind ChatGPT's Typing Effect

Build three streaming implementations from scratch — SSE, WebSocket, and HTTP/2 — and learn why token-by-token rendering is harder than it looks.

Illustration of two people reviewing an improvement chart together at a standing desk

Learning AI·20 min read

How to evaluate AI agents: build an evaluation framework from scratch

Build a working AI agent evaluation framework in TypeScript and Python. Covers LLM-as-judge, rubric scoring, regression testing, and CI integration.

Illustration of a diverse team collaborating around a whiteboard with code diagrams

Learning AI·20 min read

MCP Explained: Build Your First MCP Server in TypeScript and Python

Build a working MCP server from scratch in TypeScript and Python. A hands-on tutorial covering tools, resources, transports, and testing.

Illustration of a person writing thoughtfully at a desk with sticky notes and a warm lamp

Learning AI·25 min read

Prompt Engineering from First Principles: 12 Techniques Every AI Developer Needs

Master 12 essential prompt engineering techniques with real examples in TypeScript. From zero-shot to ReAct, build better AI agents from first principles.

Illustration of a person organizing knowledge on a corkboard with connected notes

Learning AI·18 min read

RAG from Scratch: Build a Retrieval-Augmented Generation Pipeline

Build a working RAG pipeline from scratch in TypeScript and Python. Covers embeddings, chunking, vector search, and generation with real, runnable code.

man in blue dress shirt sitting on black office rolling chair - Photo by David Schultz on Unsplash

Agent Architecture·22 min read

How Multimodal Voice AI Works: From Audio-Only to Vision-Aware Agents

How multimodal voice AI combines speech, vision, and text into a single agent — architecture patterns, latency tradeoffs, and TypeScript code you can run.
