Articles tagged “learning-ai”
43 articles

Build the MCP + A2A agent protocol stack from scratch
Wire an MCP server to an A2A agent that delegates tasks and calls tools. TypeScript and Python examples, Streamable HTTP transport, Agent Cards, and auth.

Agentic RAG: from dumb retrieval to self-correcting agents
Your RAG pipeline retrieves the wrong documents and nobody catches it. Build a self-correcting agent that grades results, rewrites queries, and knows when to stop.

Claude Code subagents and the orchestrator pattern
How to structure Claude Code subagents, write dispatch prompts, and coordinate parallel work across services, SDKs, and frontends in a monorepo.

Graph memory for AI agents: when vector search isn't enough
Build graph memory for AI agents in TypeScript and Python. Extract entities, track relationships over time, and compare Mem0, Zep, and Letta in production.

Voice AI pipeline: STT, LLM, TTS and the 300ms budget
Build a real-time voice pipeline with Pipecat. How STT, LLM, and TTS stream concurrently under a 300ms latency budget, with turn detection and interruptions.

Build an AI Agent Observability Pipeline from Scratch
Build a production observability pipeline for AI agents using TypeScript and the Chanl SDK. Covers metrics, traces, quality scoring, drift detection, and alerting.

Your AI Agent's Context Window Is Already Half Full
System prompts, tool schemas, MCP descriptions, memory injection, conversation history. They all eat tokens before the user says a word. Learn where your context budget goes and how to manage it.

Production Agent Evals: Catch Score Drift, Ship Confidently
Your evals pass in staging but miss production failures. Build three eval pipelines with the Chanl SDK: automated scorecards, scenario regression, and drift detection that catches quality degradation before customers do.

How to enforce the orchestrator pattern in Claude Code
The main Claude Code thread plans and reviews. Subagents implement. Three enforcement layers make this mandatory: CLAUDE.md, skills, and hooks. Includes a starter kit you can copy.

12 Ways Your LLM Judge Is Lying to You
Research identifies 12 systematic biases in LLM-as-a-judge systems. Learn to detect and mitigate each one before it corrupts your eval pipeline.

Your Agent Is Getting Smarter. It's Not Getting More Reliable.
Reliability improves at half the rate of accuracy. Three 85%+ tools combine to just 74%. Here's the math, the research, and the testing protocols that close the gap.

Embeddings Turn Text Into Meaning. Here's the Math and the Code
What embeddings are, how similarity search works under the hood, and how to build a semantic search engine, from cosine similarity math to production vector databases.

Function Calling: Build a Multi-Tool AI Agent from Scratch
Build a multi-tool AI agent from scratch using function calling across OpenAI, Anthropic, and Google. Runnable TypeScript and Python code, validation with Zod and Pydantic, and production hardening patterns.

Your RAG Pipeline Is Answering the Wrong Question
Naive RAG scores 42% on multi-hop questions. Agentic RAG hits 94.5%. The difference: letting the agent decide what to retrieve, when, and whether the results are good enough. Build both in TypeScript and Python.

Context Engineering Is What Your Agent Actually Needs
Prompt engineering hits a wall with production AI agents. Context engineering fixes it. Build a full context pipeline with memory, RAG, history compression, and tool resolution.

A 7B Domain Model Beat Everything We Tried
Domain-specific language models are beating trillion-parameter generalists on vertical tasks. Here's when a 7B model is the right call, how the training pipeline works, and what production teams are shipping today.

Fine-Tune a 7B Model for $1,500 (Not $50,000)
Full fine-tuning costs $50K in H100s. QLoRA on an RTX 4090 costs $1,500. Learn how LoRA and QLoRA let you train only 0.1-1% of parameters with nearly identical results, with working code for fine-tuning models that understand your agent's tool schemas.

A 1B Model Just Matched the 70B. Here's How.
How to distill frontier LLMs into small, cheap models that retain 98% accuracy on agent tasks. The teacher-student pattern, NVIDIA's data flywheel, and the Plan-and-Execute architecture that cuts agent costs by 90%.

Why Your AI Bill Is 30x Too High
Small language models match GPT-3.5 at 2% of the size and 95% lower cost. Benchmarks, code, and a migration story from $13K/month to $400.

Part 1: Claude's 7 Extension Points — The Mental Model
CLAUDE.md, Skills, Hooks, MCP Servers, Connectors, Claude Apps, Plugins — Claude's extension ecosystem is powerful but confusing. Here's the mental model that makes sense of all 7.

Part 2: CLAUDE.md, Hooks, and Skills — Three Layers
CLAUDE.md sets conventions. Hooks enforce them. Skills teach workflows. Understanding these three layers — and their reliability spectrum — is the key to a Claude Code setup that actually works.

Part 3: MCP Servers vs. Connectors vs. Apps
All Claude Apps are Connectors. All Connectors are MCP Servers. Understanding this hierarchy — and when to build vs. use managed integrations — saves weeks of unnecessary engineering.

Part 4: All 7 Extension Points in One Production Codebase
50+ skills, multiple MCP servers, scoped rules, safety hooks — here's how all 7 Claude extension points compose in a real NestJS monorepo with 17 projects. What works, what fights, and what we'd do differently.

Claude 4.6 broke our production agent in two hours — here's what's worth the migration
A practical developer guide to Claude 4.6 — adaptive thinking, 1M context, compaction API, tool search, and structured outputs. Real code examples in TypeScript and Python for building production AI agents.

Your agent has 30 tools and no idea when to use them
MCP tools give agents external capabilities. Skills give agents behavioral expertise. Learn the architecture of both, build them in TypeScript, and understand when to use each — and when you need both.

Agentic AI in Production: From Prototype to Reliable Service
Ship agentic AI that doesn't break at 2 AM. Covers orchestration patterns (ReAct, planning loops), error handling, circuit breakers, graceful degradation, observability, and scaling — with TypeScript implementations you can steal.

AI Agent Memory: From Session Context to Long-Term Knowledge
Build AI agent memory systems from scratch in TypeScript. Covers memory types (session, episodic, semantic, procedural), architectures (buffer, summary, vector retrieval), RAG intersection, and privacy-first design.

AI Agent Observability: What to Monitor When Your Agent Goes Live
Build a production observability pipeline for AI agents. Covers latency, token usage, tool success rates, conversation quality, drift detection, structured logging, alerting strategies, and the critical difference between LLM and agent observability.

AI Agent Testing: How to Evaluate Agents Before They Talk to Customers
A practical guide to testing AI agents before production — scenario-based testing with AI personas, scorecard evaluation, regression suites, edge case generation, and CI/CD integration.

AI Agent Tools: MCP, OpenAPI, and Tool Management That Actually Scales
How production AI agents discover, execute, and manage tools — from MCP protocol to OpenAPI auto-importing, security sandboxing, and multi-tenant tool infrastructure.

Build your own AI agent memory system — what breaks when real users show up?
Build a complete memory system for customer-facing AI agents — session context, persistent recall, semantic search. Then learn what breaks when real customers start returning.

Build your own AI agent tool system — what breaks when you add the 20th tool?
Build a complete tool system for customer-facing AI agents from scratch — registry, execution, auth, monitoring. Then learn what breaks when real customers start calling.

MCP Deep Dive: Advanced Patterns for Agent Tool Integration
Production MCP patterns for teams who've built their first server and need to scale it — OAuth 2.1 with PKCE, Streamable HTTP transport, gateways, sampling, dynamic tool registration, and multi-tenant security.

Multimodal AI Agents: Voice, Vision, and Text in Production
How to architect multimodal AI agents that process voice, vision, and text simultaneously — from STT→LLM→TTS pipelines to vision integration, latency budgets, and production fusion strategies.

Voice Agent Platform Architecture: The Stack Behind Sub-300ms Responses
Deep dive into voice agent architecture — the STT→LLM→TTS pipeline, latency budgets, interruption handling, WebRTC vs WebSocket transport, and what orchestration platforms leave on the table.

Fine-tuning vs RAG: why most teams pick wrong and how to decide
When to fine-tune, when to use RAG, and when you need both — with hands-on LoRA fine-tuning and RAG implementation on the same task to show the difference.

Multi-Agent AI Systems: Build an Agent Orchestrator Without a Framework
Build a multi-agent system from scratch — delegation, planning loops, and inter-agent communication — before reaching for LangGraph or CrewAI.

Streaming AI Responses: SSE, WebSockets, and the Architecture Behind ChatGPT's Typing Effect
Build three streaming implementations from scratch — SSE, WebSocket, and HTTP/2 — and learn why token-by-token rendering is harder than it looks.

How to Evaluate AI Agents: Build an Eval Framework from Scratch
Build a working AI agent eval framework in TypeScript and Python. Covers LLM-as-judge, rubric scoring, regression testing, and CI integration.

MCP Explained: Build Your First MCP Server in TypeScript and Python
Build a working MCP server from scratch in TypeScript and Python. Hands-on tutorial covering tools, resources, transports, and testing.

Prompt Engineering from First Principles: 12 Techniques Every AI Developer Needs
Master 12 essential prompt engineering techniques with real TypeScript examples. From zero-shot to ReAct, build better AI agents from first principles.

RAG from Scratch: Build a Retrieval-Augmented Generation Pipeline
Build a working RAG pipeline from scratch in TypeScript and Python. Covers embeddings, chunking, vector search, and generation with real, runnable code.

How Multimodal Voice AI Works: From Audio-Only to Vision-Aware Agents
How multimodal voice AI combines speech, vision, and text into a single agent — architecture patterns, latency tradeoffs, and TypeScript code you can run.