
learning-ai

Browse 45 articles tagged with “learning-ai”.

Articles tagged “learning-ai”

45 articles

Three glowing rubric cards floating in misted air, each marking the same transcript with subtly different ink colors, with a faint kappa heatmap projected on the wall behind them

Testing & Evaluation·11 min read

GPT-5, Claude 4.5, Gemini Score the Same Calls. Their Kappa Is 0.52

Run the same calls through GPT-5, Claude 4.5, and Gemini, and Cohen's kappa lands at 0.52. Here is how to measure judge agreement on your own corpus.

Illustration of a person drawing a causal graph on a whiteboard while teammates watch

Learning AI·22 min read

Correlation Killed Your Retention Model. Causal AI Fixes It.

Your churn model says support calls cause retention. They don't. Build a causal pipeline with DoWhy, EconML, and propensity matching in Python.

Person connecting protocol cables between two glowing devices with diagrams on a whiteboard

Learning AI·22 min read

Build the MCP + A2A agent protocol stack from scratch

Wire an MCP server to an A2A agent that delegates tasks and calls tools. TypeScript and Python examples, Streamable HTTP transport, Agent Cards, and auth.

Person sorting through stacks of documents, crossing out wrong ones, with a magnifying glass on the desk

Learning AI·22 min read

Agentic RAG: from dumb retrieval to self-correcting agents

Your RAG pipeline retrieves wrong documents and nobody catches it. Build a self-correcting agent that grades results, rewrites queries, and knows when to stop.

Watercolor illustration of a colorful workspace with a main monitor surrounded by floating screens showing different code, warm plum tones

Learning AI·18 min read

Claude Code subagents and the orchestrator pattern

How to structure Claude Code subagents, write dispatch prompts, and coordinate parallel work across services, SDKs, and frontends in a monorepo.

Person drawing a web of connected nodes on a glass wall with colorful sticky notes around the edges

Learning AI·22 min read

Graph memory for AI agents: when vector search isn't enough

Build graph memory for AI agents in TypeScript and Python. Extract entities, track relationships over time, and compare Mem0, Zep, and Letta in production.

Person wearing a headset at a desk with sound waveforms visible on screen, golden amber atmosphere

Learning AI·22 min read

Voice AI pipeline: STT, LLM, TTS and the 300ms budget

Build a real-time voice pipeline with Pipecat. Learn how STT, LLM, and TTS stream concurrently under a 300ms latency budget, with turn detection and interruption handling.

Engineering team reviewing real-time AI agent monitoring dashboards with metrics and conversation traces

Learning AI·22 min read

Build an AI Agent Observability Pipeline from Scratch

Build a production observability pipeline for AI agents using TypeScript and the Chanl SDK. Covers metrics, traces, quality scoring, drift detection, and alerting.

Visualization of an AI agent context window filling up with system prompts, tool definitions, and conversation history

Learning AI·20 min read

Your AI Agent's Context Window Is Already Half Full

System prompts, tool schemas, MCP descriptions, memory injection, conversation history. They all eat tokens before the user says a word. Learn where your context budget goes and how to manage it.

Illustration of a quality monitoring dashboard showing score trends and alert thresholds across production AI agent conversations

Learning AI·20 min read

Production Agent Evals: Catch Score Drift, Ship Confidently

Your evals pass in staging but miss production failures. Build three eval pipelines with the Chanl SDK: automated scorecards, scenario regression, and drift detection that catches quality degradation before customers do.

Watercolor illustration of a traffic control tower overlooking a busy intersection of code agents, warm amber and teal tones

Learning AI·14 min read

How to enforce the orchestrator pattern in Claude Code

The main Claude Code thread plans and reviews. Subagents implement. Three enforcement layers make this mandatory: CLAUDE.md, skills, and hooks. Includes a starter kit you can copy.

Illustration of a balance scale tilted by invisible weights, representing hidden biases in AI evaluation systems

Learning AI·18 min read

12 Ways Your LLM Judge Is Lying to You

Research identifies 12 systematic biases in LLM-as-a-judge systems. Learn to detect and mitigate each one before they corrupt your eval pipeline.

Visualization of the widening gap between AI agent capability scores and reliability metrics across model generations

Learning AI·15 min read

Your Agent Is Getting Smarter. It's Not Getting More Reliable.

Reliability improves at half the rate of accuracy. Three 85%+ tools combine to just 74%. Here's the math, the research, and the testing protocols that close the gap.

Person exploring geometric shapes representing vector space

Learning AI·20 min read

Embeddings Turn Text Into Meaning. Here's the Math and the Code

What embeddings are, how similarity search works under the hood, and how to build a semantic search engine, from cosine similarity math to production vector databases.

Person building with tool components at a desk

Learning AI·20 min read

Function Calling: Build a Multi-Tool AI Agent from Scratch

Build a multi-tool AI agent from scratch using function calling across OpenAI, Anthropic, and Google. Runnable TypeScript and Python code, validation with Zod and Pydantic, and production hardening patterns.

Illustration of an AI agent navigating branching knowledge paths across interconnected document nodes

Learning AI·18 min read

Your RAG Pipeline Is Answering the Wrong Question

Naive RAG scores 42% on multi-hop questions. Agentic RAG hits 94.5%. The difference: letting the agent decide what to retrieve, when, and whether the results are good enough. Build both in TypeScript and Python.

Illustration of an engineer assembling context layers for an AI agent, with memory, tools, and knowledge sources flowing into a central pipeline

Learning AI·21 min read

Context Engineering Is What Your Agent Actually Needs

Prompt engineering hits a wall with production AI agents. Context engineering fixes it. Build a full context pipeline with memory, RAG, history compression, and tool resolution.

Developer comparing small and large AI model outputs on a monitor

Learning AI·18 min read

A 7B Domain Model Beat Everything We Tried

Domain-specific language models are beating trillion-parameter generalists on vertical tasks. Here's when a 7B model is the right call, how the training pipeline works, and what production teams are shipping today.

Illustration of a neural network with low-rank adapter matrices injected between layers, showing only a small percentage of parameters highlighted for training

Learning AI·19 min read

Fine-Tune a 7B Model for $1,500 (Not $50,000)

Full fine-tuning costs $50K in H100s. QLoRA on an RTX 4090 costs $1,500. Learn how LoRA and QLoRA let you train only 0.1-1% of parameters with nearly identical results, with working code for fine-tuning models that understand your agent's tool schemas.

Neural network distillation visualization showing a large teacher model transferring knowledge to a compact student model

Learning AI·16 min read

A 1B Model Just Matched the 70B. Here's How.

How to distill frontier LLMs into small, cheap models that retain 98% accuracy on agent tasks. The teacher-student pattern, NVIDIA's data flywheel, and the Plan-and-Execute architecture that cuts agent costs by 90%.

Small chip outperforming a rack of servers

Learning AI·14 min read

Why Your AI Bill Is 30x Too High

Small language models match GPT-3.5 at 2% of the size and 95% lower cost. Benchmarks, code, and a migration story from $13K/month to $400.

Watercolor illustration of developers at a cafe terrace with MCP diagram on whiteboard — Teal & Copper style

Learning AI·15 min read

Part 1: Claude's 7 Extension Points — The Mental Model

CLAUDE.md, Skills, Hooks, MCP Servers, Connectors, Claude Apps, Plugins — Claude's extension ecosystem is powerful but confusing. Here's the mental model that makes sense of all 7.

Watercolor illustration of developers at a cafe terrace with LLM layered diagram on whiteboard — Terra Cotta style

Learning AI·17 min read

Part 2: CLAUDE.md, Hooks, and Skills — Three Layers

CLAUDE.md sets conventions. Hooks enforce them. Skills teach workflows. Understanding these three layers — and their reliability spectrum — is the key to a Claude Code setup that actually works.

Watercolor illustration of developers at a cafe terrace with MCP plug-and-socket diagram on whiteboard — Sage & Olive style

Learning AI·17 min read

Part 3: MCP Servers vs. Connectors vs. Apps

All Claude Apps are Connectors. All Connectors are MCP Servers. Understanding this hierarchy — and when to build vs. use managed integrations — saves weeks of unnecessary engineering.

Watercolor illustration of developers at a cafe terrace with rocket deployment diagram on screen — Dusty Blue style

Learning AI·20 min read

Part 4: All 7 Extension Points in One Production Codebase

50+ skills, multiple MCP servers, scoped rules, safety hooks — here's how all 7 Claude extension points compose in a real NestJS monorepo with 17 projects. What works, what fights, and what we'd do differently.

Claude AI agent development tools with code on a developer workspace

Agent Architecture·20 min read

Claude 4.6 broke our production agent in two hours — here's what's worth the migration

A practical developer guide to Claude 4.6 — adaptive thinking, 1M context, compaction API, tool search, and structured outputs. Real code examples in TypeScript and Python for building production AI agents.

Watercolor illustration of two interlocking systems — tools and behavioral instructions — powering an AI agent

Tools & MCP·14 min read

Your agent has 30 tools and no idea when to use them

MCP tools give agents external capabilities. Skills give agents behavioral expertise. Learn the architecture of both, build them in TypeScript, and understand when to use each — and when you need both.

Watercolor illustration of an engineer monitoring a production AI agent dashboard with reliability metrics

Agent Architecture·24 min read

Agentic AI in Production: From Prototype to Reliable Service

Ship agentic AI that doesn't break at 2 AM. Covers orchestration patterns (ReAct, planning loops), error handling, circuit breakers, graceful degradation, observability, and scaling — with TypeScript implementations you can steal.

Watercolor illustration of interconnected memory nodes forming a knowledge network in sage and olive tones

Knowledge & Memory·25 min read

AI Agent Memory: From Session Context to Long-Term Knowledge

Build AI agent memory systems from scratch in TypeScript. Covers memory types (session, episodic, semantic, procedural), architectures (buffer, summary, vector retrieval), RAG intersection, and privacy-first design.

Watercolor illustration of an engineering team monitoring AI agent dashboards with data flowing across screens

Operations·28 min read

AI Agent Observability: What to Monitor When Your Agent Goes Live

Build a production observability pipeline for AI agents. Covers latency, token usage, tool success rates, conversation quality, drift detection, structured logging, alerting strategies, and the critical difference between LLM and agent observability.

Illustration of a team evaluating AI agent quality through structured testing scenarios

Testing & Evaluation·24 min read

AI Agent Testing: How to Evaluate Agents Before They Talk to Customers

A practical guide to testing AI agents before production — scenario-based testing with AI personas, scorecard evaluation, regression suites, edge case generation, and CI/CD integration.

Watercolor illustration of developers collaborating around a whiteboard with tool integration diagrams

Tools & MCP·26 min read

AI Agent Tools: MCP, OpenAPI, and Tool Management That Actually Scales

How production AI agents discover, execute, and manage tools — from MCP protocol to OpenAPI auto-importing, security sandboxing, and multi-tenant tool infrastructure.

AI agent memory architecture with semantic search vectors

Learning AI·20 min read

Build your own AI agent memory system — what breaks when real users show up?

Build a complete memory system for customer-facing AI agents — session context, persistent recall, semantic search. Then learn what breaks when real customers start returning.

Developer building AI agent tools at a whiteboard

Learning AI·20 min read

Build your own AI agent tool system — what breaks when you add the 20th tool?

Build a complete tool system for customer-facing AI agents from scratch — registry, execution, auth, monitoring. Then learn what breaks when real customers start calling.

Developer working through advanced MCP protocol integration patterns on a screen

Tools & MCP·25 min read

MCP Deep Dive: Advanced Patterns for Agent Tool Integration

Production MCP patterns for teams who've built their first server and need to scale it — OAuth 2.1 with PKCE, Streamable HTTP transport, gateways, sampling, dynamic tool registration, and multi-tenant security.

Watercolor illustration of converging streams representing voice, vision, and text flowing into an AI agent system

Agent Architecture·28 min read

Multimodal AI Agents: Voice, Vision, and Text in Production

How to architect multimodal AI agents that process voice, vision, and text simultaneously — from STT→LLM→TTS pipelines to vision integration, latency budgets, and production fusion strategies.

Watercolor illustration of voice AI waveforms flowing through a technical architecture diagram with golden amber tones

Agent Architecture·19 min read

Voice Agent Platform Architecture: The Stack Behind Sub-300ms Responses

Deep dive into voice agent architecture — the STT→LLM→TTS pipeline, latency budgets, interruption handling, WebRTC vs WebSocket transport, and what orchestration platforms leave on the table.

Developer comparing two approaches on a whiteboard

Knowledge & Memory·20 min read

Fine-tuning vs RAG: why most teams pick wrong and how to decide

When to fine-tune, when to use RAG, and when you need both — with hands-on LoRA fine-tuning and RAG implementation on the same task to show the difference.

Team of developers collaborating on multi-agent AI architecture

Learning AI·20 min read

Multi-Agent AI Systems: Build an Agent Orchestrator Without a Framework

Build a multi-agent system from scratch — delegation, planning loops, and inter-agent communication — before reaching for LangGraph or CrewAI.

Engineer debugging a real-time streaming architecture on a monitor

Learning AI·20 min read

Streaming AI Responses: SSE, WebSockets, and the Architecture Behind ChatGPT's Typing Effect

Build three streaming implementations from scratch — SSE, WebSocket, and HTTP/2 — and learn why token-by-token rendering is harder than it looks.

Illustration of two people reviewing an improvement chart together at a standing desk

Learning AI·20 min read

How to Evaluate AI Agents: Build an Eval Framework from Scratch

Build a working AI agent eval framework in TypeScript and Python. Covers LLM-as-judge, rubric scoring, regression testing, and CI integration.

Illustration of a diverse team collaborating around a whiteboard with code diagrams

Learning AI·20 min read

MCP Explained: Build Your First MCP Server in TypeScript and Python

Build a working MCP server from scratch in TypeScript and Python. Hands-on tutorial covering tools, resources, transports, and testing.

Illustration of a person writing thoughtfully at a desk with sticky notes and a warm lamp

Learning AI·25 min read

Prompt Engineering from First Principles: 12 Techniques Every AI Developer Needs

Master 12 essential prompt engineering techniques with real TypeScript examples. From zero-shot to ReAct, build better AI agents from first principles.

Illustration of a person organizing knowledge on a corkboard with connected notes

Learning AI·18 min read

RAG from Scratch: Build a Retrieval-Augmented Generation Pipeline

Build a working RAG pipeline from scratch in TypeScript and Python. Covers embeddings, chunking, vector search, and generation with real, runnable code.

Man in blue dress shirt sitting on black office rolling chair. Photo by David Schultz on Unsplash

Agent Architecture·22 min read

How Multimodal Voice AI Works: From Audio-Only to Vision-Aware Agents

How multimodal voice AI combines speech, vision, and text into a single agent — architecture patterns, latency tradeoffs, and TypeScript code you can run.
