What is progressive tool discovery in MCP?

Progressive tool discovery is a pattern where an agent loads only the tools it needs for the current task rather than all available tools at startup. Instead of registering 100+ tools upfront, the agent uses a meta-tool to search or browse tool categories on demand, then loads specific tools when they're needed. This dramatically reduces context usage and improves tool selection accuracy.

How many tokens does loading 50 MCP tools use?

Loading 50 MCP tools consumes roughly 72,000 tokens just for tool definitions, and around 77,000 tokens total before any task execution begins. This leaves less room for conversation history, retrieved context, and reasoning. Research shows tool selection accuracy drops significantly once models are presented with more than 30 to 50 tools simultaneously.

What is the Tool Search Tool in MCP?

The Tool Search Tool is a capability introduced by Anthropic for working with large tool libraries in production. Instead of loading all tool definitions into context, the agent uses a single search tool to discover relevant tools by keyword or semantic similarity. Anthropic moved this to general availability in February 2026 as part of the Advanced Tool Use framework.

How does the hierarchical meta-tools pattern work?

The hierarchical pattern presents the agent with category-level meta-tools instead of individual tools. For example, instead of 40 individual CRM tools, the agent sees one meta-tool called crm_tools that describes what the CRM category handles. When the agent calls it, the server returns the specific tool definitions for just that category. This compresses discovery overhead to 1,600 to 2,500 tokens regardless of library size.

When should I use semantic search vs category bundles for tool discovery?

Use semantic tool search when your tool library is large (100+ tools) and tool purposes vary widely. Use category bundles when your tools group naturally by domain such as CRM tools, calendar tools, or billing tools, and you can predict which category a conversation will need. Many teams combine both: category bundles for initial routing and semantic search within a category for final selection.

Does progressive tool discovery work with all MCP hosts?

Progressive tool discovery works with any MCP host that supports dynamic tool registration and deregistration. In protocol terms, the server declares the tools listChanged capability and the host re-fetches the tool list when it receives a notifications/tools/list_changed message. The pattern requires the server to expose a meta-tool at startup and register additional tools at runtime. Most modern MCP runtimes, including Claude.ai, Cursor, and programmatic MCP clients, support this. Older fixed-tool-list hosts may not support dynamic registration.

What is the Cloudflare example of progressive tool discovery?

Cloudflare reduced their entire API surface from over 2,500 endpoints, which would consume 1.17 million tokens as a native MCP server, to just two tools using approximately 1,000 tokens. The two tools handle discovery and execution, loading specific endpoint definitions on demand. This is the most dramatic real-world demonstration of progressive discovery improving context efficiency.

Stop Loading All Your MCP Tools at Once

You spent three months integrating your CRM, calendar, billing system, support ticketing platform, and internal knowledge base into one MCP server. You flip the switch. Your agent gets... dumber.

It misses obvious tools. It picks the wrong one half the time. It confuses get_customer_order with get_order_status even though they're completely different functions. You added capability and somehow got worse performance.

The reason is your context window. Before your agent processes a single word of the user's question, it's silently reading 72,000 tokens of tool definitions.

Let's fix that.

Why Tool Count Kills Agent Quality

Too many tools overwhelm the model's attention, the same way a menu with 300 items makes ordering harder, not easier. When you load 50 MCP tools simultaneously, you're not just spending tokens on definitions -- each additional tool adds noise that competes with the tools actually relevant to the current task.

Research from Anthropic and third-party benchmarks consistently shows tool selection accuracy starts dropping once models see more than 30 to 50 tools at once. At 100 tools, selection quality can fall by 40% compared to presenting only the relevant subset. The model isn't ignoring the extra tools; it's actively confused by them.

The token math makes this worse. A typical MCP tool definition -- name, description, parameter schema -- runs 1,200 to 1,800 tokens. At an average of about 1,440 tokens each, fifty tools come to roughly 72,000 tokens. Add system prompt, conversation history, and retrieved context and you've burned 85,000 tokens before the agent generates a single character of response. On a model with a 200K context window, that's more than 40% of your budget consumed before the agent does any useful work.
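The arithmetic above can be reproduced with a few lines of TypeScript. This is a rough sketch, not a measurement: the 1,440-token average and the 13,000 tokens of other context are assumptions chosen so the totals match the figures quoted in this section, and estimateBudget is a hypothetical helper name.

```typescript
// Rough context-budget arithmetic for tool definitions.
// Every input here is an estimate, not a measured value.
interface BudgetInput {
  toolCount: number;
  avgTokensPerTool: number; // assumed average size of one tool definition
  otherContextTokens: number; // system prompt, history, retrieved context
  contextWindow: number;
}

interface BudgetResult {
  toolTokens: number;
  totalTokens: number;
  fractionOfWindow: number;
}

function estimateBudget(input: BudgetInput): BudgetResult {
  const toolTokens = input.toolCount * input.avgTokensPerTool;
  const totalTokens = toolTokens + input.otherContextTokens;
  return {
    toolTokens,
    totalTokens,
    fractionOfWindow: totalTokens / input.contextWindow,
  };
}

// Fifty tools loaded upfront on a 200K-token model.
const allToolsUpfront = estimateBudget({
  toolCount: 50,
  avgTokensPerTool: 1440,
  otherContextTokens: 13000,
  contextWindow: 200000,
});
// toolTokens = 72000, totalTokens = 85000, fractionOfWindow = 0.425
```

Plugging in your own measured averages is the quickest way to see whether your server is in the same range.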

And here's the compounding problem: the more tools you expose, the more the model second-guesses itself. "Should I use get_customer_profile or lookup_account_details? Are these different?" With 8 tools, the model knows. With 80 tools, it hedges.

[Figure: Tool selection accuracy vs. number of simultaneously loaded tools]

The solution is progressive tool discovery: load what you need, when you need it, based on what the conversation has revealed so far.

What Progressive Tool Discovery Actually Is

Progressive tool discovery is a design pattern where your agent starts each conversation with a minimal tool footprint -- usually one or two meta-tools -- and dynamically loads specific tools as the conversation reveals what's needed.

Think of it like a skilled mechanic's toolbox. They don't dump every wrench on the floor before they start. They assess the job, open the relevant drawer, and pull out what they need. The full toolkit is available, but not in the way.

There are three main implementations of this pattern: hierarchical meta-tools, semantic tool search, and intent-based bundles. Each trades off differently on complexity, accuracy, and latency. You'll likely end up combining at least two of them.

Pattern 1: Hierarchical Meta-Tools

The hierarchical pattern is the simplest to implement and the right starting point for most teams. Your MCP server exposes a small set of category-level tools at startup. Each category tool, when called, returns the detailed definitions of the tools within that category and dynamically registers them on the current session.

For a CX platform, your startup tools might look like this:

mcp-server/discovery-tools.ts·typescript

// Category-level meta-tools: the only tools listed at startup.
// MCP tool definitions describe their arguments under `inputSchema`.
const DISCOVERY_TOOLS = [
  {
    name: "customer_tools",
    description:
      "Tools for looking up customers, accounts, orders, and history. " +
      "Call this when you need to find or retrieve any customer-related data.",
    inputSchema: { type: "object", properties: {} }
  },
  {
    name: "action_tools",
    description:
      "Tools for taking actions: sending emails, processing refunds, " +
      "escalating tickets, updating records. Call this when you need to DO something.",
    inputSchema: { type: "object", properties: {} }
  },
  {
    name: "knowledge_tools",
    description:
      "Tools for searching knowledge base articles, FAQs, policies, and documentation. " +
      "Call this when you need to look something up, not retrieve customer data.",
    inputSchema: { type: "object", properties: {} }
  }
];

When the agent calls customer_tools, your server responds by registering the full definitions of get_customer_profile, get_order_history, lookup_account_status, and the rest of the CRM suite for that session. The agent now has access to exactly the tools it needs, loaded on demand.
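The per-session bookkeeping behind that step might look roughly like the sketch below. It is deliberately independent of any MCP SDK: ToolDef, CATEGORY_TOOLS, and SessionRegistry are hypothetical names, and the tool descriptions are placeholders. In a real server, you would also send a notifications/tools/list_changed message after loading a category so the host re-fetches the list.

```typescript
// Minimal in-memory sketch of per-session category loading.
interface ToolDef {
  name: string;
  description: string;
}

// Hypothetical mapping from category meta-tool to the tools it unlocks.
const CATEGORY_TOOLS: Record<string, ToolDef[]> = {
  customer_tools: [
    { name: "get_customer_profile", description: "Account details for one customer." },
    { name: "get_order_history", description: "Past orders for one customer." },
    { name: "lookup_account_status", description: "Whether an account is active or closed." },
  ],
  knowledge_tools: [
    { name: "search_knowledge_base", description: "Search help articles and policies." },
  ],
};

class SessionRegistry {
  private readonly loaded = new Map<string, Map<string, ToolDef>>();

  // Called when the agent invokes a category meta-tool.
  loadCategory(sessionId: string, category: string): ToolDef[] {
    const tools = CATEGORY_TOOLS[category];
    if (!tools) throw new Error(`Unknown category: ${category}`);
    const session = this.loaded.get(sessionId) ?? new Map<string, ToolDef>();
    for (const tool of tools) session.set(tool.name, tool);
    this.loaded.set(sessionId, session);
    return tools;
  }

  // What a tool listing would contain for this session:
  // the category meta-tools plus whatever has been loaded so far.
  visibleToolNames(sessionId: string): string[] {
    const session = this.loaded.get(sessionId);
    const loadedNames = session ? Array.from(session.keys()) : [];
    return [...Object.keys(CATEGORY_TOOLS), ...loadedNames];
  }
}

const registry = new SessionRegistry();
registry.loadCategory("session-a", "customer_tools");
const namesA = registry.visibleToolNames("session-a"); // meta-tools plus customer tools
const namesB = registry.visibleToolNames("session-b"); // meta-tools only
```

Keying the registry by session is the important design choice here: one conversation loading the CRM suite should not inflate the tool list of every other conversation.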

This is the pattern Amazon Prime Video implemented internally: a single find_tools meta-tool that accepted a task description and returned the relevant tool subset. The context savings were dramatic -- tool definition tokens dropped by roughly 100x compared to loading everything at startup.

The tradeoff: the agent has to make an explicit discovery call before using domain-specific tools. This adds one round-trip of latency and requires your agent to recognize when it needs to discover tools. Good prompting handles this -- tell your agent to always call a discovery tool before attempting any task it hasn't explicitly been given tools for.

Pattern 2: Semantic Tool Search

Semantic search is more powerful but more complex to implement. Instead of category buckets, you build a vector index of all your tool descriptions and expose a single search_tools capability.

The agent describes what it needs in natural language, your server runs a similarity search, and returns the top-k matching tool definitions. Tools are then dynamically registered for the current session.

Anthropic's Tool Search Tool, which moved to general availability in February 2026, implements this pattern. It compresses discovery overhead to 1,600 to 2,500 tokens regardless of whether your library has 40 or 400 tools -- because the search tool itself is small and full definitions only load when selected.

mcp-server/semantic-discovery.ts·typescript

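import { VectorStore } from "@your-vector-db/client"; // placeholder: substitute your vector DB client
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import {
  CallToolRequestSchema,
  type Tool as MCPTool
} from "@modelcontextprotocol/sdk/types.js";
 
// The tools capability must be declared before tool handlers can be registered;
// listChanged tells the host that the tool list can change at runtime.
const server = new Server(
  { name: "cx-tools", version: "1.0.0" },
  { capabilities: { tools: { listChanged: true } } }
);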
const toolIndex = new VectorStore();
 
// Index all tool descriptions at startup
await toolIndex.upsert(
  ALL_TOOLS.map((tool) => ({
    id: tool.name,
    text: `${tool.name}: ${tool.description}. Parameters: ${JSON.stringify(tool.inputSchema)}`,
    metadata: tool
  }))
);
 
// The only tool exposed at startup
const SEARCH_TOOL = {
  name: "search_tools",
  description:
    "Find and load tools relevant to your current task. " +
    "Describe what you need to do and this will return the right tools. " +
    "Always call this before attempting a task you don't have tools for yet.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "What you're trying to do" },
      limit: { type: "number", description: "Max tools to return (default 5)" }
    },
    required: ["query"]
  }
};
 
server.setRequestHandler(CallToolRequestSchema, async (req, extra) => {
  if (req.params.name === "search_tools") {
    const { query, limit = 5 } = req.params.arguments as {
      query: string;
      limit?: number;
    };

    const results = await toolIndex.search(query, { topK: limit });
    const discoveredTools = results.map((r) => r.metadata as MCPTool);

    // Register discovered tools for this session. The SDK exposes the transport's
    // session ID (if any) on the handler's second argument, not on the request.
    // registerSessionTools is an application-defined helper.
    await registerSessionTools(extra.sessionId, discoveredTools);
 
    return {
      content: [
        {
          type: "text",
          text:
            `Found and loaded ${discoveredTools.length} tools:\n` +
            discoveredTools.map((t) => `- ${t.name}: ${t.description}`).join("\n")
        }
      ]
    };
  }
 
  // Route to dynamically registered session tools
  return handleSessionTool(req);
});
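The toolIndex.search call in the listing above hides the ranking step. To make it concrete, here is a minimal in-memory stand-in that ranks tools by bag-of-words cosine similarity. To be clear, this swaps learned embeddings for simple term counts, so it only matches on shared words; it is a sketch for understanding and local testing, not a replacement for a vector store. IndexedTool and searchTools are hypothetical names.

```typescript
interface IndexedTool {
  name: string;
  description: string;
}

// Lowercase, split on anything that is not a letter or digit, drop 1-char tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

function termCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const term of tokenize(text)) counts.set(term, (counts.get(term) ?? 0) + 1);
  return counts;
}

// Cosine similarity between two term-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((count, term) => {
    dot += count * (b.get(term) ?? 0);
    normA += count * count;
  });
  b.forEach((count) => {
    normB += count * count;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return up to topK tools with a non-zero similarity, best match first.
function searchTools(tools: IndexedTool[], query: string, topK = 5): IndexedTool[] {
  const q = termCounts(query);
  return tools
    .map((tool) => ({ tool, score: cosine(q, termCounts(`${tool.name} ${tool.description}`)) }))
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.tool);
}

const demoTools: IndexedTool[] = [
  { name: "process_refund", description: "Refund a payment to the customer's original payment method." },
  { name: "track_shipment", description: "Track the carrier location of a shipped order." },
  { name: "search_knowledge_base", description: "Search help articles and policies." },
];
const refundMatches = searchTools(demoTools, "refund a duplicate payment");
```

A query such as "give the customer their money back" shares no words with process_refund and would return nothing here, which is exactly the gap that embedding-based search closes.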

The most striking real-world case here is Cloudflare. Their entire API spans more than 2,500 endpoints. As a native MCP server, those definitions would consume over 1.17 million tokens -- more than most context windows can hold. Using progressive discovery with just two tools, the entire surface area fits in roughly 1,000 tokens. Discovery happens at query time and only the relevant endpoint definitions load.

If you're building toward a large tool library, semantic search is the pattern you'll want eventually. Start with hierarchical meta-tools for simplicity, then migrate to semantic search when your library grows past 50 distinct tools.

Pattern 3: Intent-Based Bundles

The third pattern sits between hierarchical and semantic. You define bundles based on conversation intent and register the right bundle when the agent detects what kind of task it's handling.

For a CX agent, intents map cleanly to tool bundles:

mcp-server/intent-bundles.ts·typescript

export const TOOL_BUNDLES: Record<string, string[]> = {
  billing_dispute: [
    "get_customer_invoices",
    "get_payment_history",
    "process_refund",
    "apply_credit",
    "create_billing_case"
  ],
  order_support: [
    "get_order_status",
    "track_shipment",
    "update_shipping_address",
    "initiate_return",
    "contact_fulfillment"
  ],
  account_management: [
    "get_account_profile",
    "update_contact_info",
    "manage_subscriptions",
    "reset_credentials",
    "toggle_notifications"
  ],
  product_support: [
    "search_knowledge_base",
    "get_product_documentation",
    "check_compatibility",
    "escalate_to_engineering"
  ]
};

When a customer opens with "I was charged twice for my subscription," the agent classifies the intent as billing_dispute and loads exactly those five tools. Nothing else. The classification itself is lightweight -- a single structured output call at the start of the conversation, or even a regex match on keyword patterns if your intents are well-defined.

Intent-based bundles work best when your CX agent handles well-defined task types with clear tool boundaries. If conversations frequently cross intent lines -- a customer who starts with billing and ends up asking about their shipment -- you'll want to combine this with semantic search as a fallback for tools that fall outside the initial bundle.
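As a rough illustration of the keyword route, the sketch below classifies an opening message with regular expressions and returns null when nothing matches, so the caller can fall back to semantic search. The patterns in INTENT_PATTERNS are invented for illustration and would need tuning against real transcripts; classifyIntent is a hypothetical helper, and the intent names mirror the bundle keys shown earlier.

```typescript
type Intent =
  | "billing_dispute"
  | "order_support"
  | "account_management"
  | "product_support";

// Hypothetical keyword patterns, checked in order. First match wins.
const INTENT_PATTERNS: Array<[Intent, RegExp]> = [
  ["billing_dispute", /\b(charged|refund|invoice|billing|payment)\b/i],
  ["order_support", /\b(order|shipment|shipping|delivery|tracking|return)\b/i],
  ["account_management", /\b(password|login|notifications|cancel my subscription)\b/i],
  ["product_support", /\b(how do i|compatible|not working|documentation)\b/i],
];

// Returns the first matching intent, or null so the caller can fall back
// to a broader discovery step instead of guessing.
function classifyIntent(message: string): Intent | null {
  for (const [intent, pattern] of INTENT_PATTERNS) {
    if (pattern.test(message)) return intent;
  }
  return null;
}

const billing = classifyIntent("I was charged twice for my subscription");
const shipping = classifyIntent("Where is my shipment?");
const unknown = classifyIntent("Hello there");
```

Because the first matching pattern wins, a message like "I want a refund for my order" lands in billing_dispute. That ordering is a design decision worth making deliberately, and it is one reason to keep a search fallback for conversations that cross intent lines.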


Writing Tool Descriptions That Discovery Can Actually Use

Progressive discovery only works as well as your tool descriptions allow. If the description for get_customer_profile says "Gets a customer profile," your semantic search has almost nothing to differentiate it from lookup_account_status or fetch_user_details.

Good descriptions for discoverable tools answer three questions: what does this tool do, when should the agent use it instead of similar tools, and what does it return.

Compare these:

tool-descriptions.ts·typescript

// Weak: discovery will misroute this constantly
{
  name: "get_customer_profile",
  description: "Gets customer profile information"
}
 
// Strong: discovery correctly routes billing questions, not order questions
{
  name: "get_customer_profile",
  description:
    "Returns a customer's account details: name, email, billing address, " +
    "subscription tier, account age, and payment method on file. " +
    "Use this when you need identity or billing information about a customer. " +
    "For order history or shipment status, use get_order_history instead."
}

The second description is several times longer -- several dozen tokens instead of a handful -- but it loads during discovery, not upfront. You pay those tokens only when the agent is actually about to use the CRM tools. For the sessions that never touch CRM, those tokens stay out of context entirely.

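The reference to competing tools (use get_order_history instead) is especially powerful once candidate tools are in front of the model. It tells the agent to steer away from this tool when the query is about orders, not accounts. Note that a plain embedding search does not interpret "instead" as a negative signal, and mentioning orders can even pull this tool closer to order queries, so the benefit comes mainly from the model reading the description after retrieval.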
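One way to keep descriptions honest is a small lint pass over the tool list. The sketch below is a heuristic, not a standard: the 15-word minimum, the "use this when" phrase check, and the sibling-mention check are all assumptions loosely derived from the three questions above, and lintDescriptions is a hypothetical helper.

```typescript
interface ToolDescription {
  name: string;
  description: string;
}

interface LintResult {
  name: string;
  problems: string[];
}

// Flags descriptions that are too short, give no usage guidance,
// or never point to a similar tool the agent might confuse them with.
function lintDescriptions(tools: ToolDescription[], minWords = 15): LintResult[] {
  const names = tools.map((t) => t.name);
  return tools.map((tool) => {
    const problems: string[] = [];
    const words = tool.description.trim().split(/\s+/).filter(Boolean);
    if (words.length < minWords) problems.push(`only ${words.length} words`);
    if (!/\b(use|call) this when\b/i.test(tool.description)) {
      problems.push("no usage guidance");
    }
    const mentionsSibling = names.some(
      (n) => n !== tool.name && tool.description.includes(n)
    );
    if (!mentionsSibling) problems.push("no pointer to a similar tool");
    return { name: tool.name, problems };
  });
}

const weakReport = lintDescriptions([
  { name: "get_customer_profile", description: "Gets customer profile information" },
  { name: "get_order_history", description: "Gets order history" },
]);

const strongReport = lintDescriptions([
  {
    name: "get_customer_profile",
    description:
      "Returns a customer's account details: name, email, billing address, " +
      "subscription tier, account age, and payment method on file. " +
      "Use this when you need identity or billing information about a customer. " +
      "For order history or shipment status, use get_order_history instead.",
  },
  { name: "get_order_history", description: "Gets order history" },
]);
```

Run against the two descriptions compared above, the weak version trips all three checks and the strong version trips none.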

Handling Latency in Real-Time Conversations

Discovery adds round-trips. Every meta-tool call is an extra inference step before your agent can do useful work. In a real-time voice conversation where you need sub-300ms response starts, this is a genuine constraint worth planning for.

Three approaches handle this well in production:

Pre-warm sessions. When a customer lands in your IVR or chat widget, you often have a few seconds before they finish speaking their first sentence. Use that window to run a lightweight intent classification and pre-register the most likely tool bundle. The agent starts the conversation with tools already loaded.

Predictive co-registration. Track which tools get loaded together in production and build a correlation table. If 80% of sessions that load initiate_return also load get_order_status, register both together even when only initiate_return was requested. Chanl's Analytics feature surfaces these tool co-occurrence patterns from your production traffic automatically.

Cache tool definitions across sessions. Tool definitions don't change often. Cache them at the session level and in a shared store across sessions. The first lookup fetches from your MCP server; subsequent lookups hit the cache. In most deployments, this eliminates the majority of discovery overhead after the first session of the day.
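The correlation table behind predictive co-registration can be computed from session logs in a few lines. This is a sketch under assumptions: the log shape (one array of tool names per session) and the 0.8 threshold are illustrative, and coLoadRates and toolsToCoRegister are hypothetical helper names.

```typescript
// One entry per production session: the tool names that session loaded.
// This log shape is hypothetical; adapt it to whatever your traces record.
type SessionLog = string[];

// For each tool, the share of sessions containing `trigger` that also loaded it.
function coLoadRates(sessions: SessionLog[], trigger: string): Map<string, number> {
  const withTrigger = sessions.filter((s) => s.includes(trigger));
  const counts = new Map<string, number>();
  for (const session of withTrigger) {
    // De-duplicate so a tool loaded twice in one session counts once.
    for (const tool of Array.from(new Set(session))) {
      if (tool !== trigger) counts.set(tool, (counts.get(tool) ?? 0) + 1);
    }
  }
  const rates = new Map<string, number>();
  counts.forEach((count, tool) => rates.set(tool, count / withTrigger.length));
  return rates;
}

// Tools worth registering alongside `trigger`, given a minimum co-load rate.
function toolsToCoRegister(
  sessions: SessionLog[],
  trigger: string,
  threshold = 0.8
): string[] {
  const picks: string[] = [];
  coLoadRates(sessions, trigger).forEach((rate, tool) => {
    if (rate >= threshold) picks.push(tool);
  });
  return picks.sort();
}

const sampleSessions: SessionLog[] = [
  ["initiate_return", "get_order_status"],
  ["initiate_return", "get_order_status", "track_shipment"],
  ["initiate_return", "get_order_status"],
  ["initiate_return", "get_order_status"],
  ["initiate_return"],
  ["get_order_status"],
];
const alongsideReturn = toolsToCoRegister(sampleSessions, "initiate_return");
```

In the sample data, four of the five sessions that loaded initiate_return also loaded get_order_status, so it clears the threshold, while track_shipment (one of five) does not. The rate is directional: it says what to add when initiate_return is requested, not the other way around.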

Putting It Together with the Chanl SDK

Chanl's MCP integration supports dynamic tool registration per session through the chanl.tools client, so you can layer progressive discovery on top of your existing tool setup without rebuilding your MCP server:

chanl-dynamic-tools.ts·typescript

import { Chanl } from "@chanl/sdk";
 
const chanl = new Chanl({ apiKey: process.env.CHANL_API_KEY });
 
// List available tools by category without loading full definitions
const toolCategories = await chanl.tools.list({
  includeDefinitions: false // returns names + short descriptions only
});
 
// Classify the incoming conversation
const intent = await chanl.mcp.classifyIntent({
  agentId: "cx-agent-v2",
  conversationOpening: userMessage,
  intents: Object.keys(TOOL_BUNDLES)
});
 
// Load only the tools for the detected intent
const sessionTools = await chanl.tools.load(
  TOOL_BUNDLES[intent.primary],
  { sessionId: currentSessionId }
);
 
// Run with the minimal tool footprint
const result = await chanl.scenarios.run({
  agentId: "cx-agent-v2",
  tools: sessionTools,
  conversation: currentConversation
});

The Tools feature also gives you visibility into which tools are being called in production -- essential when you're tuning your discovery prompts. If you see the agent consistently making unnecessary discovery calls, or skipping discovery and hallucinating tool names, those patterns show up clearly in the tool call trace view. You can build test scenarios that specifically verify the agent calls the discovery tool before attempting tasks outside its initial bundle.

What the Metrics Tell You

Five numbers tell you whether progressive discovery is actually working: tool selection accuracy, context tokens consumed per session, discovery call rate, zero-tool errors, and session p99 latency. Track these before and after rollout -- the pattern of change matters more than any single number in isolation.

Tool selection accuracy. Are the right tools being called for each task? You'll need a labeled test set. The post on building an agent eval framework has a good walkthrough for creating one.

Context tokens per session. How much of your context window are tool definitions consuming at different points in a conversation? Should drop by 60 to 80% from your pre-discovery baseline.

Discovery call rate. How often is the agent calling the meta-tool versus using already-loaded tools? Consistent "re-discovery" of the same tools within a session means the agent isn't retaining what it found. This is often a prompting problem.

Zero-tool errors. Runtime errors from calling tools that aren't registered yet. These mean the agent skipped discovery when it should have run it first. A spike here means your discovery prompting broke.

Session p99 latency. The tail latency impact from discovery round-trips. Pre-warming and caching should keep this under 50ms additional overhead per session in most setups.
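Three of these numbers fall straight out of tool call traces. The sketch below assumes a hypothetical ToolCallEvent record with an isDiscovery flag, an optional errorCode, and a per-call latency; none of these field names come from a specific product, and the percentile uses the simple nearest-rank method.

```typescript
// Hypothetical trace record; field names are assumptions for illustration.
interface ToolCallEvent {
  sessionId: string;
  toolName: string;
  isDiscovery: boolean; // true for meta-tool or search_tools calls
  latencyMs: number;
  errorCode?: "tool_not_registered" | "other";
}

// Nearest-rank percentile: the smallest value with at least p% of values at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

interface DiscoveryMetrics {
  discoveryCallRate: number; // discovery calls / all tool calls
  zeroToolErrors: number; // calls to tools that were not registered yet
  p99DiscoveryOverheadMs: number; // p99 of per-session time spent in discovery
}

function summarize(events: ToolCallEvent[]): DiscoveryMetrics {
  const discoveryCalls = events.filter((e) => e.isDiscovery).length;
  const zeroToolErrors = events.filter(
    (e) => e.errorCode === "tool_not_registered"
  ).length;

  // Sum discovery latency per session, then take the tail across sessions.
  const overheadBySession = new Map<string, number>();
  for (const e of events) {
    if (!e.isDiscovery) continue;
    overheadBySession.set(e.sessionId, (overheadBySession.get(e.sessionId) ?? 0) + e.latencyMs);
  }

  return {
    discoveryCallRate: events.length === 0 ? 0 : discoveryCalls / events.length,
    zeroToolErrors,
    p99DiscoveryOverheadMs: percentile(Array.from(overheadBySession.values()), 99),
  };
}

const sampleEvents: ToolCallEvent[] = [
  { sessionId: "s1", toolName: "search_tools", isDiscovery: true, latencyMs: 40 },
  { sessionId: "s1", toolName: "get_order_status", isDiscovery: false, latencyMs: 10 },
  { sessionId: "s2", toolName: "search_tools", isDiscovery: true, latencyMs: 30 },
  { sessionId: "s2", toolName: "search_tools", isDiscovery: true, latencyMs: 50 },
  { sessionId: "s2", toolName: "process_refund", isDiscovery: false, latencyMs: 5, errorCode: "tool_not_registered" },
];
const metrics = summarize(sampleEvents);
```

Tool selection accuracy and context tokens per session need a labeled test set and tokenizer counts respectively, so they are left out of this sketch.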

Chanl's Monitoring feature captures tool call traces at the session level, including timing data for each tool call and registration event. You can filter by tool name, success rate, and latency -- which makes it straightforward to spot when the wrong tools are being loaded or when discovery is adding unnecessary overhead.

What Not to Do

A few patterns that seem reasonable but cause problems in practice:

Don't organize by name prefix alone. Naming tools crm_get_customer, crm_update_customer, etc. and filtering by prefix during discovery sounds elegant but breaks tool-use reasoning. The model needs to see the full tool name to call it correctly, and prefixes add noise without adding clarity. Use categories in the discovery metadata, not in the tool names.

Don't lazy-load in the middle of multi-step sequences. If your agent is partway through a 5-step workflow and suddenly needs a tool it didn't load at the start, the discovery call disrupts the sequence and often derails the plan entirely. Front-load discovery at the start of any workflow you can predict. The post on handling tool sprawl at scale has good patterns for mapping workflow stages to tool bundles.

Don't skip testing the discovery tool itself. Your meta-tool or search endpoint is now on the critical path for every conversation. Test it under load separately from your main agent tests. Add scenarios to your test suite that specifically exercise discovery with ambiguous queries -- "I need to help a customer with their account" should still find the right tools even when the intent is vague.

Don't expect discovery to fix bad tool design. Progressive loading reduces the visibility of bad tools, but it doesn't fix them. If update_order and modify_order do different things that aren't clear from their names, discovery will route to the wrong one half the time. Fix the underlying tool definitions first.

The Shift in How You Think About Tools

Progressive tool discovery changes the core design question you ask when building an MCP server. Instead of "what tools does my agent need?", you ask "what does my agent need to know in order to find the right tools?"

That shift is actually healthy. It forces better tool categorization, clearer descriptions, and more intentional thinking about what each tool does and when it should be used. Teams that implement progressive discovery often find it's the forcing function that finally gets them to clean up the tool definitions they've been meaning to fix for months.

Your MCP server can expose 500 tools. Your agent just doesn't need to see all 500 at once.



Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.
