Agent Architecture

Your MCP server is a monolith. Here's how to fix it

MCP servers dump every tool into the context window, burning tokens before your agent reasons. Four patterns to fix it: decompose, filter, gateway, facade.

Dean Grover, Co-founder
April 3, 2026
14 min read
[Illustration: a massive warehouse of filing cabinets stretching into fog, with one person at a clean desk with three folders under warm lamplight]

Your MCP server has 40 tools. Your agent needs 6 of them. But every time a conversation starts, all 40 tool definitions get injected into the context window. That's roughly 10,000 tokens gone before the agent reads a single user message.

This is the MCP monolith problem, and it's getting worse. Teams connecting to three or four MCP servers (GitHub, Slack, a CRM, an internal API) routinely see 80-120 tools loaded per session. With 120 tools at 250 tokens each, that's 30,000 tokens consumed by tool schemas before the conversation starts. On a 128K context window that's manageable. On a 32K window, it's most of your budget.

The ecosystem's default answer to every gap is to add more tools. The correct answer is to scope the ones you have.

This article walks through four concrete patterns for fixing MCP tool overload: server decomposition, client-side filtering, MCP gateways, and facade servers. Each one reduces the number of tools your agent sees without removing capabilities.

How many tokens do MCP tool definitions actually cost?

A single tool definition, including its name, description, and JSON Schema parameters, typically runs 150-300 tokens. That's the cost of telling the model one tool exists. The model hasn't called it yet. It hasn't even decided whether it's relevant. Multiply across three or four MCP servers and you're spending 24,000-36,000 tokens on tool awareness before your agent processes a single user message.
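You can sanity-check these numbers yourself by serializing a tool definition and applying the common ~4-characters-per-token heuristic. This is a budgeting aid, not an exact tokenizer count, and the `create_issue` definition below is an illustrative example rather than a real server's schema:

```typescript
// Rough token-cost estimate for a tool definition using the common
// ~4 characters-per-token heuristic. Real counts vary by tokenizer.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: object;
}

function estimateToolTokens(tool: ToolDef): number {
  const serialized = JSON.stringify(tool);
  return Math.ceil(serialized.length / 4);
}

function estimateToolsetTokens(tools: ToolDef[]): number {
  return tools.reduce((sum, t) => sum + estimateToolTokens(t), 0);
}

// Hypothetical tool; production descriptions are often longer,
// which is how definitions reach the 150-300 token range.
const createIssue: ToolDef = {
  name: "create_issue",
  description:
    "Create a new GitHub issue in the given repository with a title, body, and optional labels.",
  inputSchema: {
    type: "object",
    properties: {
      repo: { type: "string", description: "owner/name of the repository" },
      title: { type: "string", description: "Issue title" },
      body: { type: "string", description: "Markdown body of the issue" },
      labels: { type: "array", items: { type: "string" } },
    },
    required: ["repo", "title"],
  },
};

console.log(estimateToolTokens(createIssue));
console.log(estimateToolsetTokens(Array(40).fill(createIssue)));
```

Running this over your real tool lists gives a per-server budget you can compare against your context window before any agent connects.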

Here's what that looks like at scale:

| Component | Token Budget | % of 128K Context |
|---|---|---|
| System prompt | 2,000-4,000 | 2-3% |
| Tool definitions (40 tools) | 8,000-12,000 | 7-9% |
| Tool definitions (120 tools) | 24,000-36,000 | 19-28% |
| Conversation history | 20,000-60,000 | 16-47% |
| Remaining for reasoning | 20,000-74,000 | 16-58% |

With 40 tools, the budget is tight but manageable. With 120 tools across three MCP servers, you've lost up to 28% of your context before the conversation starts. On smaller models with 32K or 64K windows, three servers can consume the majority of available tokens.

But the token cost is only half the problem. Practitioners consistently report that tool selection accuracy degrades once tool count exceeds 20-30 in a single context. The model has to pick the right tool from a longer list where descriptions start to sound similar. Anthropic's own tool use docs recommend keeping tool sets focused, and benchmarks like the Berkeley Function-Calling Leaderboard test accuracy across varying tool counts. In practice, error rates for ambiguous tool selection climb noticeably when moving from a 10-tool set to a 50-tool set. The agent picks the wrong tool with high confidence, and nobody notices until a customer gets the wrong answer. Without monitoring that tracks which tools are actually being called, these silent failures accumulate.

This is the monolith failure mode. Not that tools don't work. That the wrong tools get called because the model can't distinguish between 40 options.

How do you split a monolithic MCP server?

Split one large MCP server into multiple domain-specific servers, each exposing 5-12 tools. This is the simplest fix and often the most effective. Instead of connecting your agent to one server that does everything, you connect it only to the servers it needs.

Take a typical GitHub MCP server. Out of the box, it exposes tools for issues, pull requests, repositories, actions, gists, releases, and org management. That's 30+ tools. A support agent that only needs to create and update issues is paying the context window cost for all 30.

The decomposition looks like this:

```yaml
# Before: one monolith
servers:
  github:
    url: "https://mcp.github.example.com"
    tools: 35  # issues, PRs, repos, actions, gists, releases, orgs

# After: domain-specific micro-servers
servers:
  github-issues:
    url: "https://mcp.github.example.com/issues"
    tools: 8   # create, update, close, search, comment, assign, label, list
  github-prs:
    url: "https://mcp.github.example.com/pulls"
    tools: 10  # create, review, merge, request-changes, diff, checks, ...
  github-repos:
    url: "https://mcp.github.example.com/repos"
    tools: 6   # clone, branch, compare, settings, webhooks, search
```

The support agent connects to github-issues only. It sees 8 tools instead of 35. The code review agent connects to github-prs and github-repos, seeing 16 tools total. Neither agent pays for tools it never uses.

This pattern aligns with how the MCP spec is designed. Tools are model-controlled and individually defined, so splitting servers by domain gives each agent a cleaner decision space.

The trade-off is operational. You now have three servers to deploy and monitor instead of one. For teams already running MCP infrastructure, this overhead is minimal. For teams just getting started, the next three patterns offer lower-friction alternatives.

Can you scope tools without changing the MCP server?

Yes. Client-side filtering queries the server for its full tool list, then loads only the subset that matches the current task, driven by a task-type manifest that maps conversation types to required tools. The server stays untouched and still exposes everything; the client just ignores what it doesn't need.

Claude Code already does this with its ToolSearch pattern. Rather than injecting all available tools at startup, it maintains a registry and loads tool definitions on demand when the conversation context suggests they're needed. The tools exist. They're just not in the context window until called for.
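The core of that deferred-loading idea can be sketched as a registry that supports cheap keyword search over tool names and only materializes full definitions on demand. This is an illustration of the pattern, not Claude Code's actual implementation; all names are hypothetical:

```typescript
// Deferred tool loading: full schemas live in a registry and enter the
// context window only when explicitly loaded.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: object;
}

class ToolRegistry {
  private loaded = new Set<string>();

  constructor(private defs: Map<string, ToolDef>) {}

  // Cheap keyword search over names and descriptions; costs a handful of
  // tokens instead of injecting every schema up front.
  search(query: string): string[] {
    const q = query.toLowerCase();
    return Array.from(this.defs.values())
      .filter((t) => t.name.includes(q) || t.description.toLowerCase().includes(q))
      .map((t) => t.name);
  }

  // Only now does the full schema become part of the agent's context.
  load(name: string): ToolDef {
    const def = this.defs.get(name);
    if (!def) throw new Error(`unknown tool: ${name}`);
    this.loaded.add(name);
    return def;
  }

  // The definitions currently paying context-window rent.
  contextTools(): ToolDef[] {
    return Array.from(this.loaded).map((n) => this.defs.get(n)!);
  }
}
```

The agent sees a single search tool at startup; everything else stays out of the window until the conversation actually calls for it.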

You can implement the same pattern with any MCP client. The key is a tool manifest that maps task types to required tools:

```typescript
// Tool manifest: which tools each task type needs
const TOOL_MANIFESTS = {
  'customer-support': [
    'lookup_account',
    'check_order_status',
    'create_ticket',
    'escalate_to_human',
    'send_followup_email',
  ],
  'billing-inquiry': [
    'lookup_account',
    'get_payment_history',
    'check_subscription',
    'process_refund',
    'update_payment_method',
  ],
  'technical-debug': [
    'lookup_account',
    'get_system_logs',
    'check_service_status',
    'create_ticket',
    'run_diagnostic',
  ],
};

// At conversation start, classify the task and load only relevant tools
async function loadScopedTools(mcpClient, taskType) {
  const allTools = await mcpClient.listTools();
  const allowedNames = TOOL_MANIFESTS[taskType] || TOOL_MANIFESTS['customer-support'];

  return allTools.filter(tool => allowedNames.includes(tool.name));
}
```

The filtering happens before tools enter the context window. The MCP server is untouched. You control scope entirely from the client configuration.

The downside is maintenance. Every time a tool is added to the server, you need to update the manifest to include it for the right task types. Without that update, the new tool is invisible. For rapidly evolving tool sets, a gateway (Pattern 3) handles this more cleanly.
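One way to keep that maintenance burden visible is a drift check that compares the server's live tool list against the union of your manifests, run in CI or on a schedule. The sketch below assumes the manifest shape from the example above; the function and type names are illustrative:

```typescript
// Detect manifest drift: tools the server exposes that no manifest maps
// (invisible to every agent), and tools a manifest references that the
// server no longer provides (will fail at call time).
type Manifests = Record<string, string[]>;

interface DriftReport {
  unmapped: string[]; // on the server, in no manifest
  missing: string[];  // in a manifest, not on the server
}

function checkManifestDrift(serverToolNames: string[], manifests: Manifests): DriftReport {
  const mapped = new Set(Object.values(manifests).flat());
  const live = new Set(serverToolNames);
  return {
    unmapped: serverToolNames.filter((name) => !mapped.has(name)),
    missing: Array.from(mapped).filter((name) => !live.has(name)),
  };
}

// Example: the server grew a tool no manifest knows about, and one
// manifest still references a tool the server dropped.
const report = checkManifestDrift(
  ["lookup_account", "create_ticket", "bulk_export"],
  { "customer-support": ["lookup_account", "create_ticket", "send_followup_email"] },
);
// report.unmapped → ["bulk_export"]; report.missing → ["send_followup_email"]
```

Failing the build on a non-empty report turns a silent invisibility bug into a one-line fix in the manifest.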

What does an MCP gateway do?

An MCP gateway sits between your agents and your MCP servers. It intercepts tool list requests, applies per-agent filtering rules, and forwards only the allowed tools. The agent talks to the gateway as if it were a normal MCP server. The gateway talks to the real servers behind the scenes.

This is the most powerful pattern because it centralizes tool scoping without touching either the agent code or the MCP servers. You configure policies once at the gateway layer.

```yaml
# Gateway configuration: per-agent tool policies
gateway:
  agents:
    support-agent:
      allowed_servers:
        - github-issues
        - internal-crm
      tool_allowlist:
        github-issues: ["create_issue", "update_issue", "search_issues"]
        internal-crm: ["lookup_customer", "get_order", "create_ticket"]
      max_tools: 15

    analytics-agent:
      allowed_servers:
        - internal-crm
        - data-warehouse
      tool_allowlist:
        internal-crm: ["search_customers", "get_metrics"]
        data-warehouse: ["run_query", "get_report", "list_dashboards"]
      max_tools: 12

  defaults:
    max_tools: 20
    require_allowlist: true  # Block agents without explicit tool policies
```

The gateway enforces three things. First, which MCP servers an agent can reach. Second, which specific tools on those servers are visible. Third, a hard cap on total tool count. If a new tool appears on the CRM server, it doesn't automatically become available to every agent. Someone has to explicitly add it to the allowlist.

Several MCP gateway implementations have emerged in 2026. The core features to look for are per-agent routing, tool-level access control, request logging, and the ability to transform or enrich tool parameters before forwarding to the underlying server.

For teams building their own, the gateway is essentially a proxy MCP server that implements listTools by aggregating and filtering from upstream servers, and implements callTool by routing to the correct upstream with the original parameters.
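A minimal sketch of that filtering core might look like the following. The `Upstream` interface is a stand-in for real MCP client connections; transport, auth, and logging are omitted, and the names are illustrative:

```typescript
// Gateway core: aggregate upstream tool lists, apply per-agent
// allowlists and caps, and route calls only to permitted tools.
interface Tool { name: string; description: string; }

interface Upstream {
  listTools(): Promise<Tool[]>;
  callTool(name: string, args: object): Promise<unknown>;
}

interface AgentPolicy {
  allowlist: Record<string, string[]>; // server name → allowed tool names
  maxTools: number;
}

class GatewayCore {
  constructor(
    private upstreams: Record<string, Upstream>,
    private policies: Record<string, AgentPolicy>,
  ) {}

  async listTools(agentId: string): Promise<Tool[]> {
    const policy = this.policies[agentId];
    // require_allowlist behavior: no policy means no tools.
    if (!policy) throw new Error(`no policy for agent ${agentId}`);
    const visible: Tool[] = [];
    for (const [server, allowed] of Object.entries(policy.allowlist)) {
      const tools = await this.upstreams[server].listTools();
      visible.push(...tools.filter((t) => allowed.includes(t.name)));
    }
    if (visible.length > policy.maxTools) {
      throw new Error(`${visible.length} tools exceeds cap ${policy.maxTools}`);
    }
    return visible;
  }

  async callTool(agentId: string, server: string, name: string, args: object) {
    const allowed = this.policies[agentId]?.allowlist[server] ?? [];
    if (!allowed.includes(name)) throw new Error(`tool ${name} not allowed for ${agentId}`);
    return this.upstreams[server].callTool(name, args);
  }
}
```

Enforcing the allowlist on `callTool` as well as `listTools` matters: hiding a tool from the list without blocking the call leaves a bypass for any agent that guesses a tool name.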

Pattern 4: facade servers

A facade server wraps multi-step workflows into single high-level tools. Instead of exposing the individual operations and trusting the agent to chain them correctly, you expose one tool that handles the entire sequence internally.

Consider a deployment workflow. The raw tools might look like this:

```typescript
// Without facade: 5 tools the agent must chain correctly
const rawDeployTools = [
  "git_commit",        // Commit changes
  "run_tests",         // Execute test suite
  "docker_build",      // Build container image
  "k8s_apply",         // Deploy to cluster
  "health_check",      // Verify deployment
];
```

The agent needs to call these in order, handle failures at each step, and decide whether to proceed or roll back. That's a lot of decision-making overhead on every deployment. A facade collapses this into one tool:

```typescript
// With facade: 1 tool, internal orchestration
const deployToStaging = {
  name: "deploy_to_staging",
  description: "Deploy current changes to staging environment. " +
    "Runs tests, builds container, deploys, and verifies health. " +
    "Returns deployment status with URL or error details.",
  parameters: {
    type: "object",
    properties: {
      branch: { type: "string", description: "Git branch to deploy" },
      skip_tests: { type: "boolean", default: false },
    },
    required: ["branch"],
  },
};

// Internal implementation handles the full chain
async function handleDeployToStaging({ branch, skip_tests }) {
  const commit = await gitCommit(branch);

  if (!skip_tests) {
    const testResult = await runTests(branch);
    if (!testResult.passed) {
      return { status: "failed", step: "tests", details: testResult.failures };
    }
  }

  const image = await dockerBuild(commit.sha);
  const deployment = await k8sApply(image.tag, "staging");
  const health = await healthCheck(deployment.url);

  return {
    status: health.healthy ? "success" : "degraded",
    url: deployment.url,
    commit: commit.sha,
    image: image.tag,
  };
}
```

Five tools become one. The agent makes a single decision ("deploy this branch") instead of five sequential decisions. Error handling is deterministic code, not LLM reasoning.

The facade pattern works best for well-defined workflows that rarely change. If the deployment process shifts frequently, maintaining the facade adds friction. For stable, multi-step operations, it's the cleanest way to reduce tool count while preserving capability.

Combining patterns

These four patterns aren't mutually exclusive. Most production setups use two or three together.

A practical combination: decompose your MCP servers by domain (Pattern 1), build facades for complex multi-step workflows within each server (Pattern 4), and run a gateway in front of everything for per-agent access control (Pattern 3). Client-side filtering (Pattern 2) then serves as a safety net for agents that connect directly without going through the gateway.

[Diagram: MCP tool scoping architecture combining gateway, decomposition, and facade patterns. A Support Agent, Analytics Agent, and Deploy Agent connect through an MCP Gateway to an Issues Server (8 tools), a CRM Server (6 tools), a Deploy Facade (3 tools), and a Data Server (5 tools).]

The gateway handles access control and routing. Each server is focused on one domain. The deploy server uses the facade pattern to collapse workflows. Each agent sees only the tools it needs, typically 8-15 instead of 40+.

Toolset-scoped MCP resolution

The patterns above work for any MCP setup. But if you're building on a platform that manages tools and MCP connections for you, there's a cleaner approach: scope tools at the connection level so filtering happens before the agent ever sees a tool list.

This is how Chanl's MCP server handles it. Instead of connecting an agent to an entire server and then filtering, you create named bundles of tools called toolsets and assign them per agent.

```typescript
import { ChanlSDK } from '@chanl/sdk';

const chanl = new ChanlSDK({ apiKey: process.env.CHANL_API_KEY });

// Create a focused toolset for order management
const orderToolset = await chanl.toolsets.create({
  name: 'Order Management',
  description: 'Tools for looking up and managing customer orders',
});

// Add only the tools this toolset needs
for (const toolId of [
  'lookup_order', 'update_order_status', 'process_return',
  'check_inventory', 'calculate_shipping', 'send_confirmation',
]) {
  await chanl.toolsets.addTool(orderToolset.id, toolId);
}

// Create a separate toolset for support workflows
const supportToolset = await chanl.toolsets.create({
  name: 'Customer Support',
  description: 'Tools for ticket management and customer communication',
});

for (const toolId of [
  'create_ticket', 'escalate_to_human', 'send_followup', 'check_sla_status',
]) {
  await chanl.toolsets.addTool(supportToolset.id, toolId);
}
```

When you add a new tool to the MCP server, it doesn't automatically appear in every agent's context. You explicitly add it to the toolsets that need it. This is the gateway pattern, but managed through an API instead of a YAML config file.

The resolution works through the MCP connection URL. When an agent connects via /mcp/{workspaceId}/{toolsetId}, the server returns only that toolset's tools. No client-side filtering needed. No gateway configuration. The scoping happens at the URL routing layer.

```bash
# Without toolset scoping: agent sees ALL workspace tools
wss://mcp.example.com/mcp/ws_abc123
# → Returns: 45 tools (everything in the workspace)

# With toolset scoping: agent sees only its assigned tools
wss://mcp.example.com/mcp/ws_abc123/ts_order_mgmt
# → Returns: 6 tools (Order Management toolset only)
```

The priority chain handles edge cases. If an agent connects with a toolset ID, it gets exactly those tools (toolset-scoped resolution). If it connects without a toolset ID, it falls back to agent-scoped resolution, which returns tools explicitly assigned to that agent. This layered approach means you can mix scoped and unscoped agents in the same workspace.
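The priority chain itself is simple enough to sketch. Everything below is an illustrative model of the resolution order described above, not the actual Chanl SDK internals:

```typescript
// Resolution order: toolset-scoped when a toolset ID is present in the
// connection URL, else agent-scoped fallback.
interface Workspace {
  toolsets: Record<string, string[]>;   // toolset ID → tool names
  agentTools: Record<string, string[]>; // agent ID → explicitly assigned tools
}

function resolveTools(ws: Workspace, agentId: string, toolsetId?: string): string[] {
  if (toolsetId) {
    const tools = ws.toolsets[toolsetId];
    if (!tools) throw new Error(`unknown toolset: ${toolsetId}`);
    return tools; // toolset-scoped: exactly the bundle's tools
  }
  return ws.agentTools[agentId] ?? []; // agent-scoped fallback
}
```

Note the unscoped fallback returns only explicitly assigned tools, never the whole workspace, so a missing toolset ID degrades to a smaller surface rather than a larger one.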

You can also apply per-agent overrides to shared tools. The same search_orders tool might have a different description for the chat agent vs. the analytics agent, configured through chanl.toolsets.update(id, { toolOverrides }). Same tool, different behavior per toolset, no code changes.

For teams already using scenario-based testing, toolset scoping makes test coverage tractable. Instead of testing an agent against 45 possible tools, you test it against the 6 tools in its assigned toolset. That's a test matrix you can actually complete.

How do you measure whether tool scoping worked?

Track three metrics: token utilization ratio (percentage of context spent on tool definitions), tool selection accuracy (does the model pick the right tool?), and first-call correctness (right tool on the first try, no retries). If all three improve, scoping is working.

After implementing any of these patterns, here's how to confirm the fix:

Token utilization ratio. Measure what percentage of your context window goes to tool definitions before and after scoping. Target: under 10% for most agents. If you're still above 15%, you need tighter scoping or fewer tools per bundle.

Tool selection accuracy. Run controlled tests where you present the agent with ambiguous requests and measure whether it picks the correct tool. Analytics dashboards that track tool call distribution make this easy to spot. Teams typically see accuracy jump from the 65-75% range with 40+ unsorted tools to 85-95% with 8-15 scoped tools. The improvement is consistent across models.

First-call correctness. Track how often the agent calls the right tool on its first attempt, without retries or fallback chains. This metric captures the quality of your tool descriptions within the scoped set. If scoping improved selection but first-call correctness is still low, your tool descriptions need work.
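A small harness makes the selection metrics measurable. The sketch below scores accuracy over a labeled set of ambiguous prompts; `pickTool` is a placeholder for however your agent exposes its tool-choice step, and the case data is illustrative:

```typescript
// Score tool selection accuracy: fraction of labeled prompts where the
// agent picks the expected tool. Run before and after scoping to
// quantify the improvement.
interface EvalCase {
  prompt: string;       // an ambiguous user request
  expectedTool: string; // the tool a correct agent should pick
}

async function toolSelectionAccuracy(
  cases: EvalCase[],
  pickTool: (prompt: string) => Promise<string>,
): Promise<number> {
  let correct = 0;
  for (const c of cases) {
    if ((await pickTool(c.prompt)) === c.expectedTool) correct++;
  }
  return correct / cases.length;
}
```

The same loop measures first-call correctness if `pickTool` returns the agent's first attempt only, with no retries or fallback chains.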

| Metric | Before (40+ tools) | After (8-15 scoped) |
|---|---|---|
| Tokens for tool defs | 8,000-12,000 | 1,500-3,500 |
| Tool selection accuracy (typical) | 65-75% | 85-95% |
| First-call correctness (typical) | ~60% | ~85% |
| Debug time per tool-related issue | 30-45 min | 8-15 min |

What to do Monday morning

If your MCP setup has more than 20 tools visible to any single agent, start with the lowest-friction fix that applies to your architecture:

  1. Count your tools. Run listTools on every connected MCP server and add up the totals. If any agent sees more than 20, you have a scoping problem.

  2. Classify by domain. Group tools into functional areas (orders, support, analytics, deployment). This classification drives every pattern below.

  3. Pick your pattern. If you own the MCP servers, decompose them (Pattern 1). If you don't, use client-side filtering (Pattern 2) or a gateway (Pattern 3). If you have multi-step workflows exposed as individual tools, build facades (Pattern 4).

  4. Set a budget. No agent should see more than 15-20 tools. Enforce this as a policy, not a guideline.

  5. Measure before and after. Token utilization ratio and tool selection accuracy are the two numbers that prove the fix is working.

The MCP monolith problem is an architecture problem, not a model problem. Better models won't fix a 120-tool context window. Better scoping will.

Checklist
  • Audit total tool count across all connected MCP servers
  • Classify tools by domain (orders, support, analytics, deploy, etc.)
  • Identify tools that are never called in production (candidates for removal)
  • Choose a scoping pattern: decompose, filter, gateway, or facade
  • Set a per-agent tool budget (target: 15 tools max)
  • Implement toolset bundles or gateway allowlists
  • Measure token utilization ratio before and after scoping
  • Run tool selection accuracy tests with ambiguous requests
  • Review tool descriptions for disambiguation and guard clauses
  • Schedule quarterly tool audit to catch new monolith drift

Stop flooding your agents with tools they don't need

Chanl's toolset management lets you create scoped tool bundles per agent, with MCP resolution that returns only what each agent needs. No gateway config. No client-side filtering. Just clean, scoped tool access.

Explore MCP Tool Scoping
Dean Grover, Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.
