What is MCP sampling and how does it work?

MCP sampling lets a server send a prompt to the LLM through the connected client, without needing its own API key. The server calls sampling/createMessage with a messages array and max tokens. The client controls which model handles it, but the server drives the request and gets the result. It's useful for AI-assisted routing, content extraction, and validation within tool execution, without adding complexity to the main agent prompt.

What is MCP elicitation and when should I use it?

MCP elicitation lets a server pause mid-execution and ask the user for structured, schema-validated input through the client's UI. Unlike prompting the LLM to ask the user something, elicitation produces validated JSON the server can consume directly with no parsing needed. Use it for destructive-action confirmation, collecting structured parameters the agent doesn't have, and step-up authorization flows.

What are MCP Roots and why do they matter?

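Roots are a list of filesystem locations (file:// URIs) that the client exposes to the server, telling the server which directories it should work within. A client that supports roots declares the capability during initialization, and the server then fetches the list with a roots/list request. Without roots, the server either assumes paths or asks the LLM to figure them out, and both are fragile. Roots let the server scope file operations accurately. When the client sends notifications/roots/list_changed, the server can also react as the user opens or closes directories.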

Is sampling the same as giving the MCP server its own API key?

No. Sampling routes through the client's model connection. The server doesn't get credentials or direct API access. The client controls which model handles the sampling request, can apply its own policies and rate limits, and can refuse specific requests. This is intentional: it keeps model access governance in the client, not distributed across every server a user connects to.

Can I use MCP elicitation in voice agents or chat interfaces?

Yes, but the client needs to support elicitation. Support varies by client: recent versions of Claude Code support it, and other clients are still adding it, so check your client's documentation. Custom chat UIs need to implement the elicitation handler and render the schema as a form or structured prompt. Voice interfaces typically can't render a JSON Schema form, so for voice CX agents you'd use a different human-in-the-loop pattern, such as an async approval webhook.

How do I test agents that use sampling and elicitation?

Unit tests should mock the function your server uses to send requests to the client (extra.sendRequest in the TypeScript SDK). The mock should return realistic sampling responses and elicitation results for each branch. Integration tests should run a real MCP client and simulate user inputs at the elicitation step. Scenario testing, where you define a full conversation including elicitation responses, is the best way to validate the end-to-end agent path under realistic conditions.

What's the latency cost of MCP sampling?

Each sampling call adds one complete LLM round trip, typically 200 to 800ms depending on the model and prompt size. For routing and classification, setting a high modelPreferences speedPriority (a value from 0 to 1; costPriority works the same way) asks the client to prefer faster, cheaper models. The client still makes the final choice. Don't use sampling in the critical real-time path of a voice agent. It works well for async validation, pre-processing, and decisions that happen before the user is waiting for a response.

Which MCP clients support sampling and elicitation today?

As of May 2026, Claude Code and Cursor support sampling. Elicitation support is newer. Claude Code supports it in recent versions, and most major IDE-based clients have it on their roadmap. When building a custom MCP client, you declare sampling and elicitation capabilities during the MCP handshake, and your server can check whether the client supports them before attempting a call.

How to Use MCP Sampling, Roots, and Elicitation in CX Agents

You shipped your MCP server. Tools work. Resources work. Prompts work. You tested it against Claude Code, everything ran, and you called it done.

Here's what you probably missed. MCP has three capabilities on the client side that your server can call. Most tutorials stop at the server side (tools, resources, prompts) because that's where most of the protocol's surface area lives. But the client-side features are where some of the most useful patterns live: AI-assisted routing without API keys, structured user input that doesn't require a multi-turn conversation loop, and filesystem context that keeps your file operations from guessing.

These aren't obscure edge cases. If you're building a CX agent with 20+ tools, a sampling-based router can make it more reliable. If you need confirmation before a destructive action, elicitation is cleaner than anything you'd build in the LLM's main conversation flow.

Let me show you all three.

The MCP Feature Hierarchy Most Builders Miss

MCP is a two-sided protocol. Servers expose three capability types to clients: Tools (callable functions), Resources (content the model can read), and Prompts (reusable prompt templates). You've almost certainly worked with all three.

Clients expose three capability types to servers: Sampling (server requests an LLM completion through the client), Roots (client tells the server which filesystem paths it has access to), and Elicitation (server requests structured input from the user through the client's UI).

Most builders know the left side of this model. The right side, what clients expose to servers, is what unlocks genuinely new patterns.

MCP's two-sided capability model: server features and client features

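Before your server can use any of these, it has to know the client supports them. That's settled during the MCP handshake. The client declares sampling, roots, and elicitation in its initialize request, while the server declares only its own features (tools, resources, prompts). Your server doesn't opt in to client features; it reads what the client declared:

server-capability-declaration.ts·typescript

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
 
const server = new Server(
  { name: "support-server", version: "1.0.0" },
  {
    capabilities: {
      // Server-side features only. Sampling, roots, and elicitation are
      // client capabilities, so they are not declared here.
      tools: {}
    }
  }
);
 
// Runs once the initialize handshake completes; the client's declared
// capabilities are available from this point on.
server.oninitialized = () => {
  const clientCaps = server.getClientCapabilities();
  // Log to stderr: on the stdio transport, stdout carries protocol messages.
  console.error("sampling:", clientCaps?.sampling !== undefined);
  console.error("elicitation:", clientCaps?.elicitation !== undefined);
  console.error("roots:", clientCaps?.roots !== undefined);
};

If the client didn't declare a capability, requests that depend on it will fail, so check the client's declared capabilities before using them in a critical path.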
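
Instead of scattering capability checks through tool handlers, you can turn the client's declaration into explicit feature flags once, right after initialization. The sketch below is minimal. `ClientCaps` is a simplified, hypothetical stand-in for the SDK's client capabilities type; in a real server you'd pass in what the TypeScript SDK's `server.getClientCapabilities()` returns. `planFeatures` and `FeaturePlan` are illustrative names.

```typescript
// Simplified, hypothetical view of what a client may declare at initialization.
interface ClientCaps {
  sampling?: object;
  elicitation?: object;
  roots?: { listChanged?: boolean };
}

// Feature flags the rest of the server consults instead of re-checking capabilities.
interface FeaturePlan {
  routeWithSampling: boolean;               // classify intents via sampling
  confirmWith: "elicitation" | "conversation"; // how destructive actions get confirmed
  useRoots: boolean;                        // fetch roots/list at all
  watchRoots: boolean;                      // expect roots list_changed notifications
}

function planFeatures(caps: ClientCaps | undefined): FeaturePlan {
  return {
    routeWithSampling: caps?.sampling !== undefined,
    confirmWith: caps?.elicitation !== undefined ? "elicitation" : "conversation",
    useRoots: caps?.roots !== undefined,
    watchRoots: caps?.roots?.listChanged === true,
  };
}
```

A client that declares nothing gets the conservative plan: no sampling, confirmation through the main conversation, and no roots.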

What Is MCP Sampling and When Does Your Server Need It?

Sampling lets an MCP server send a prompt to the language model through the client, without needing its own API key. The server drives the request (what to ask, how many tokens to use, which model to prefer) but the client controls which model actually runs it. This is intentional. It keeps model access governance in the client, not distributed across every server a user connects to.

The canonical use case is intent routing. Imagine you're building a customer support MCP server with 25 tools covering orders, billing, shipping, returns, and product questions. The agent's main context window is already carrying customer history, system instructions, and conversation turns. You don't want to add a 500-token tool routing section to every prompt.

Instead, route with sampling. When a user message comes in, ask a small, fast model to classify it. The main agent just calls your dispatch tool; the dispatch tool handles routing internally with a sampling call.

sampling-intent-router.ts·typescript

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { CallToolRequestSchema, CreateMessageResultSchema } from "@modelcontextprotocol/sdk/types.js";
 
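const server = new Server(
  { name: "support-server", version: "1.0.0" },
  // Only server-side features are declared; sampling is a client capability.
  { capabilities: { tools: {} } }
);
 
// dispatchToTool, checkOrderStatus, initiateRefund, lookupBilling,
// searchProductKnowledge, handleAccountIssue, and generalSupport are your
// own application handlers (not SDK functions); each returns a tool result.
server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  if (request.params.name !== "handle_customer_request") {
    return await dispatchToTool(request.params.name, request.params.arguments);
  }
 
  const userQuery = request.params.arguments?.query as string;
 
  // Use sampling to classify intent. No API key needed in the server.
  // extra.sendRequest sends a request to the client and validates the reply.
  const routingResult = await extra.sendRequest(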
    {
      method: "sampling/createMessage",
      params: {
        messages: [{
          role: "user",
          content: {
            type: "text",
            text: `Classify this customer query into exactly one category:\n\nORDER_STATUS: asking about an order's location or delivery date\nREFUND_REQUEST: asking for a refund, return, or exchange\nBILLING_ISSUE: payment, invoice, or charge question\nPRODUCT_QUESTION: features, availability, or compatibility\nACCOUNT_ISSUE: password, login, or account access\nOTHER: anything else\n\nCustomer query: "${userQuery}"\n\nReply with only the category name. No explanation.`
          }
        }],
        modelPreferences: {
          hints: [{ name: "claude-haiku" }],
          speedPriority: 0.9,
          intelligencePriority: 0.2,
          costPriority: 0.8
        },
        maxTokens: 25,
        systemPrompt: "You are a request classifier. Reply with only the category name."
      }
    },
    CreateMessageResultSchema
  );
 
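  // Normalize the reply: models sometimes change case or add punctuation.
  const intent = routingResult.content.type === "text"
    ? routingResult.content.text.trim().toUpperCase().replace(/[^A-Z_]/g, "")
    : "OTHER";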
 
  switch (intent) {
    case "ORDER_STATUS":     return await checkOrderStatus(userQuery);
    case "REFUND_REQUEST":   return await initiateRefund(userQuery);
    case "BILLING_ISSUE":    return await lookupBilling(userQuery);
    case "PRODUCT_QUESTION": return await searchProductKnowledge(userQuery);
    case "ACCOUNT_ISSUE":    return await handleAccountIssue(userQuery);
    default:                 return await generalSupport(userQuery);
  }
});

This keeps routing logic out of the main agent prompt, uses a cheap fast model for classification, and makes your tool set easier to maintain. When you add a new tool category, you update the classifier; you don't touch the main agent's system prompt.
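
To keep the category list and the parser in sync, one option is to drive both from a single table. The sketch below is illustrative, not part of the MCP SDK: `INTENTS`, `buildClassifierPrompt`, and `parseIntent` are hypothetical names. It also maps any reply it doesn't recognize to OTHER, so a chatty or malformed model answer falls through to general support instead of throwing.

```typescript
// Single source of truth for routing categories (hypothetical names).
const INTENTS = {
  ORDER_STATUS: "asking about an order's location or delivery date",
  REFUND_REQUEST: "asking for a refund, return, or exchange",
  BILLING_ISSUE: "payment, invoice, or charge question",
  PRODUCT_QUESTION: "features, availability, or compatibility",
  ACCOUNT_ISSUE: "password, login, or account access",
  OTHER: "anything else",
} as const;

type Intent = keyof typeof INTENTS;

// Builds the classifier prompt from the table, so adding a category
// updates the prompt automatically.
function buildClassifierPrompt(userQuery: string): string {
  const categoryLines = Object.entries(INTENTS).map(([name, desc]) => `${name}: ${desc}`);
  return [
    "Classify this customer query into exactly one category:",
    "",
    ...categoryLines,
    "",
    // JSON.stringify quotes and escapes the query so it can't break the prompt layout.
    `Customer query: ${JSON.stringify(userQuery)}`,
    "",
    "Reply with only the category name. No explanation.",
  ].join("\n");
}

// Maps a raw model reply onto a known intent; anything unrecognized becomes OTHER.
function parseIntent(raw: string): Intent {
  const cleaned = raw.trim().toUpperCase().replace(/[^A-Z_]/g, "");
  return Object.prototype.hasOwnProperty.call(INTENTS, cleaned) ? (cleaned as Intent) : "OTHER";
}
```

With this shape, adding a category means adding one table entry plus one `case` in the dispatch switch.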

When Does Sampling Fit, and When Does It Hurt?

Sampling adds a complete LLM round trip to the tool's execution path, typically 200 to 800ms depending on the model and prompt. That cost is worth it for classification, validation, and summarization. It's not worth it for decisions that need to complete in under 100ms.

Good uses:

Intent classification before dispatch (as above)
Extracting structured data from unstructured tool output before storing it to memory
Post-call summarization. Run a sampling call to compress a long transcript before storing.
Validation before destructive actions: "Does this refund request look legitimate, or is the order ID malformed?"

Not good uses:

Generating the actual agent response (the main client LLM handles that)
Real-time decisions in a voice pipeline's critical path (sub-300ms voice response requirements leave no room for a sampling call mid-response)
Anything the main LLM can route in-context with a clear tool description
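
If a tool does call sampling under a latency budget, it helps to bound the wait. The following is a minimal sketch, assuming a default answer is acceptable when the classifier is slow. `withTimeout` is a hypothetical helper, not an SDK function: it races the sampling request promise against a timer and returns a fallback value.

```typescript
// Resolves to `work`'s value, or to `fallback` if `work` fails or takes
// longer than `ms`. It stops waiting; it does not cancel the request itself.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // Errors from `work` also become the fallback, so a failed sampling call
    // degrades to default routing instead of failing the whole tool call.
    return await Promise.race([work.catch(() => fallback), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

For example, `withTimeout(classifyIntent(query), 800, "OTHER")` would send slow or failed classifications to general support. Here `classifyIntent` is a hypothetical wrapper around the sampling request, and 800ms is simply the upper end of the range quoted above.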

How Does MCP Elicitation Replace the Multi-Turn Loop?

Elicitation lets a server pause mid-execution and request structured, schema-validated input from the user through the client's UI. The client renders the schema as a form (or a structured prompt in chat interfaces) and returns JSON that matches it. There's no free-form text to parse and no error handling for ambiguous answers, though a defensive server still re-checks the values it acts on.

This sounds simple, but it solves a real problem in CX agent design.

Think about a cancellation flow. Without elicitation, you'd write a multi-turn conversation loop, hope the LLM correctly parses "credit" vs. "store credit" vs. "my card," and handle all the edge cases where users give ambiguous answers. The LLM stores its half-formed interpretation in the conversation state, and you're debugging prompt issues when the user says "the third option" and the agent picks the wrong one.

With elicitation, you define a JSON schema. The client presents it. The server receives validated, typed data. That's the whole loop.
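
Clients are expected to validate responses against `requestedSchema`, but the server can't see how thoroughly a given client does it. Elicitation schemas are restricted to flat objects with primitive properties (strings, string enums, numbers, integers, booleans), which makes a server-side re-check cheap. The sketch below assumes that flat subset. `FlatSchema`, `PrimitiveProp`, and `validateElicited` are hypothetical names, not SDK types. For anything richer, a full JSON Schema validator is the safer choice.

```typescript
// Hypothetical types for the flat, primitive-only schemas elicitation allows.
type PrimitiveProp =
  | { type: "string"; title?: string; description?: string; enum?: string[]; enumNames?: string[] }
  | { type: "number" | "integer"; title?: string; description?: string }
  | { type: "boolean"; title?: string; description?: string };

interface FlatSchema {
  type: "object";
  properties: Record<string, PrimitiveProp>;
  required?: string[];
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of problems; an empty list means `content` fits the schema.
// Unknown fields are rejected, which is a deliberately strict choice.
function validateElicited(schema: FlatSchema, content: unknown): string[] {
  if (typeof content !== "object" || content === null || Array.isArray(content)) {
    return ["content is not an object"];
  }
  const data = content as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of schema.required ?? []) {
    if (!hasOwn(data, key)) errors.push(`missing required field: ${key}`);
  }

  for (const [key, value] of Object.entries(data)) {
    if (!hasOwn(schema.properties, key)) {
      errors.push(`unexpected field: ${key}`);
      continue;
    }
    const prop = schema.properties[key];
    switch (prop.type) {
      case "string":
        if (typeof value !== "string") errors.push(`${key} should be a string`);
        else if (prop.enum && !prop.enum.includes(value)) errors.push(`${key} is not an allowed value`);
        break;
      case "number":
        if (typeof value !== "number" || !Number.isFinite(value)) errors.push(`${key} should be a number`);
        break;
      case "integer":
        if (!Number.isInteger(value)) errors.push(`${key} should be an integer`);
        break;
      case "boolean":
        if (typeof value !== "boolean") errors.push(`${key} should be a boolean`);
        break;
    }
  }
  return errors;
}
```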

elicitation-cancellation-flow.ts·typescript

import { CallToolRequestSchema, ElicitResultSchema } from "@modelcontextprotocol/sdk/types.js";
 
// `server` is the Server instance created earlier. getOrder, dispatchToTool,
// and processOrderCancellation are your own application functions.
server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  if (request.params.name !== "cancel_order") {
    return dispatchToTool(request.params.name, request.params.arguments);
  }
 
  const orderId = request.params.arguments?.orderId as string;
  const order = await getOrder(orderId);
 
  if (!order) {
    return { content: [{ type: "text", text: `Order ${orderId} not found.` }] };
  }
 
  // Pause the tool call and ask the user through the client's UI.
  // ElicitResultSchema validates the reply shape: { action, content? }.
  const confirmation = await extra.sendRequest({
    method: "elicitation/create",
    params: {
      message: `You're about to cancel order #${orderId}: ${order.items.length} item(s), $${order.total}. This can't be undone.`,
      requestedSchema: {
        type: "object",
        properties: {
          confirm: {
            type: "boolean",
            title: "Confirm cancellation",
            description: "Check to confirm you want to proceed"
          },
          reason: {
            type: "string",
            title: "Reason for cancellation",
            enum: ["changed_mind", "found_better_price", "delivery_too_slow", "wrong_item", "other"],
            enumNames: ["Changed my mind", "Found a better price", "Delivery is too slow", "Ordered the wrong item", "Other reason"]
          },
          refundMethod: {
            type: "string",
            title: "Refund to",
            enum: ["original_payment", "store_credit"],
            enumNames: ["Original payment method (3-5 days)", "Store credit (instant, +5% bonus)"]
          }
        },
        required: ["confirm", "reason", "refundMethod"]
      }
    }
  }, ElicitResultSchema);
 
  if (confirmation.action !== "accept" || !confirmation.content?.confirm) {
    return { content: [{ type: "text", text: "Cancellation aborted. Your order is still active." }] };
  }
 
  const { reason, refundMethod } = confirmation.content;
  await processOrderCancellation(orderId, reason, refundMethod);
 
  const refundDescription = refundMethod === "store_credit"
    ? "store credit (available instantly)"
    : "original payment method (3-5 business days)";
 
  return {
    content: [{
      type: "text",
      text: `Order #${orderId} cancelled. Refund to ${refundDescription} processed.`
    }]
  };
});

The confirmation.action field can be "accept", "decline", or "cancel". Always handle all three. If a user closes the form without responding ("cancel"), treat it as an abort. Don't retry the elicitation automatically.
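
A small pure function makes that three-way branch explicit and easy to unit test. This is a sketch. `ElicitOutcome`, `Decision`, and `toDecision` are hypothetical names; only the three `action` values and the optional `content` field come from the elicitation result described above.

```typescript
type ElicitContent = Record<string, string | number | boolean>;

// The elicitation result: one of three actions, plus content on accept.
interface ElicitOutcome {
  action: "accept" | "decline" | "cancel";
  content?: ElicitContent;
}

type Decision =
  | { proceed: true; content: ElicitContent }
  | { proceed: false; reason: "declined" | "dismissed" | "not_confirmed" };

// Only an explicit accept with confirm === true allows the destructive action.
function toDecision(result: ElicitOutcome): Decision {
  const content = result.content;
  if (result.action === "accept" && content !== undefined && content.confirm === true) {
    return { proceed: true, content };
  }
  if (result.action === "decline") return { proceed: false, reason: "declined" };  // explicit "no"
  if (result.action === "cancel") return { proceed: false, reason: "dismissed" };  // form closed; don't re-prompt
  return { proceed: false, reason: "not_confirmed" };                              // accepted, box left unchecked
}
```

Keeping "declined" and "dismissed" as separate reasons lets your analytics tell a deliberate "no" apart from an abandoned form, even though both abort the cancellation.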

Four Human-in-the-Loop Patterns, and When to Use Each

Elicitation is one of four distinct HITL patterns for agent workflows. AWS documented all four in an early 2026 reference implementation. Knowing when each one fits keeps you from applying a good pattern to the wrong use case.

Hook System: intercept tool calls before execution and inject an approval step. You add a middleware layer to your tool dispatch that pauses on high-risk tools (delete, update, send) and routes approval to a human operator. Good for ops workflows where a supervisor approves AI actions; not suitable for real-time, user-facing CX.

Tool Context: annotate tools with risk levels and handle elevated-risk tools differently in the orchestration loop. This is prompt-based. You tell the agent "confirm with the user before calling any tool marked risk: high." Flexible but depends on the LLM following your instructions correctly.

Step Functions / Async Approval: route agent execution through an external state machine that includes human review steps. A manager, QA reviewer, or compliance officer gets a notification, reviews, and approves before the agent continues. You get a full audit trail and a design built for async workflows, but it's not suitable for interactive sessions.

MCP Elicitation: protocol-native. The server pauses, the client renders a form, the user inputs, the server continues. No external infrastructure. Works in the same interactive session. Best for asking the user (not an operator) to confirm or provide information.

For CX agents, elicitation covers most of your real-time HITL needs. Use Step Functions or Hook System when a human other than the user needs to approve an action.
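
For contrast with elicitation, here is roughly what the Hook System pattern looks like as middleware around tool dispatch. This is a sketch under stated assumptions. `withApprovalHook`, `ToolCall`, `ToolResult`, and the `requestApproval` callback are hypothetical names, not MCP or AWS APIs. In practice `requestApproval` would post to an operator queue, chat channel, or ticket and resolve once someone responds.

```typescript
// Hypothetical shapes for a tool call and its result.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}
interface ToolResult {
  text: string;
  isError?: boolean;
}
type Dispatch = (call: ToolCall) => Promise<ToolResult>;

// Wraps a dispatcher so calls to high-risk tools wait for an operator's decision.
// Low-risk tools pass straight through to `dispatch`.
function withApprovalHook(
  dispatch: Dispatch,
  highRiskTools: ReadonlySet<string>,
  requestApproval: (call: ToolCall) => Promise<boolean>
): Dispatch {
  return async (call) => {
    if (highRiskTools.has(call.name)) {
      const approved = await requestApproval(call);
      if (!approved) {
        return { text: `Operator rejected ${call.name}; no changes were made.`, isError: true };
      }
    }
    return dispatch(call);
  };
}
```

The approver here is an operator, not the user, which is exactly the case where elicitation doesn't fit.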

Why Do MCP Roots Matter for File Operations?

Roots tell your MCP server which filesystem locations the connected client wants it to work within. A client that supports roots declares the capability during the MCP handshake. Your server then requests the list with roots/list and receives file:// URIs, so it can scope file operations to the user's actual workspace rather than guessing or asking the LLM to figure it out.

Without roots, your server has two bad options: assume paths (and break when the user's project is somewhere else) or ask the LLM to figure them out (fragile and adds prompt noise).

roots-handler.ts·typescript

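import path from "node:path";
import { fileURLToPath } from "node:url";
import {
  CallToolRequestSchema,
  RootsListChangedNotificationSchema
} from "@modelcontextprotocol/sdk/types.js";
 
let workspaceRoots: Array<{ uri: string; name?: string }> = [];
 
// Fetch roots from the client; server.listRoots() sends a roots/list request.
async function refreshRoots() {
  if (!server.getClientCapabilities()?.roots) return; // client doesn't expose roots
  const result = await server.listRoots();
  workspaceRoots = result.roots;
  await refreshFileScopeCache(workspaceRoots); // your own cache logic
}
 
// Load roots once the handshake is complete...
server.oninitialized = () => {
  void refreshRoots();
};
 
// ...and reload whenever the user opens or closes a workspace folder.
// Notification handlers receive only the notification, not a request context.
server.setNotificationHandler(RootsListChangedNotificationSchema, async () => {
  await refreshRoots();
});
 
// True only if `target` resolves to a location inside one of the roots.
// A plain startsWith check would accept "/work/app-evil" for root "/work/app"
// and "../" traversal. Relative paths resolve against the server's cwd;
// symlinks are not resolved here (use fs.realpath if that matters).
function isInsideRoots(target: string): boolean {
  const resolved = path.resolve(target);
  return workspaceRoots.some(root => {
    if (!root.uri.startsWith("file://")) return false;
    const rel = path.relative(fileURLToPath(root.uri), resolved);
    return rel === "" ||
      (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
  });
}
 
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name !== "read_customer_document") {
    return await dispatchToTool(request.params.name, request.params.arguments);
  }
 
  const filePath = request.params.arguments?.path as string;
 
  if (!isInsideRoots(filePath)) {
    return {
      content: [{ type: "text", text: `Path ${filePath} is outside the client's workspace roots.` }],
      isError: true
    };
  }
 
  return await readFile(filePath); // your own helper that returns a tool result
});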

For CX agents, roots matter when your agent needs to process customer-uploaded files, access case attachments, or read from a local knowledge base the support rep has open. They can also serve as a security boundary, but the protocol doesn't enforce them; your server has to. Before reading any file path from a tool argument, verify that it resolves inside one of the declared roots.

Testing Agents That Use Sampling and Elicitation

Sampling and elicitation add asynchronous, external-dependency steps to your tool execution path. Standard unit tests won't exercise them, because the function that sends requests to the client (extra.sendRequest in the TypeScript SDK) only exists inside a live MCP session. You need to mock it explicitly.

mock-mcp-client-testing.ts·typescript

import { vi, describe, it, expect, beforeEach } from "vitest";
// Hypothetical export: the CallTool handler from your server module. Its order
// lookup (getOrder) must be stubbed or seeded so that order "12345" exists.
import { handleCustomerRequest } from "./support-server.js";
 
describe("customer request handler", () => {
  // Stands in for extra.sendRequest, which the SDK only provides in a live session.
  const mockSendRequest = vi.fn();
  const mockExtra = { sendRequest: mockSendRequest };
 
  beforeEach(() => {
    mockSendRequest.mockReset();
  });
 
  it("routes ORDER_STATUS intent correctly", async () => {
    // A realistic sampling result includes the model and role as well as content.
    mockSendRequest.mockResolvedValueOnce({
      model: "claude-haiku",
      role: "assistant",
      content: { type: "text", text: "ORDER_STATUS" }
    });
 
    const result = await handleCustomerRequest(
      { params: { name: "handle_customer_request", arguments: { query: "Where is my package?" } } },
      mockExtra
    );
 
    expect(mockSendRequest).toHaveBeenCalledWith(
      expect.objectContaining({ method: "sampling/createMessage" }),
      expect.anything() // the result schema
    );
    expect(result.content[0].text).toContain("order");
  });
 
  it("aborts cancellation when user declines elicitation", async () => {
    mockSendRequest.mockResolvedValueOnce({ action: "decline" });
 
    const result = await handleCustomerRequest(
      { params: { name: "cancel_order", arguments: { orderId: "12345" } } },
      mockExtra
    );
 
    expect(result.content[0].text).toContain("aborted");
  });
 
  it("aborts cancellation when user dismisses the form (cancel)", async () => {
    mockSendRequest.mockResolvedValueOnce({ action: "cancel" });
 
    const result = await handleCustomerRequest(
      { params: { name: "cancel_order", arguments: { orderId: "12345" } } },
      mockExtra
    );
 
    expect(result.content[0].text).toContain("aborted");
  });
});

For integration testing and pre-production validation, scenario testing lets you define full conversation flows including elicitation responses. You set the expected elicitation input for each scenario branch and verify the agent takes the right path. Useful for regression testing before you deploy changes to your MCP server.
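
A scenario runner doesn't need much machinery. The sketch below assumes a handler that receives its client-side calls through an injected `ask` function, standing in for the server's sampling and elicitation requests. The runner replays scripted responses in order and fails if the handler asks for something the script didn't expect or leaves scripted responses unused. `Scenario`, `runScenario`, and the toy `cancelOrder` handler are hypothetical names.

```typescript
type ClientCall = { method: string; params?: unknown };
type Ask = (call: ClientCall) => Promise<unknown>;

interface Scenario {
  name: string;
  script: Array<{ method: string; response: unknown }>; // consumed in order
  expectText: string;                                    // substring the final reply must contain
}

// Toy handler under test: confirms via elicitation, then cancels.
async function cancelOrder(orderId: string, ask: Ask): Promise<string> {
  const res = (await ask({ method: "elicitation/create", params: { message: `Cancel #${orderId}?` } })) as {
    action: string;
    content?: { confirm?: boolean };
  };
  if (res.action !== "accept" || res.content?.confirm !== true) {
    return "Cancellation aborted. Your order is still active.";
  }
  return `Order #${orderId} cancelled.`;
}

// Replays the script as the client's responses and checks the outcome.
async function runScenario(s: Scenario, handler: (ask: Ask) => Promise<string>): Promise<void> {
  const queue = [...s.script];
  const ask: Ask = async (call) => {
    const next = queue.shift();
    if (!next || next.method !== call.method) {
      throw new Error(`${s.name}: unexpected client call ${call.method}`);
    }
    return next.response;
  };
  const text = await handler(ask);
  if (queue.length > 0) throw new Error(`${s.name}: ${queue.length} scripted response(s) unused`);
  if (!text.includes(s.expectText)) throw new Error(`${s.name}: unexpected reply "${text}"`);
}

const scenarios: Scenario[] = [
  { name: "accept", script: [{ method: "elicitation/create", response: { action: "accept", content: { confirm: true } } }], expectText: "cancelled" },
  { name: "decline", script: [{ method: "elicitation/create", response: { action: "decline" } }], expectText: "aborted" },
  { name: "dismiss", script: [{ method: "elicitation/create", response: { action: "cancel" } }], expectText: "aborted" },
];
```

Running the suite is a loop: `for (const s of scenarios) await runScenario(s, (ask) => cancelOrder("12345", ask));`.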

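One thing worth knowing: you can check whether the connected client supports sampling or elicitation before trying to use them. In the TypeScript SDK, server.getClientCapabilities() returns whatever the client declared during the handshake. If a client connects that doesn't support elicitation (an older integration, a custom client, a voice interface), you can fall back gracefully instead of throwing.

capability-check.ts·typescript

server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  if (request.params.name !== "cancel_order") {
    return await dispatchToTool(request.params.name, request.params.arguments);
  }
 
  const orderId = request.params.arguments?.orderId as string;
  const supportsElicitation =
    server.getClientCapabilities()?.elicitation !== undefined;
 
  if (supportsElicitation) {
    return await cancelWithElicitation(orderId, extra);
  }
 
  // Fallback for clients without elicitation: the main LLM collects the
  // confirmation in conversation, then calls the tool again with
  // confirmed: true (declare that argument in the tool's inputSchema).
  if (request.params.arguments?.confirmed === true) {
    return await cancelWithoutElicitation(orderId); // hypothetical helper
  }
 
  return {
    content: [{
      type: "text",
      text: `Please ask the user to confirm they want to cancel order #${orderId} ` +
            "before calling this tool again with confirmed: true."
    }]
  };
});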

Putting It Together

The three client capabilities form a natural hierarchy. Roots gives your server situational awareness. It knows what workspace it's in. Sampling gives your server cognitive assistance. It can ask the LLM to help with classification, extraction, or validation. Elicitation gives your server user collaboration. It can pause and ask the user for input in a structured way.

Most MCP servers need at most one or two of these. The sampling-based intent router is valuable for any server with more than a handful of agent tools. Elicitation is worth adding to any flow that includes a destructive or irreversible action. Roots matters whenever your server works with files.

None of these are advanced features. They're part of the base protocol, available in the official SDK, and documented in the MCP spec. They're just in the section most tutorials skip. Once you're using them, Chanl's MCP monitoring can track which sampling calls fire, which elicitation flows reach the user, and where tools are getting misrouted.

If you're building a production CX agent and you haven't looked at the client capabilities yet, start with the MCP basics guide if you need a foundation, or jump straight into the advanced MCP patterns article if you've already shipped a server and want to go deeper.

The builders who understand both sides of the MCP protocol end up with agents that route more accurately, confirm before acting, and break far less often when tool lists grow.

Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.