What Does Stateless Mean in the New MCP 2026 Spec?

The 2026 spec recommends stateless as the default and lets you opt out of protocol-level session management entirely. With sessions disabled, every request is self-contained. Your MCP server receives a tool call, executes it, and returns a result without tracking any per-client state. This makes MCP servers deployable behind any standard load balancer without sticky session requirements.

How Do You Migrate an Existing MCP Server to Stateless Mode?

In the TypeScript SDK, set sessionIdGenerator to undefined when creating your transport. This disables the session middleware entirely. Each request gets a fresh transport instance. If your server needed sessions to track multi-call state, move that state into explicit tool arguments or use the new Tasks extension to pass a task handle between calls.

What Is the MCP Tasks Extension?

Tasks is an async primitive that lets tools return a task handle immediately while the real work runs in the background. Instead of blocking the agent for 30 seconds while a refund processes, your tool returns a taskId in milliseconds. The agent can then poll or subscribe to track progress. Tasks go through defined states: working, input_required, completed, failed, or cancelled.

Why Were Sessions a Production Problem for MCP Servers?

Sessions required sticky routing. Every request from a given client had to land on the same server instance. Behind a Kubernetes deployment or any multi-instance setup, you needed session-affinity configuration, shared session stores, and complex load balancer rules. Any instance restart broke active sessions. Stateless deployment eliminates all of that.

Can I Still Use Application-Level State With Stateless MCP?

Yes. Stateless mode removes protocol-level session tracking, not application-level state. If your tools need to track a multi-step workflow, generate an explicit handle from the first tool call and have the agent pass it as a parameter to subsequent calls. The Tasks extension formalizes exactly this pattern for long-running work.

When Does the MCP 2026 Spec Ship?

The release candidate was published in mid-2026, and the final specification is dated 2026-07-28. Most major SDK maintainers are already shipping release candidate support, so you can start building with stateless mode and Tasks now using current SDK versions.

How Do MCP Tasks Integrate With CX Agent Workflows?

CX agents frequently trigger work that takes longer than a single tool call round-trip: refund processing, order modifications, data enrichment from slow backends, or multi-step verification flows. Tasks let your agent fire these operations and continue the conversation while the work runs. When a task completes or needs input, the agent is notified without polling loops or connection timeouts.

Should All MCP Servers Switch to Stateless Mode?

Yes, for almost every use case. The MCP spec recommends running stateless (in the TypeScript SDK, sessionIdGenerator: undefined) unless you have a specific technical reason to maintain session state. Stateless servers are simpler to deploy, easier to scale, and less likely to fail on instance restarts. The main reason to keep sessions would be a legacy server that relies on in-memory session data and hasn't been updated to use explicit state passing.

How to Migrate Your MCP Server to Stateless Mode

Your MCP server is probably deployed wrong.

It works. But if it's running with sessions enabled (the default in every MCP SDK before the 2026 spec), you've got a hidden operational constraint. Every request from a given client has to land on the same server instance. Sticky sessions. Shared session stores. Load balancer affinity rules. It's the same problem that plagued stateful web servers in 2010, and it's baked into every MCP server that uses the old session model.

The 2026 MCP release candidate fixes this. The fix is simple: turn the session layer off, which the new spec recommends as the default.

This post walks through what changed, why it matters for CX agents, and how to use the new Tasks extension to handle the async work that sessions were often masking.

Why Sessions Were a Production Bottleneck

Session-based MCP servers require sticky routing: every request from a given client must land on the same server instance. That constraint breaks the horizontal scaling model you already use for every other stateless service in your stack.

MCP's original transport model used a persistent SSE (Server-Sent Events) connection with a server-assigned session ID. The client connected, got an Mcp-Session-Id header back, and had to include that ID on every subsequent request. The server kept per-session state in memory.

That model made sense for early MCP deployments where a single Claude Desktop instance talked to a single MCP server on localhost. It falls apart when you deploy to production.

Here's what happens when you put a stateful MCP server behind a Kubernetes deployment with three pods:

text

Client request 1 → Pod A (creates session abc-123)
Client request 2 → Pod B (no session abc-123, returns 400 Session Not Found)

The load balancer doesn't know which pod holds which session. To fix this, you'd need sticky sessions configured at the load balancer level, which means:

Requests from the same client always route to the same pod
If that pod restarts, the session is lost and the client gets an error
You can't freely scale pods up or down without planning session migration
Any deployment that rolls pods creates transient errors for active clients
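
The two-request failure above can be reproduced in a toy simulation. This is only a sketch: the Pod and RoundRobin classes are hypothetical stand-ins, not SDK types, and the 400 status simply mirrors the example trace.

```typescript
// Toy model: each pod keeps its own in-memory session table,
// the way a stateful MCP server instance would.
class Pod {
  private sessions = new Set<string>();
  constructor(readonly name: string) {}

  // An initialize request creates a session on whichever pod receives it.
  initialize(sessionId: string): number {
    this.sessions.add(sessionId);
    return 200;
  }

  // Any later request succeeds only if this pod created the session.
  handle(sessionId: string): number {
    return this.sessions.has(sessionId) ? 200 : 400;
  }
}

// Round-robin balancer with no affinity: request i goes to pod i mod N.
class RoundRobin {
  private next = 0;
  constructor(private pods: Pod[]) {}

  pick(): Pod {
    const pod = this.pods[this.next % this.pods.length];
    this.next += 1;
    return pod;
  }
}

const balancer = new RoundRobin([new Pod("A"), new Pod("B"), new Pod("C")]);

// Request 1 lands on Pod A and creates the session.
const initStatus = balancer.pick().initialize("abc-123");

// Requests 2-4 land on Pods B, C, A. Only Pod A knows the session.
const followUpStatuses = [1, 2, 3].map(() => balancer.pick().handle("abc-123"));

console.log(initStatus, followUpStatuses);
```

With three pods and no affinity, two of every three follow-up requests miss the pod that holds the session, which is why sticky routing becomes mandatory under the session model.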

For a developer tool running locally, this is fine. For a CX agent handling thousands of concurrent customer conversations, it's a reliability and scaling constraint you don't want.

The 2026 spec's answer: make the protocol stateless at the core level.

[Figure: Stateful vs stateless MCP deployment: session affinity vs any-pod routing]

What Changed in the 2026 Spec

The headline change is that the Mcp-Session-Id header is now optional, and the spec recommends running without it. Without it, MCP is just HTTP. Any request can land on any instance, with no server-side memory of what came before.

Three things changed to make this work:

1. Protocol-level sessions become optional. In stateless mode there is no Mcp-Session-Id header, no server-side session store, and no DELETE /session teardown. Each HTTP request is independent.

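2. SDK session middleware opt-out. In the TypeScript SDK, sessionIdGenerator: undefined disables the session layer entirely. Other SDKs expose an equivalent switch (the Python SDK's FastMCP, for instance, takes a stateless_http flag). You don't have to rewrite your transport code. Change one config value.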

3. Tasks extension for explicit handles. The main reason servers needed sessions was to maintain state across multiple calls: "remember that we started a booking in the previous request." The Tasks extension replaces implicit session state with an explicit, durable task handle that the agent holds and passes back.

Migrating to Stateless Mode

Here's a typical MCP server setup before the change:

server-stateful.ts·typescript

import { randomUUID } from "node:crypto";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";
 
const server = new McpServer({ name: "cx-tools", version: "1.0.0" });
 
// Register tools... (db stands in for your application's data layer)
server.tool("get_order_status", { orderId: z.string() }, async ({ orderId }) => {
  const order = await db.orders.findById(orderId);
  return { content: [{ type: "text", text: JSON.stringify(order) }] };
});
 
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(), // <-- creates per-session state
  onsessioninitialized: (sessionId) => {
    console.log(`Session started: ${sessionId}`);
  },
});

Migration is a one-line change:

server-stateless.ts·typescript

const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: undefined, // <-- disables session layer entirely
});

That's it. Your server now handles any request independently. Deploy it behind a round-robin load balancer with no affinity rules and it just works.

The tools themselves don't change. get_order_status still receives orderId and returns the result. The difference is the server holds no memory between calls.

If your tools were using session storage to pass data between calls (caching an auth token fetched in one tool for reuse in another, for example), you'll need to refactor those into either explicit tool parameters or the Tasks extension. We'll cover Tasks next.
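
One way to turn session state into an explicit tool parameter is a signed handle that carries the state itself. The sketch below is an illustration under assumptions, not an SDK feature: BookingState, encodeHandle, decodeHandle, and the HANDLE_SECRET variable are all hypothetical names, and every instance is assumed to share the same secret.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical workflow state that used to live in the session.
interface BookingState {
  customerId: string;
  step: "started" | "seat_selected";
  seat?: string;
}

// Assumption: every instance reads the same secret, for example from an env var.
const SECRET = process.env.HANDLE_SECRET ?? "dev-only-secret";

function sign(payload: string): string {
  return createHmac("sha256", SECRET).update(payload).digest("base64url");
}

// Encode the state into a handle the agent passes back as a tool argument.
function encodeHandle(state: BookingState): string {
  const payload = Buffer.from(JSON.stringify(state)).toString("base64url");
  return `${payload}.${sign(payload)}`;
}

// Verify and decode on any instance. Returns null if the handle was altered.
function decodeHandle(handle: string): BookingState | null {
  const [payload, signature] = handle.split(".");
  if (!payload || !signature) return null;
  const expected = Buffer.from(sign(payload));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as BookingState;
}

// First tool call: start the workflow and return the handle to the agent.
const handle = encodeHandle({ customerId: "c-42", step: "started" });

// Later tool call, possibly on another instance: recover state from the argument.
const restored = decodeHandle(handle);

// A handle whose payload was modified fails verification.
const tampered = decodeHandle(handle.replace(/^./, (c) => (c === "A" ? "B" : "A")));

console.log(restored, tampered);
```

The handle is signed, not encrypted, so anything in it is readable by the agent. For secrets such as auth tokens, an opaque ID that points into a shared store is the safer variant of the same idea.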

The Tasks Extension: Async Work Without Blocking

The Tasks extension gives every MCP tool the option to return a task handle immediately and complete its work in the background, rather than blocking the agent for the full duration of a slow operation.

This solves a real production gap that sessions were often masking: some CX tool calls take a long time.

A refund takes 8-30 seconds to process through a payment gateway. An address verification might wait 15 seconds for an external API. A loyalty points recalculation can run for a minute if the customer has a complex transaction history.

With synchronous tool calls, you have two options: block the agent (and the customer conversation) while the work runs, or implement some ad-hoc async hack with polling. Neither is good.

The Tasks extension formalizes async work with a first-class primitive:

1. Tool receives a request, starts the work, and returns a taskId immediately
2. Agent holds the taskId and can continue the conversation
3. When the task completes (or needs input), the server notifies the agent
4. Agent polls or subscribes to fetch the final result

Here's a refund tool that follows this pattern, using an application-level task store and a get_task_status tool for polling:

refund-tool-with-tasks.ts·typescript

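import { randomUUID } from "node:crypto";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// paymentGateway is assumed to be your payment provider's client, imported elsewhere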
 
const server = new McpServer({ name: "cx-tools", version: "1.0.0" });
 
// Task store (use Redis or a database in production)
const taskStore = new Map<string, { status: string; result?: unknown }>();
 
server.tool(
  "process_refund",
  {
    orderId: z.string(),
    amount: z.number(),
    reason: z.string(),
  },
  async ({ orderId, amount, reason }) => {
    const taskId = randomUUID();
 
    // Store initial state
    taskStore.set(taskId, { status: "working" });
 
    // Kick off async work without awaiting it; processRefundAsync records its own failures
    void processRefundAsync(taskId, orderId, amount, reason);
 
    // Return immediately with the task handle
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({
            taskId,
            status: "working",
            message: `Refund of $${amount} for order ${orderId} is processing. Check back with task ID: ${taskId}`,
          }),
        },
      ],
    };
  }
);
 
server.tool(
  "get_task_status",
  { taskId: z.string() },
  async ({ taskId }) => {
    const task = taskStore.get(taskId);
    if (!task) {
      return { content: [{ type: "text", text: JSON.stringify({ error: "Task not found" }) }] };
    }
    return { content: [{ type: "text", text: JSON.stringify(task) }] };
  }
);
 
async function processRefundAsync(
  taskId: string,
  orderId: string,
  amount: number,
  reason: string
) {
  try {
    const result = await paymentGateway.processRefund({ orderId, amount, reason });
    taskStore.set(taskId, {
      status: "completed",
      result: { refundId: result.id, processedAt: result.timestamp },
    });
  } catch (err) {
    taskStore.set(taskId, {
      status: "failed",
      result: { error: String(err) },
    });
  }
}

The agent's flow with this tool looks like:

1. Agent calls process_refund, gets back a taskId in ~50ms
2. Agent tells the customer: "I've submitted your refund. It typically takes 2-3 minutes."
3. Agent continues the conversation, handles other questions
4. Agent calls get_task_status when appropriate (after a natural pause, or when the customer asks)
5. When the task shows completed, agent confirms the refund to the customer

This is dramatically better than holding the customer in silence for 15-30 seconds while a synchronous tool call processes.

Task Lifecycle States

The Tasks extension defines five states a task can be in:

State	Meaning
`working`	Task is running; no result yet
`input_required`	Task needs additional input from the user or agent
`completed`	Task finished successfully; result is available
`failed`	Task encountered an error; error details available
`cancelled`	Task was explicitly cancelled

The input_required state is particularly useful for CX workflows. A refund might pause waiting for the agent to confirm which payment method to credit back. An address update might pause waiting for verification code confirmation. The task signals what it needs, and the agent handles the user interaction to collect it.
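
It can help to enforce the lifecycle in code so that a late-finishing worker cannot overwrite a final state. The five state names below come from the table above. The transition table itself is an assumption made for illustration, not something quoted from the spec, and canTransition, isTerminal, and transition are hypothetical helpers.

```typescript
type TaskStatus = "working" | "input_required" | "completed" | "failed" | "cancelled";

// Assumed transition table: terminal states have no outgoing edges.
const ALLOWED: Record<TaskStatus, readonly TaskStatus[]> = {
  working: ["input_required", "completed", "failed", "cancelled"],
  input_required: ["working", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return ALLOWED[from].includes(to);
}

function isTerminal(status: TaskStatus): boolean {
  return ALLOWED[status].length === 0;
}

// Apply a transition or throw, so illegal updates fail loudly instead of silently.
function transition(from: TaskStatus, to: TaskStatus): TaskStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal task transition: ${from} -> ${to}`);
  }
  return to;
}

console.log(canTransition("working", "input_required")); // true
console.log(canTransition("completed", "working")); // false
console.log(isTerminal("cancelled")); // true
```

Calling transition before every write to the task store keeps the stored status consistent even when a background job and a cancellation race each other.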

input-required-example.ts·typescript

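// Assumed to exist elsewhere in your app: Address, sms, verifyCode, resumeAddressUpdate.
// The task store's value type needs optional prompt, pendingCode, and newAddress fields.
async function processAddressUpdate(taskId: string, customerId: string, newAddress: Address) {
  // Send verification code
  const code = await sms.sendVerification(customerId);
 
  // Signal that we need user input
  taskStore.set(taskId, {
    status: "input_required",
    prompt: "Please ask the customer for the 6-digit verification code sent to their phone.",
    pendingCode: code.hash, // strip this field before returning task state to the agent
    newAddress, // kept so resumeAddressUpdate can apply it once the code is verified
  });
 
  // Wait for input via a separate tool call
  // The agent will call `provide_task_input` with the code
}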
 
server.tool(
  "provide_task_input",
  { taskId: z.string(), input: z.string() },
  async ({ taskId, input }) => {
    const task = taskStore.get(taskId);
    if (task?.status !== "input_required") {
      return { content: [{ type: "text", text: "Task is not waiting for input" }] };
    }
    // Verify the input and resume the task
    const valid = verifyCode(input, task.pendingCode);
    if (valid) {
      resumeAddressUpdate(taskId);
    }
    return {
      content: [{ type: "text", text: JSON.stringify({ accepted: valid }) }],
    };
  }
);

This keeps the conversational flow natural. The agent isn't blocked. The customer doesn't wait in silence. The task state is explicit and auditable.

Deploying Your Stateless MCP Server

With sessions gone, your deployment configuration simplifies significantly.

Before:

k8s-stateful.yaml·yaml

# Service fragment (the app label is an example value)
spec:
  selector:
    app: cx-mcp-server
  # Required with stateful MCP: pin each client to one pod
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # Kubernetes default (3 hours)

After:

k8s-stateless.yaml·yaml

# Deployment fragment
spec:
  replicas: 10  # Scale freely
  template:
    spec:
      containers:
        - name: cx-mcp-server
          image: cx-mcp-server:latest
# No affinity rules needed on the Service or the load balancer

Your load balancer configuration drops down to standard round-robin. No sticky sessions. No session-affinity annotations. No session store Redis cluster to manage.

For task storage, you do need a shared store if you're running multiple replicas, since any instance might receive the get_task_status call. Redis works well:

redis-task-store.ts·typescript

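import { createClient } from "redis";
 
// Minimal task shape; add fields such as prompt, toolName, or startedAt as needed
export type TaskState = { status: string; result?: unknown };
 
const redis = createClient({ url: process.env.REDIS_URL });
redis.on("error", (err) => console.error("Redis client error", err));
// node-redis v4+ needs an explicit connect before commands (top-level await requires ESM)
await redis.connect();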
 
export async function setTaskState(taskId: string, state: TaskState) {
  await redis.setEx(`task:${taskId}`, 3600, JSON.stringify(state)); // 1hr TTL
}
 
export async function getTaskState(taskId: string): Promise<TaskState | null> {
  const raw = await redis.get(`task:${taskId}`);
  return raw ? JSON.parse(raw) : null;
}

Task storage is simpler than session storage: each task is written once and then updated in place under a single key, so there is nothing to invalidate or migrate across instances, and completed tasks can simply expire through a TTL.
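
For tests and local development, the same key-plus-TTL behaviour can be mimicked in memory. The sketch below is an assumption-laden stand-in rather than a replacement for Redis: TaskStore and MemoryTaskStore are hypothetical names, and the interface is synchronous only to keep the example short (a Redis-backed version would return promises).

```typescript
interface TaskState {
  status: string;
  result?: unknown;
}

// Storage contract for the sketch: set with a TTL, get returns null when missing or expired.
interface TaskStore {
  set(taskId: string, state: TaskState, ttlMs: number): void;
  get(taskId: string): TaskState | null;
}

// In-memory stand-in. The clock is injectable so expiry can be tested without waiting.
class MemoryTaskStore implements TaskStore {
  private entries = new Map<string, { state: TaskState; expiresAt: number }>();

  constructor(private now: () => number = () => Date.now()) {}

  set(taskId: string, state: TaskState, ttlMs: number): void {
    // Every write refreshes the expiry, like SETEX does in Redis.
    this.entries.set(taskId, { state, expiresAt: this.now() + ttlMs });
  }

  get(taskId: string): TaskState | null {
    const entry = this.entries.get(taskId);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(taskId); // lazily evict expired tasks
      return null;
    }
    return entry.state;
  }
}

let fakeTime = 0;
const store = new MemoryTaskStore(() => fakeTime);

store.set("t1", { status: "working" }, 1000);
const beforeExpiry = store.get("t1"); // still present at t = 0

fakeTime = 1000;
const afterExpiry = store.get("t1"); // expired at t = 1000

console.log(beforeExpiry, afterExpiry);
```

Unlike the Redis version, this store is per-process, so it only behaves correctly with a single replica.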

Observing Tasks in Production

Long-running tasks need monitoring. You want to know when tasks fail silently, when they take unexpectedly long, and whether specific task types have higher failure rates.

A simple pattern: emit a structured log event on every task state transition.

task-telemetry.ts·typescript

function updateTask(taskId: string, state: TaskState) {
  const prev = taskStore.get(taskId);
  taskStore.set(taskId, state);
 
  console.log(
    JSON.stringify({
      event: "task_state_change",
      taskId,
      from: prev?.status,
      to: state.status,
      tool: state.toolName,
      duration_ms: Date.now() - state.startedAt,
    })
  );
}

With a monitoring stack watching those logs, you can build dashboards for task completion rates, median completion time per tool, and failure rates. This is the same data that surfaces in Chanl's monitoring when you route your agent through the platform. Every task start, state change, and completion shows up in the conversation trace.
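
As a rough illustration of what such a dashboard computes, the sketch below aggregates task_state_change events into per-tool failure rate and median completion time. The TaskEvent shape mirrors the log fields above. summarize, median, and the sample events are hypothetical and exist only for this example.

```typescript
// Subset of the log event fields emitted on each state change.
interface TaskEvent {
  event: "task_state_change";
  taskId: string;
  to: string;
  tool: string;
  duration_ms: number;
}

interface ToolStats {
  finished: number;
  failureRate: number;
  medianCompletionMs: number | null;
}

function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Aggregate terminal events per tool. Non-terminal transitions are ignored.
function summarize(events: TaskEvent[]): Map<string, ToolStats> {
  const byTool = new Map<string, { completed: number[]; failed: number }>();
  for (const e of events) {
    if (e.to !== "completed" && e.to !== "failed") continue;
    const bucket = byTool.get(e.tool) ?? { completed: [], failed: 0 };
    if (e.to === "completed") bucket.completed.push(e.duration_ms);
    else bucket.failed += 1;
    byTool.set(e.tool, bucket);
  }

  const stats = new Map<string, ToolStats>();
  byTool.forEach((bucket, tool) => {
    const finished = bucket.completed.length + bucket.failed;
    stats.set(tool, {
      finished,
      failureRate: bucket.failed / finished,
      medianCompletionMs: median(bucket.completed),
    });
  });
  return stats;
}

// Made-up sample events: two completions, one failure, one still running.
const sample: TaskEvent[] = [
  { event: "task_state_change", taskId: "a", to: "completed", tool: "process_refund", duration_ms: 9000 },
  { event: "task_state_change", taskId: "b", to: "completed", tool: "process_refund", duration_ms: 21000 },
  { event: "task_state_change", taskId: "c", to: "failed", tool: "process_refund", duration_ms: 60000 },
  { event: "task_state_change", taskId: "d", to: "working", tool: "process_refund", duration_ms: 0 },
];

const refundStats = summarize(sample).get("process_refund");
console.log(refundStats);
```

In practice the log pipeline would do this aggregation, but running it over a captured log file is a quick way to sanity-check the event format before wiring up dashboards.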

Stale tasks are the failure mode to watch most carefully. A task stuck in working for longer than its expected max duration usually means a downstream service hung. Set a maximum duration per task type and alert when tasks age past it:

task-staleness-check.ts·typescript

const MAX_TASK_DURATION_MS: Record<string, number> = {
  process_refund: 60_000,        // 1 minute
  update_address: 120_000,       // 2 minutes
  cancel_subscription: 30_000,   // 30 seconds
};
 
// Run periodically
async function checkStaleTasks() {
  for (const [taskId, task] of taskStore.entries()) {
    if (task.status !== "working") continue;
    const age = Date.now() - task.startedAt;
    const maxAge = MAX_TASK_DURATION_MS[task.toolName] ?? 60_000;
    if (age > maxAge) {
      updateTask(taskId, {
        ...task,
        status: "failed",
        result: { error: `Task exceeded max duration of ${maxAge}ms` },
      });
      // Alert your on-call
      alerting.fire(`Task ${taskId} (${task.toolName}) went stale after ${age}ms`);
    }
  }
}

What This Means for Your Agent Architecture

Your MCP server deployment should look like any other stateless microservice. The 2026 spec makes that possible and the SDKs already support it, so you don't need to wait for the July 28 official stamp.

The migration is genuinely low-effort for most servers: one config change to disable sessions, a quick audit of any tools that were implicitly relying on cross-call session state, and a decision about which long-running operations benefit from the Tasks extension.

The payoff is an MCP server that deploys like any other stateless service. Kubernetes horizontal pod autoscaling works without special affinity rules. Serverless deployment (AWS Lambda, Google Cloud Functions) becomes viable since there's no persistent connection to maintain. Rolling deployments don't break active calls. Load balancer configuration is boring, which is exactly what you want from infrastructure.

The Tasks extension solves a real CX problem that sessions were never designed to address. Async work like refunds, verifications, and multi-step updates can now happen without blocking the conversation or requiring the agent to hold a connection open.

If you're using Chanl to manage your agent's tool layer and MCP server configuration, we'll surface Tasks alongside synchronous tool calls in the conversation trace, so you can see exactly which async operations were triggered in any customer interaction and how they resolved.

The infrastructure part of building CX agents should be boring. Stateless MCP gets you closer to that.


Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.
