Your MCP server is probably deployed wrong.
It works. But if it's running with sessions enabled (the default in every MCP SDK before the 2026 spec), you've got a hidden operational constraint. Every request from a given client has to land on the same server instance. Sticky sessions. Shared session stores. Load balancer affinity rules. It's the same problem that plagued stateful web servers in 2010, and it's baked into every MCP server that uses the old session model.
The 2026 MCP release candidate fixes this. The fix is simple: turn the session layer off, which the new spec recommends as the default.
This post walks through what changed, why it matters for CX agents, and how to use the new Tasks extension to handle the async work that sessions were often masking.
Why Sessions Were a Production Bottleneck
Session-based MCP servers require sticky routing: every request from a given client must land on the same server instance. That constraint breaks the horizontal scaling model you already use for every other stateless service in your stack.
MCP's original transport model used a persistent SSE (Server-Sent Events) connection with a server-assigned session ID. The client connected, got an Mcp-Session-Id header back, and had to include that ID on every subsequent request. The server kept per-session state in memory.
That model made sense for early MCP deployments where a single Claude Desktop instance talked to a single MCP server on localhost. It falls apart when you deploy to production.
Here's what happens when you put a stateful MCP server behind a Kubernetes deployment with three pods:
    Client request 1 → Pod A (creates session abc-123)
    Client request 2 → Pod B (no session abc-123, returns 400 Session Not Found)

The load balancer doesn't know which pod holds which session. To fix this, you'd need sticky sessions configured at the load balancer level, which means:
- Requests from the same client always route to the same pod
- If that pod restarts, the session is lost and the client gets an error
- You can't freely scale pods up or down without planning session migration
- Any deployment that rolls pods creates transient errors for active clients
For a developer tool running locally, this is fine. For a CX agent handling thousands of concurrent customer conversations, it's a reliability and scaling constraint you don't want.
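The failure mode above can be reproduced in a few lines. This is a toy model, not SDK code: the `Pod` class, the `simulate` function, and the round-robin rule (request i goes to pod i mod N) are all assumptions made for illustration.

```typescript
// Hypothetical model: each pod keeps its own in-memory session set.
class Pod {
  private sessions = new Set<string>();
  constructor(readonly name: string) {}

  // First request: create a session that only this pod knows about.
  initialize(sessionId: string): void {
    this.sessions.add(sessionId);
  }

  // Later requests succeed only if this pod created the session.
  handle(sessionId: string): "ok" | "session_not_found" {
    return this.sessions.has(sessionId) ? "ok" : "session_not_found";
  }
}

// Round-robin balancer with no affinity: request i goes to pod i mod N.
function simulate(podCount: number, requestCount: number): string[] {
  const pods = Array.from({ length: podCount }, (_, i) => new Pod(`pod-${i}`));
  const sessionId = "abc-123";
  const outcomes: string[] = [];
  for (let i = 0; i < requestCount; i++) {
    const pod = pods[i % podCount];
    if (i === 0) {
      pod.initialize(sessionId);
      outcomes.push(`${pod.name}: session created`);
    } else {
      outcomes.push(`${pod.name}: ${pod.handle(sessionId)}`);
    }
  }
  return outcomes;
}

const outcomes = simulate(3, 4);
```

With three pods and four requests, only the request that happens to return to the first pod finds the session; the two in between fail.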
The 2026 spec's answer: make the protocol stateless at the core level.
What Changed in the 2026 Spec
The headline change is that the Mcp-Session-Id header is now optional, and the spec recommends running without it. Without it, MCP is just HTTP. Any request can land on any instance, with no server-side memory of what came before.
Three things changed to make this work:
1. Protocol-level sessions are no longer required. In stateless mode there is no Mcp-Session-Id header, no server-side session store, and no DELETE /session teardown. Each HTTP request is independent.
2. SDK session middleware opt-out. In the TypeScript SDK, sessionIdGenerator: undefined disables the session layer entirely; the Python SDK exposes an equivalent stateless option (stateless_http=True on FastMCP at the time of writing). You don't have to migrate your transport code. Change one config.
3. Tasks extension for explicit handles. The main reason servers needed sessions was to maintain state across multiple calls: "remember that we started a booking in the previous request." The Tasks extension replaces implicit session state with an explicit, durable task handle that the agent holds and passes back.
Migrating to Stateless Mode
Here's a typical MCP server setup before the change:
    import { randomUUID } from "node:crypto";
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
    import { z } from "zod";

    const server = new McpServer({ name: "cx-tools", version: "1.0.0" });

    // Register tools...
    server.tool("get_order_status", { orderId: z.string() }, async ({ orderId }) => {
      // `db` stands in for your own data-access layer
      const order = await db.orders.findById(orderId);
      return { content: [{ type: "text", text: JSON.stringify(order) }] };
    });

    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(), // <-- creates per-session state
      onsessioninitialized: (sessionId) => {
        console.log(`Session started: ${sessionId}`);
      },
    });

Migration is a one-line change:
    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: undefined, // <-- disables session layer entirely
    });

That's it. Your server now handles any request independently. Deploy it behind a round-robin load balancer with no affinity rules and it just works.
The tools themselves don't change. get_order_status still receives orderId and returns the result. The difference is the server holds no memory between calls.
If your tools were using session storage to pass data between calls (caching an auth token fetched in one tool for reuse in another, for example), you'll need to refactor those into either explicit tool parameters or the Tasks extension. We'll cover Tasks next.
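As a rough illustration of the "explicit tool parameters" route, the sketch below contrasts a handler that reads from a per-instance session cache with one that receives everything it needs from the caller. The `AuthContext` shape and both function names are hypothetical; they are not part of any MCP SDK.

```typescript
// Hypothetical shape of an auth lookup result; not part of any SDK.
interface AuthContext {
  customerId: string;
  token: string;
}

// Before (session-coupled): a module-level cache that only exists on the
// instance that happened to serve the earlier login call.
const sessionCache = new Map<string, AuthContext>();

function getOrdersWithSession(sessionId: string): string {
  const auth = sessionCache.get(sessionId);
  if (!auth) {
    throw new Error("No auth in session; the login may have hit another instance");
  }
  return `orders for ${auth.customerId}`;
}

// After (stateless): the caller passes the context on every call, so any
// instance can serve the request.
function getOrdersStateless(auth: AuthContext): string {
  return `orders for ${auth.customerId}`;
}
```

For sensitive values such as tokens, you may prefer to pass an opaque reference and resolve it from a shared store rather than route the secret through the agent.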
The Tasks Extension: Async Work Without Blocking
The Tasks extension gives every MCP tool the option to return a task handle immediately and complete its work in the background, rather than blocking the agent for the full duration of a slow operation.
This solves a real production gap that sessions were often masking: some CX tool calls take a long time.
A refund takes 8-30 seconds to process through a payment gateway. An address verification might wait 15 seconds for an external API. A loyalty points recalculation can run for a minute if the customer has a complex transaction history.
With synchronous tool calls, you have two options: block the agent (and the customer conversation) while the work runs, or implement some ad-hoc async hack with polling. Neither is good.
The Tasks extension formalizes async work with a first-class primitive:
- Tool receives a request, starts the work, and returns a taskId immediately
- Agent holds the taskId and can continue the conversation
- When the task completes (or needs input), the server notifies the agent
- Agent polls or subscribes to fetch the final result
Here's how to implement a refund tool using Tasks:
    import { randomUUID } from "node:crypto";
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { z } from "zod";

    const server = new McpServer({ name: "cx-tools", version: "1.0.0" });

    // Task store (use Redis or a database in production)
    const taskStore = new Map<string, { status: string; result?: unknown }>();

    server.tool(
      "process_refund",
      {
        orderId: z.string(),
        amount: z.number(),
        reason: z.string(),
      },
      async ({ orderId, amount, reason }) => {
        const taskId = randomUUID();

        // Store initial state
        taskStore.set(taskId, { status: "working" });

        // Kick off async work without awaiting it; errors are handled inside
        void processRefundAsync(taskId, orderId, amount, reason);

        // Return immediately with the task handle
        return {
          content: [
            {
              type: "text",
              text: JSON.stringify({
                taskId,
                status: "working",
                message: `Refund of $${amount} for order ${orderId} is processing. Check back with task ID: ${taskId}`,
              }),
            },
          ],
        };
      }
    );

    server.tool(
      "get_task_status",
      { taskId: z.string() },
      async ({ taskId }) => {
        const task = taskStore.get(taskId);
        if (!task) {
          return { content: [{ type: "text", text: JSON.stringify({ error: "Task not found" }) }] };
        }
        return { content: [{ type: "text", text: JSON.stringify(task) }] };
      }
    );

    async function processRefundAsync(
      taskId: string,
      orderId: string,
      amount: number,
      reason: string
    ) {
      try {
        // `paymentGateway` stands in for your own payment client
        const result = await paymentGateway.processRefund({ orderId, amount, reason });
        taskStore.set(taskId, {
          status: "completed",
          result: { refundId: result.id, processedAt: result.timestamp },
        });
      } catch (err) {
        taskStore.set(taskId, {
          status: "failed",
          result: { error: String(err) },
        });
      }
    }

The agent's flow with this tool looks like:
- Agent calls process_refund, gets back a taskId in ~50ms
- Agent tells the customer: "I've submitted your refund. It typically takes 2-3 minutes."
- Agent continues the conversation, handles other questions
- Agent calls get_task_status when appropriate (after a natural pause, or when the customer asks)
- When the task shows completed, agent confirms the refund to the customer
This is dramatically better than holding the customer in silence for 8-30 seconds while a synchronous tool call processes.
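On the agent side, the "check back later" step can be wrapped in a small polling helper. This is a minimal sketch under assumptions: `pollTask`, `PollOptions`, and the injected `fetchStatus` and `sleep` functions are hypothetical names, and `fetchStatus` would typically wrap a call to the get_task_status tool.

```typescript
interface TaskSnapshot {
  status: string;
  result?: unknown;
}

interface PollOptions {
  maxAttempts: number;
  delayMs: number;
  // Injected so the helper stays testable; in Node this could be
  // (ms) => new Promise((resolve) => setTimeout(resolve, ms)).
  sleep: (ms: number) => Promise<void>;
}

// Polls until the task leaves "working" or the attempt budget runs out.
// Returns the last snapshot seen, which may still be "working".
async function pollTask(
  fetchStatus: () => Promise<TaskSnapshot>,
  { maxAttempts, delayMs, sleep }: PollOptions
): Promise<TaskSnapshot> {
  let snapshot = await fetchStatus();
  for (let attempt = 1; attempt < maxAttempts && snapshot.status === "working"; attempt++) {
    await sleep(delayMs);
    snapshot = await fetchStatus();
  }
  return snapshot;
}
```

Capping the number of attempts keeps a hung task from stalling the conversation; the caller decides what to tell the customer if the final snapshot is still in the working state.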
Task Lifecycle States
The Tasks extension defines five states a task can be in:
| State | Meaning |
|---|---|
| working | Task is running; no result yet |
| input_required | Task needs additional input from the user or agent |
| completed | Task finished successfully; result is available |
| failed | Task encountered an error; error details available |
| cancelled | Task was explicitly cancelled |
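One way to keep these states honest in code is a union type plus a transition table. The five state names come from the table above; the allowed transitions are an assumption made for illustration (terminal states never change, input_required can resume to working), so check them against the Tasks extension text before relying on them.

```typescript
type TaskStatus = "working" | "input_required" | "completed" | "failed" | "cancelled";

// Assumed transition rules for illustration only.
const ALLOWED: Record<TaskStatus, readonly TaskStatus[]> = {
  working: ["input_required", "completed", "failed", "cancelled"],
  input_required: ["working", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

// A state with no outgoing transitions is terminal.
function isTerminal(status: TaskStatus): boolean {
  return ALLOWED[status].length === 0;
}

// True when moving from `from` to `to` is permitted by the table above.
function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return ALLOWED[from].includes(to);
}
```

Guarding every store write with `canTransition` would, for example, stop a late callback from flipping a cancelled task back to completed.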
The input_required state is particularly useful for CX workflows. A refund might pause waiting for the agent to confirm which payment method to credit back. An address update might pause waiting for verification code confirmation. The task signals what it needs, and the agent handles the user interaction to collect it.
    async function processAddressUpdate(taskId: string, customerId: string, newAddress: Address) {
      // Send verification code (`sms` is your own messaging client)
      const code = await sms.sendVerification(customerId);

      // Signal that we need user input.
      // Note: the task record type needs `prompt` and `pendingCode` fields in addition to `status`.
      taskStore.set(taskId, {
        status: "input_required",
        prompt: "Please ask the customer for the 6-digit verification code sent to their phone.",
        pendingCode: code.hash,
      });

      // Wait for input via a separate tool call
      // The agent will call `provide_task_input` with the code
    }

    server.tool(
      "provide_task_input",
      { taskId: z.string(), input: z.string() },
      async ({ taskId, input }) => {
        const task = taskStore.get(taskId);
        if (task?.status !== "input_required") {
          return { content: [{ type: "text", text: "Task is not waiting for input" }] };
        }

        // Verify the input and resume the task
        // (`verifyCode` and `resumeAddressUpdate` are defined elsewhere in your server)
        const valid = verifyCode(input, task.pendingCode);
        if (valid) {
          resumeAddressUpdate(taskId);
        }

        return {
          content: [{ type: "text", text: JSON.stringify({ accepted: valid }) }],
        };
      }
    );

This keeps the conversational flow natural. The agent isn't blocked. The customer doesn't wait in silence. The task state is explicit and auditable.
Deploying Your Stateless MCP Server
With sessions gone, your deployment configuration simplifies significantly.
Before:
    # Service in front of the stateful MCP pods (port numbers are illustrative)
    apiVersion: v1
    kind: Service
    metadata:
      name: cx-mcp-server
    spec:
      selector:
        app: cx-mcp-server
      ports:
        - port: 80
          targetPort: 3000
      # Required with stateful MCP: pin each client to one pod
      sessionAffinity: ClientIP
      sessionAffinityConfig:
        clientIP:
          timeoutSeconds: 10800

After:
    spec:
      replicas: 10 # Scale freely
      template:
        spec:
          containers:
            - name: cx-mcp-server
              image: cx-mcp-server:latest
      # No affinity rules needed

Your load balancer configuration drops down to standard round-robin. No sticky sessions. No session-affinity annotations. No session store Redis cluster to manage.
For task storage, you do need a shared store if you're running multiple replicas, since any instance might receive the get_task_status call. Redis works well:
    import { createClient } from "redis";

    // Minimal task record; extend it with whatever your tools need
    type TaskState = { status: string; result?: unknown };

    const redis = createClient({ url: process.env.REDIS_URL });
    // node-redis v4+ needs an explicit connect before issuing commands
    await redis.connect();

    export async function setTaskState(taskId: string, state: TaskState) {
      await redis.setEx(`task:${taskId}`, 3600, JSON.stringify(state)); // 1hr TTL
    }

    export async function getTaskState(taskId: string): Promise<TaskState | null> {
      const raw = await redis.get(`task:${taskId}`);
      return raw ? JSON.parse(raw) : null;
    }

Task storage is simpler than session storage. Each task is written once and then updated in place under its own ID, so there is nothing to invalidate or migrate across instances, and a finished task can simply expire via its TTL.
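For local development and unit tests, an in-memory stand-in with the same TTL behavior can be handy. The `InMemoryTaskStore` class below is a hypothetical helper, not part of any SDK, and because it lives in a single process it is not a substitute for the shared store when you run multiple replicas.

```typescript
interface TaskState {
  status: string;
  result?: unknown;
}

// Hypothetical single-process store that mimics a TTL-based key-value store.
class InMemoryTaskStore {
  private entries = new Map<string, { state: TaskState; expiresAt: number }>();

  // `now` is injectable so tests can control the clock.
  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now()
  ) {}

  // Each write refreshes the TTL, like re-issuing SETEX on every update.
  set(taskId: string, state: TaskState): void {
    this.entries.set(taskId, { state, expiresAt: this.now() + this.ttlMs });
  }

  // Expired entries are dropped lazily on read.
  get(taskId: string): TaskState | null {
    const entry = this.entries.get(taskId);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(taskId);
      return null;
    }
    return entry.state;
  }
}
```

Injecting the clock lets a test advance time past the TTL without waiting for it.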
Observing Tasks in Production
Long-running tasks need monitoring. You want to know when tasks fail silently, when they take unexpectedly long, and whether specific task types have higher failure rates.
A simple pattern: emit a structured log event on every task state transition.
    // Assumes each task record also carries `toolName` and `startedAt` (epoch ms)
    function updateTask(taskId: string, state: TaskState) {
      const prev = taskStore.get(taskId);
      taskStore.set(taskId, state);

      console.log(
        JSON.stringify({
          event: "task_state_change",
          taskId,
          from: prev?.status,
          to: state.status,
          tool: state.toolName,
          duration_ms: Date.now() - state.startedAt,
        })
      );
    }

With a monitoring stack watching those logs, you can build dashboards for task completion rates, median completion time per tool, and failure rates. This is the same data that surfaces in Chanl's monitoring when you route your agent through the platform. Every task start, state change, and completion shows up in the conversation trace.
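If you want to sanity-check those numbers before wiring up a dashboard, the aggregation is small enough to sketch directly. The `TaskEvent` shape mirrors the log fields emitted above; `summarize`, `median`, and `ToolStats` are hypothetical names introduced for this example.

```typescript
interface TaskEvent {
  event: "task_state_change";
  taskId: string;
  to: string;
  tool: string;
  duration_ms: number;
}

interface ToolStats {
  completed: number;
  failed: number;
  failureRate: number;
  medianCompletionMs: number | null;
}

// Median of a list of numbers; null when the list is empty.
function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Groups terminal events (completed or failed) by tool and computes stats.
function summarize(events: TaskEvent[]): Map<string, ToolStats> {
  const byTool = new Map<string, { durations: number[]; failed: number }>();
  for (const e of events) {
    if (e.to !== "completed" && e.to !== "failed") continue;
    const bucket = byTool.get(e.tool) ?? { durations: [], failed: 0 };
    if (e.to === "completed") bucket.durations.push(e.duration_ms);
    else bucket.failed += 1;
    byTool.set(e.tool, bucket);
  }
  const stats = new Map<string, ToolStats>();
  byTool.forEach(({ durations, failed }, tool) => {
    const completed = durations.length;
    stats.set(tool, {
      completed,
      failed,
      failureRate: failed / (completed + failed),
      medianCompletionMs: median(durations),
    });
  });
  return stats;
}
```

Only completed tasks feed the median, so a burst of fast failures does not make a tool look quicker than it is.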
Stale tasks are the failure mode to watch most carefully. A task stuck in working for longer than its expected max duration usually means a downstream service hung. Set a maximum duration per task type and alert when tasks age past it:
    const MAX_TASK_DURATION_MS: Record<string, number> = {
      process_refund: 60_000, // 1 minute
      update_address: 120_000, // 2 minutes
      cancel_subscription: 30_000, // 30 seconds
    };

    // Run periodically
    async function checkStaleTasks() {
      for (const [taskId, task] of taskStore.entries()) {
        if (task.status !== "working") continue;

        const age = Date.now() - task.startedAt;
        const maxAge = MAX_TASK_DURATION_MS[task.toolName] ?? 60_000;

        if (age > maxAge) {
          updateTask(taskId, {
            ...task,
            status: "failed",
            result: { error: `Task exceeded max duration of ${maxAge}ms` },
          });
          // Alert your on-call
          alerting.fire(`Task ${taskId} (${task.toolName}) went stale after ${age}ms`);
        }
      }
    }

What This Means for Your Agent Architecture
Your MCP server deployment should look like any other stateless microservice. The 2026 spec makes that possible, and the SDKs already support it, so you don't need to wait for the official release on July 28.
The migration is genuinely low-effort for most servers: one config change to disable sessions, a quick audit of any tools that were implicitly relying on cross-call session state, and a decision about which long-running operations benefit from the Tasks extension.
The payoff is an MCP server that deploys like any other stateless service. Kubernetes horizontal pod autoscaling works without special affinity rules. Serverless deployment (AWS Lambda, Google Cloud Functions) becomes viable since there's no persistent connection to maintain. Rolling deployments don't break active calls. Load balancer configuration is boring, which is exactly what you want from infrastructure.
The Tasks extension solves a real CX problem that sessions were never designed to address. Async work like refunds, verifications, and multi-step updates can now happen without blocking the conversation or requiring the agent to hold a connection open.
If you're using Chanl to manage your agent's tool layer and MCP server configuration, we'll surface Tasks alongside synchronous tool calls in the conversation trace, so you can see exactly which async operations were triggered in any customer interaction and how they resolved.
The infrastructure part of building CX agents should be boring. Stateless MCP gets you closer to that.



