MCP's original SSE transport worked fine on your laptop. Then you tried deploying it to Vercel, and everything fell apart.
Long-lived connections timed out. Serverless functions couldn't hold state. Load balancers dropped SSE streams mid-conversation. The protocol designed to standardize AI agent tool use had a transport layer that fought against modern infrastructure.
The MCP specification team recognized this and shipped a replacement: Streamable HTTP transport. It is the most significant production-facing change to the MCP spec since the protocol launched, and it solves problems that blocked real-world deployments for months.
This guide covers exactly what Streamable HTTP is, why it replaced SSE, and how to implement it in TypeScript with working code.
What Problems Did SSE Transport Have?
SSE transport required persistent, long-lived HTTP connections between the MCP client and server, and that single requirement created a cascade of production failures.
The original MCP transport for remote connections used a two-endpoint design. The client sent messages via POST to one endpoint, and the server streamed responses back through a separate SSE endpoint that stayed open for the entire session. This design assumed the server was a long-running process with stable network connectivity. That assumption breaks in three common deployment scenarios.
Serverless platforms like Vercel, AWS Lambda, and Cloudflare Workers enforce execution timeouts (typically 10-30 seconds for edge functions, up to 5 minutes for standard functions). An SSE connection that needs to stay open for a full agent conversation simply cannot survive on these platforms.
Auto-scaling infrastructure like Kubernetes or Fly.io regularly terminates and replaces instances. When a server instance is removed during scale-down, every SSE connection to that instance drops with no recovery mechanism. The client has to reconnect from scratch, losing any in-flight tool responses.
Standard HTTP infrastructure like load balancers, CDNs, and reverse proxies were not built for indefinite SSE connections. Many load balancers enforce idle timeouts, proxy servers buffer SSE events (breaking real-time delivery), and connection draining during deploys kills active streams.
The MCP spec team summarized the problems concisely in the transport deprecation notice: no resumable streams, mandatory long-lived connections, and unidirectional message flow. Streamable HTTP fixes all three.
How Does Streamable HTTP Transport Work?
Streamable HTTP consolidates all communication into a single /mcp endpoint that supports both stateless request-response and optional SSE streaming, letting the server choose the right mode per request.
Instead of two separate endpoints (one for client-to-server POST, one for server-to-client SSE), Streamable HTTP uses one endpoint that handles both directions:
- Client-to-server messages: always a POST to /mcp with a JSON-RPC body
- Server responses: either a direct JSON response (for simple results) or an SSE stream (for streaming results or long-running operations)
- Server-initiated messages: delivered through an optional SSE stream the client opens via GET on /mcp
This design means every interaction starts as a normal HTTP request. The server can complete it synchronously (return JSON, close connection) or upgrade it to a stream (switch to SSE within the same response). The client does not need to maintain a persistent connection unless it explicitly wants server-initiated notifications.
Here is the message flow for a typical tool invocation:
```
Client                                            Server
  |                                                 |
  |  POST /mcp                                      |
  |  Content-Type: application/json                 |
  |  { "method": "tools/call",                      |
  |    "params": { "name": "lookup_customer",       |
  |                "arguments": {...} } }           |
  | ----------------------------------------------> |
  |                                                 |
  |  HTTP 200                                       |
  |  Content-Type: text/event-stream                |  (server chose to stream)
  |  data: { "jsonrpc": "2.0", "id": 1,             |
  |          "result": {...} }                      |
  | <---------------------------------------------- |
  |                                                 |
```

For a quick tool like a database lookup, the server might return Content-Type: application/json with the result directly. For a tool that takes 30 seconds (generating a report, running a complex query), it returns Content-Type: text/event-stream and sends progress events followed by the final result.
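Because this per-request choice is plain HTTP, it can be sketched without the SDK at all. Below is a minimal framework-free sketch; the slowTools set, the decision rule, and the placeholder result payloads are illustrative assumptions, not part of the MCP SDK, which makes this decision internally:

```typescript
import { createServer } from "node:http";

// Hypothetical rule for this sketch: only "generate_report" streams
const slowTools = new Set(["generate_report"]);

// Pure decision: which content type does this request get?
function chooseResponseMode(body: { method?: string; params?: { name?: string } }): string {
  return body.method === "tools/call" && slowTools.has(body.params?.name ?? "")
    ? "text/event-stream"
    : "application/json";
}

const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/mcp") {
    res.writeHead(404).end();
    return;
  }
  let raw = "";
  req.on("data", (chunk) => (raw += chunk));
  req.on("end", () => {
    const body = JSON.parse(raw);
    if (chooseResponseMode(body) === "application/json") {
      // Quick tool: complete the request synchronously and close
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ jsonrpc: "2.0", id: body.id, result: {} }));
    } else {
      // Slow tool: upgrade the same response to an SSE stream
      res.writeHead(200, { "Content-Type": "text/event-stream" });
      res.write(`data: ${JSON.stringify({ jsonrpc: "2.0", method: "notifications/progress", params: { progress: 1 } })}\n\n`);
      res.write(`data: ${JSON.stringify({ jsonrpc: "2.0", id: body.id, result: {} })}\n\n`);
      res.end();
    }
  });
});
```

Calling server.listen(3002) would serve it. The point is that both branches start from the same ordinary HTTP request; streaming is a per-response upgrade, not a connection mode.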
What Changed Between SSE and Streamable HTTP?
The shift from SSE to Streamable HTTP is not just a technical tweak. It is a fundamental change in how MCP sessions are managed, and the differences affect server architecture, client implementation, and deployment topology.
| Aspect | SSE Transport (deprecated) | Streamable HTTP Transport |
|---|---|---|
| Endpoints | Two: POST for messages, GET for SSE stream | One: /mcp handles both POST and GET |
| Connection model | Persistent SSE connection required for entire session | Stateless by default, optional SSE upgrade |
| Serverless compatible | No. Requires long-lived connections | Yes. Each request can complete independently |
| Session management | Implicit via SSE connection lifetime | Explicit via Mcp-Session-Id header |
| Resumability | None. Dropped connection = lost messages | Built-in via Last-Event-Id replay |
| Server-to-client push | Always on (SSE stream always open) | Opt-in (client opens GET stream only if needed) |
| Load balancer friendly | Poor. Many proxies buffer or timeout SSE | Excellent. Standard HTTP request-response |
| Bidirectional communication | Partial. Client POST + server SSE | Full. Both directions through single endpoint |
| Content negotiation | Server always streams | Server chooses JSON or SSE per response |
The most impactful change for production systems is session management. With SSE, the session was the connection. Drop the connection, lose the session. With Streamable HTTP, the session is identified by a header (Mcp-Session-Id), and the connection is just a transport mechanism that can be established, dropped, and re-established without losing session state.
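On the wire, that header exchange is simple enough to show with plain fetch. A sketch follows; the SDK handles all of this automatically, the buildMcpHeaders helper is our own illustrative name, and the initialize params are elided for brevity:

```typescript
// Build the headers for an MCP request, echoing the session ID once assigned
function buildMcpHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP clients must accept both response modes
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}

async function initializeSession(baseUrl: string): Promise<string | null> {
  const res = await fetch(`${baseUrl}/mcp`, {
    method: "POST",
    headers: buildMcpHeaders(),
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "initialize", params: {} }),
  });
  // The server assigns the session in a response header; every subsequent
  // request passes it back via buildMcpHeaders(sessionId)
  return res.headers.get("mcp-session-id");
}
```

Because the session lives in a header rather than a socket, the client can drop every connection between these calls and nothing is lost.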
How Do You Implement a Streamable HTTP Server in TypeScript?
A Streamable HTTP MCP server is an Express (or any HTTP framework) application that handles POST and GET requests on a single /mcp endpoint, routing JSON-RPC messages to the MCP SDK's transport layer.
Here is a complete, working implementation:
```typescript
import express from "express";
import { randomUUID } from "crypto";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { isInitializeRequest } from "@modelcontextprotocol/sdk/types.js";
import { z } from "zod";

// Placeholders for your own data layer -- not part of the MCP SDK
declare const db: { customers: { findByEmail(email: string): Promise<unknown> } };
declare const analytics: { generateReport(opts: object): Promise<unknown> };

const app = express();
app.use(express.json());

// Store active transports by session ID
const sessions = new Map<string, StreamableHTTPServerTransport>();

function createMcpServer(): McpServer {
  const server = new McpServer({
    name: "customer-tools",
    version: "1.0.0",
  });

  // Register a tool that returns data directly
  server.tool(
    "lookup_customer",
    "Find a customer by email address",
    { email: z.string().email() },
    async ({ email }) => {
      const customer = await db.customers.findByEmail(email);
      return {
        content: [{ type: "text" as const, text: JSON.stringify(customer, null, 2) }],
      };
    }
  );

  // Register a tool that benefits from streaming
  server.tool(
    "generate_report",
    "Generate a usage report for a date range",
    {
      startDate: z.string(),
      endDate: z.string(),
      format: z.enum(["summary", "detailed"]),
    },
    async ({ startDate, endDate, format }) => {
      const report = await analytics.generateReport({ startDate, endDate, format });
      return {
        content: [{ type: "text" as const, text: JSON.stringify(report, null, 2) }],
      };
    }
  );

  return server;
}

// Handle client messages (POST) and session initialization
app.post("/mcp", async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string | undefined;

  // Existing session: route the message to its transport. Because
  // express.json() has already consumed the request stream, the parsed
  // body must be passed as the third argument.
  if (sessionId && sessions.has(sessionId)) {
    const transport = sessions.get(sessionId)!;
    await transport.handleRequest(req, res, req.body);
    return;
  }

  // New session: only an initialize request may create one
  if (!sessionId && isInitializeRequest(req.body)) {
    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(),
      onsessioninitialized: (newSessionId) => {
        sessions.set(newSessionId, transport);
      },
    });

    // Clean up on close
    transport.onclose = () => {
      if (transport.sessionId) sessions.delete(transport.sessionId);
    };

    const server = createMcpServer();
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
    return;
  }

  // Session ID unknown (expired or invalid), or a non-initialize
  // request arrived without one
  res.status(400).json({
    jsonrpc: "2.0",
    error: {
      code: -32000,
      message: "Invalid or expired session",
    },
    id: null,
  });
});

// Handle server-to-client notifications (GET)
app.get("/mcp", async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string;
  const transport = sessions.get(sessionId);
  if (!transport) {
    res.status(400).json({
      jsonrpc: "2.0",
      error: { code: -32000, message: "No active session" },
      id: null,
    });
    return;
  }
  // Opens an SSE stream for server-initiated messages
  await transport.handleRequest(req, res);
});

// Handle session termination (DELETE)
app.delete("/mcp", async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string;
  const transport = sessions.get(sessionId);
  if (transport) {
    await transport.close();
    sessions.delete(sessionId);
  }
  res.status(204).end();
});

app.listen(3002, () => {
  console.log("MCP Streamable HTTP server running on port 3002");
});
```

The key architectural points in this implementation:
- Session-per-transport: Each session gets its own StreamableHTTPServerTransport instance, which manages the JSON-RPC message routing.
- Stateless entry point: The POST handler checks for an existing session or creates a new one. No persistent connection is required.
- Optional GET stream: The GET handler lets clients subscribe to server-initiated notifications, but the server works without it.
- Clean teardown: The DELETE handler lets clients explicitly end sessions, and the onclose callback cleans up the session map.
How Do You Build a Streamable HTTP Client?
The client side is simpler because the MCP SDK handles most of the transport negotiation. You create a StreamableHTTPClientTransport, connect it to your MCP client, and the SDK manages the request lifecycle.
```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function connectToMcpServer(serverUrl: string) {
  const transport = new StreamableHTTPClientTransport(new URL(serverUrl));

  const client = new Client({
    name: "my-agent",
    version: "1.0.0",
  });

  // Connect establishes the session (sends the initialize request)
  await client.connect(transport);

  // Discover available tools
  const { tools } = await client.listTools();
  console.log(
    "Available tools:",
    tools.map((t) => t.name)
  );

  // Invoke a tool
  const result = await client.callTool({
    name: "lookup_customer",
    arguments: { email: "customer@example.com" },
  });
  console.log("Result:", result.content);

  // Clean up
  await client.close();
}

connectToMcpServer("http://localhost:3002/mcp");
```

The client transport automatically:
- Sends the Mcp-Session-Id header on subsequent requests after initialization
- Handles content negotiation (JSON vs SSE responses)
- Reconnects with Last-Event-Id if the notification stream drops
Why Does Streamable HTTP Matter for Production Agent Deployments?
Streamable HTTP turns MCP from a protocol that works in demos into one that works in real infrastructure, and the difference comes down to three production requirements: serverless deployment, graceful scaling, and network resilience.
Serverless deployment
The original SSE transport could not run on serverless platforms at all. The connection model required the server process to stay alive for the entire session, which contradicts the fundamental serverless model of ephemeral function execution.
Streamable HTTP solves this completely. A serverless function handles one POST, returns the result, and terminates. If the tool invocation takes longer than the function timeout, the server can return a task handle (using the experimental Tasks primitive from the November 2025 spec update) and the client polls for the result.
This is how Chanl's MCP server operates in production on Vercel: each tool invocation is a single HTTP request that completes within the platform's execution budget.
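The polling half of that pattern can be sketched generically. Everything below is a hypothetical shape: Tasks is experimental, so the TaskStatus fields and whatever method backs getStatus are assumptions, not the finalized spec.

```typescript
// Hypothetical poll-for-result loop. The shape of TaskStatus and how
// getStatus is implemented (e.g. some task-status query) are assumptions,
// since the Tasks primitive is still experimental.
type TaskStatus<T> = { status: "pending" } | { status: "done"; result: T };

async function pollUntilDone<T>(
  getStatus: () => Promise<TaskStatus<T>>,
  intervalMs = 1000,
  maxAttempts = 60
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status.status === "done") return status.result;
    // Each poll is an independent, serverless-friendly HTTP request
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Task did not complete within the polling budget");
}
```

In practice, getStatus would POST a status query for the task handle returned by the original tools/call, so no single request ever exceeds the platform's execution budget.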
Graceful scaling
When a server instance is added or removed during auto-scaling, existing sessions need to survive. With SSE transport, losing the server instance meant losing every connected client session.
Streamable HTTP sessions are identified by a header, not a connection. If session state is stored externally (Redis, a database), any server instance can handle any session. Load balancers do not need sticky sessions. Blue-green deployments work without draining connections for minutes.
A sketch of that pattern follows. Note that the transport object itself is not serializable, so what you persist externally is session metadata (ID, tenant, negotiated capabilities); each instance recreates a transport locally when it receives a request for a session it has not seen.

```typescript
// External session store for horizontal scaling
import { createClient } from "redis";

const redis = createClient();
await redis.connect();

// What you persist: lightweight session metadata, not the transport
interface SessionRecord {
  tenantId: string;
  createdAt: number;
}

const sessionStore = {
  async get(id: string): Promise<SessionRecord | null> {
    const data = await redis.get(`mcp-session:${id}`);
    return data ? JSON.parse(data) : null;
  },
  async set(id: string, record: SessionRecord) {
    await redis.set(`mcp-session:${id}`, JSON.stringify(record), {
      EX: 3600, // 1 hour TTL
    });
  },
  async delete(id: string) {
    await redis.del(`mcp-session:${id}`);
  },
};
```

Network resilience
Mobile clients, IoT devices, and agents running over unreliable networks will lose connectivity. With SSE transport, every disconnection was a full session reset.
Streamable HTTP's Last-Event-Id mechanism means the client can reconnect and pick up where it left off. The server replays events the client missed, and the session continues without the agent losing context or tool results.
This resilience is especially important for AI agents handling multi-step tasks. If an agent is midway through a workflow that involves three tool calls and the network blips after the second, the session resumes at the third step rather than restarting from the beginning.
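Replay only works because the server keeps a bounded buffer of recent events per session. A minimal sketch, assuming numeric event IDs (the SDK exposes a hook for exactly this kind of store; the class name here is ours):

```typescript
// Minimal per-session replay buffer with monotonically increasing event IDs
class ReplayBuffer {
  private events: { id: number; data: string }[] = [];
  private nextId = 1;

  constructor(private maxEvents = 100) {}

  // Record an outgoing event and return the ID sent to the client
  add(data: string): number {
    const id = this.nextId++;
    this.events.push({ id, data });
    if (this.events.length > this.maxEvents) this.events.shift(); // bound memory
    return id;
  }

  // Events the client missed, given the Last-Event-Id it sends on reconnect
  replayAfter(lastEventId: number): { id: number; data: string }[] {
    return this.events.filter((e) => e.id > lastEventId);
  }
}
```

On reconnect, the server reads the client's Last-Event-Id header, calls replayAfter, and flushes the missed events down the new stream before resuming live delivery.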
How Do You Migrate an Existing SSE Server to Streamable HTTP?
Migration is straightforward because the MCP SDK handles most of the transport abstraction. The core changes are: replace the transport class, consolidate endpoints, and add session management.
Step 1: Replace the transport import
```typescript
// Before (SSE)
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

// After (Streamable HTTP)
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
```

Step 2: Consolidate endpoints
SSE used two endpoints (one for POST messages, one for the SSE stream). Streamable HTTP uses one:
```typescript
// Before: two separate endpoints
app.post("/messages", handleMessage);
app.get("/sse", handleSseStream);

// After: single endpoint, three methods
app.post("/mcp", handlePost);
app.get("/mcp", handleGet);       // optional, for server push
app.delete("/mcp", handleDelete); // optional, for clean shutdown
```

Step 3: Add session management
The StreamableHTTPServerTransport constructor accepts a sessionIdGenerator option; the SDK calls it when a client initializes and hands the resulting ID to the onsessioninitialized callback:

```typescript
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(),
  onsessioninitialized: (sessionId) => {
    // Store the transport under sessionId for subsequent requests
  },
});
```

Step 4: Update your client connections
If you control the client code, swap the transport class:
```typescript
// Before
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";
const transport = new SSEClientTransport(new URL("http://server/sse"));

// After
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const transport = new StreamableHTTPClientTransport(new URL("http://server/mcp"));
```

Your MCP server definitions (tools, resources, prompts) do not change at all. The transport is a separate layer.
How Does Streamable HTTP Handle Authentication?
The MCP specification pairs Streamable HTTP with OAuth 2.1 and PKCE for authentication. The transport itself does not define auth, but the June 2025 spec update formalized how authentication integrates with the HTTP layer.
The authentication flow works like this:
- Client sends an initialize POST to /mcp
- Server returns 401 Unauthorized with a WWW-Authenticate header pointing to the OAuth authorization server
- Client performs the OAuth 2.1 flow (PKCE with the S256 challenge method)
- Client retries with the access token in the Authorization: Bearer header
- Server validates the token and proceeds with session initialization
For tool-based agent systems, this means each MCP server connection can be authenticated independently. An agent connecting to three different MCP servers (CRM, database, calendar) authenticates with each server's own identity provider. There is no shared credential or single point of compromise.
```typescript
// Server-side auth middleware for the MCP endpoint.
// verifyOAuthToken is a placeholder for your own token validation
// (e.g. JWT verification against your identity provider's JWKS).
declare function verifyOAuthToken(token: string): Promise<{ tenantId: string }>;

app.use("/mcp", async (req, res, next) => {
  // Skip auth for initialization probes and let the transport
  // handle the initialize handshake
  if (req.method === "POST" && req.body?.method === "initialize") {
    return next();
  }

  const token = req.headers.authorization?.replace("Bearer ", "");
  if (!token) {
    res.status(401).json({
      jsonrpc: "2.0",
      error: { code: -32001, message: "Authentication required" },
      id: null,
    });
    return;
  }

  try {
    const claims = await verifyOAuthToken(token);
    // Attach the tenant to the request for downstream handlers
    (req as express.Request & { tenantId?: string }).tenantId = claims.tenantId;
    next();
  } catch {
    res.status(401).json({
      jsonrpc: "2.0",
      error: { code: -32001, message: "Invalid token" },
      id: null,
    });
  }
});
```

The November 2025 spec update added Resource Indicators (RFC 8707), which bind tokens to specific MCP servers. This prevents a token issued for Server A from being replayed against Server B, a real attack vector in multi-server agent deployments.
When Should You Use Streamable HTTP vs. Stdio?
Stdio is for local development and single-user desktop integrations. Streamable HTTP is for everything else. The decision is almost always determined by whether the MCP server runs as a separate process that multiple clients can reach.
| Use case | Transport | Why |
|---|---|---|
| Local development and testing | stdio | Client spawns server as child process, zero network config |
| Claude Desktop / VS Code integration | stdio | Desktop app manages the server process lifecycle |
| Production agent backend | Streamable HTTP | Server runs independently, serves multiple agents |
| Serverless deployment (Vercel, Lambda) | Streamable HTTP | Only option that works with execution timeouts |
| Multi-tenant SaaS platform | Streamable HTTP | Session isolation, OAuth per tenant, horizontal scaling |
| IoT or mobile clients | Streamable HTTP | Network resilience with session resumption |
| CI/CD testing pipelines | stdio | Simple, no network dependencies, deterministic |
For teams building AI agents that use tools in production, Streamable HTTP is the default choice. The only scenario where stdio makes sense in production is when the client and server are always co-located on the same machine and there is exactly one client per server instance.
What Performance Characteristics Should You Expect?
Streamable HTTP adds minimal overhead compared to direct HTTP calls because it is, at its core, standard HTTP with a JSON-RPC payload format. The protocol-level overhead is the JSON-RPC envelope (roughly 50-100 bytes per message) and the session header.
Benchmarks from real deployments show these characteristics:
Latency: A Streamable HTTP tool invocation adds 2-5ms of protocol overhead on top of the actual tool execution time. For a tool that takes 200ms to run, total round-trip is 202-205ms. This is comparable to any REST API call.
Throughput: Because each request is independent, throughput scales linearly with server instances. There is no per-connection state on the server (unless you opt into sessions), so a single server process can handle thousands of concurrent tool invocations.
Connection efficiency: Unlike SSE, which held one connection per client for the entire session, Streamable HTTP connections are short-lived. A server handling 1,000 concurrent agent sessions with SSE needed 1,000 open connections. With Streamable HTTP, it needs connections only for active requests, typically 50-100 at any given moment.
Memory usage: Session state (when used) is a lightweight data structure containing the session ID, capabilities negotiated during initialization, and a message replay buffer. Typical memory per session is 2-10 KB, compared to the 50-200 KB per open SSE connection.
What Are Common Mistakes When Implementing Streamable HTTP?
Most implementation issues come from applying SSE-era assumptions to the new transport model. Here are the patterns that cause the most production incidents.
Assuming all responses are streamed. The server can return application/json for synchronous results or text/event-stream for streaming results. Your client must handle both content types. The MCP SDK client does this automatically, but custom clients often check only for SSE.
Skipping session cleanup. Without the DELETE handler, abandoned sessions accumulate in memory. Set a TTL on your session store and implement the DELETE endpoint so clients can clean up explicitly.
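A TTL sweep for the in-memory session map can be sketched in a few lines. The tracked map, touch, and sweepExpired are illustrative names layered on top of whatever transport type you store; only close() is assumed of it:

```typescript
// Evict sessions idle longer than a TTL. touch() must be called on
// every request that carries the session's Mcp-Session-Id.
interface TrackedSession {
  transport: { close(): Promise<void> };
  lastSeen: number;
}

const tracked = new Map<string, TrackedSession>();

function touch(id: string): void {
  const session = tracked.get(id);
  if (session) session.lastSeen = Date.now();
}

async function sweepExpired(ttlMs: number, now = Date.now()): Promise<void> {
  for (const [id, session] of tracked) {
    if (now - session.lastSeen > ttlMs) {
      await session.transport.close(); // free the transport's resources
      tracked.delete(id);
    }
  }
}

// Run alongside the DELETE handler, e.g. once a minute:
// setInterval(() => sweepExpired(30 * 60 * 1000), 60_000);
```

The DELETE endpoint handles well-behaved clients; the sweep handles the ones that disappear without saying goodbye.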
Ignoring the Mcp-Session-Id on subsequent requests. After initialization, every request must include the session header. If a client sends a tools/call without the session ID, the server cannot route it to the correct transport instance. The SDK handles this automatically, but middleware or proxy layers sometimes strip custom headers.
Buffering SSE in reverse proxies. If your server upgrades a response to SSE, make sure your reverse proxy (Nginx, Cloudflare, etc.) passes the text/event-stream content type without buffering. This is the same issue as with the old SSE transport, but it only affects the subset of responses that actually use streaming.
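If you hand-roll an SSE response rather than letting the SDK write it, a few response headers reduce the chance of intermediary buffering. A sketch (X-Accel-Buffering is an Nginx-specific hint and is harmless elsewhere; you may still need proxy_buffering off in the Nginx config itself):

```typescript
// Headers for a hand-rolled SSE response that discourage proxy buffering
function sseHeaders(): Record<string, string> {
  return {
    "Content-Type": "text/event-stream",
    // no-transform also discourages proxies from rewriting the body
    "Cache-Control": "no-cache, no-transform",
    Connection: "keep-alive",
    // Tells Nginx not to buffer this particular response
    "X-Accel-Buffering": "no",
  };
}
```

Because only the streaming subset of responses needs this, the blast radius of a misconfigured proxy is far smaller than it was with the always-on SSE transport.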
Not implementing Last-Event-Id replay. Session resumption only works if your server stores recent events and can replay them on reconnect. If you skip this, clients that reconnect after a network blip will miss events. For short-lived tool calls this might be acceptable, but for long-running operations it causes silent data loss.
Where Does Streamable HTTP Fit in the MCP Roadmap?
Streamable HTTP is the foundation for several upcoming MCP features, including Tasks (asynchronous long-running operations), enhanced sampling, and multi-server agent orchestration.
The experimental Tasks primitive, introduced in the November 2025 spec update, depends directly on Streamable HTTP. Tasks let a tool invocation return a handle instead of a result, with the client polling for completion. This pattern only works cleanly with HTTP request-response semantics. With SSE, there was no natural way to return a "pending" result and follow up later.
Multi-agent architectures also benefit from Streamable HTTP. When multiple agents coordinate through a shared set of MCP servers, each agent maintains its own session independently. There is no shared connection state, so agents can be added, removed, or restarted without affecting other agents in the system.
The MCP specification moved to the Linux Foundation in December 2025, and the transport layer is now governed by a working group with representatives from Anthropic, OpenAI, Google, and Microsoft. Streamable HTTP was the first specification produced by this working group, which signals that it represents consensus across the major AI platform providers.
For teams evaluating MCP for production use, the transport question is settled. Streamable HTTP is the standard for remote MCP connections. Building on it today means your infrastructure is aligned with where the entire ecosystem is heading.
If you are building an MCP server for the first time, the MCP from scratch tutorial walks through the full setup with both TypeScript and Python. For teams that already have a working MCP server and need to scale it, the MCP deep dive covers OAuth 2.1, gateways, sampling, and multi-tenant patterns.