MCP's original SSE transport worked fine on your laptop. Then you tried deploying it to Vercel, and everything fell apart.
Long-lived connections timed out. Serverless functions couldn't hold state. Load balancers dropped SSE streams mid-conversation. The protocol designed to standardize AI agent tool use had a transport layer that fought against modern infrastructure.
The MCP specification team recognized this and shipped a replacement: Streamable HTTP transport. It is the most significant production-facing change to the MCP spec since the protocol launched, and it solves problems that blocked real-world deployments for months.
This guide covers exactly what Streamable HTTP is, why it replaced SSE, and how to implement it in TypeScript with working code.
What Problems Did SSE Transport Have?
SSE transport required persistent, long-lived HTTP connections between the MCP client and server, and that single requirement created a cascade of production failures.
The original MCP transport for remote connections used a two-endpoint design. The client sent messages via POST to one endpoint, and the server streamed responses back through a separate SSE endpoint that stayed open for the entire session. This design assumed the server was a long-running process with stable network connectivity. That assumption breaks in three common deployment scenarios.
Serverless platforms like Vercel, AWS Lambda, and Cloudflare Workers enforce execution timeouts (typically 10-30 seconds for edge functions, up to 5 minutes for standard functions). An SSE connection that needs to stay open for a full agent conversation simply cannot survive on these platforms.
Auto-scaling infrastructure like Kubernetes or Fly.io regularly terminates and replaces instances. When a server instance is removed during scale-down, every SSE connection to that instance drops with no recovery mechanism. The client has to reconnect from scratch, losing any in-flight tool responses.
Standard HTTP infrastructure like load balancers, CDNs, and reverse proxies were not built for indefinite SSE connections. Many load balancers enforce idle timeouts, proxy servers buffer SSE events (breaking real-time delivery), and connection draining during deploys kills active streams.
The MCP spec team summarized the problems concisely in the transport deprecation notice: no resumable streams, mandatory long-lived connections, and unidirectional message flow. Streamable HTTP fixes all three.
How Does Streamable HTTP Transport Work?
Streamable HTTP consolidates all communication into a single /mcp endpoint that supports both stateless request-response and optional SSE streaming, letting the server choose the right mode per request.
Instead of two separate endpoints (one for client-to-server POST, one for server-to-client SSE), Streamable HTTP uses one endpoint that handles both directions:
- Client-to-server messages: always a POST to /mcp with a JSON-RPC body
- Server responses: either a direct JSON response (for simple results) or an SSE stream (for streaming results or long-running operations)
- Server-initiated messages: delivered through an optional SSE stream the client opens via GET on /mcp
This design means every interaction starts as a normal HTTP request. The server can complete it synchronously (return JSON, close connection) or upgrade it to a stream (switch to SSE within the same response). The client does not need to maintain a persistent connection unless it explicitly wants server-initiated notifications.
Here is the message flow for a typical tool invocation:
```
Client                                            Server
  |                                                 |
  |  POST /mcp                                      |
  |  Content-Type: application/json                 |
  |  { "method": "tools/call",                      |
  |    "params": { "name": "lookup_customer",       |
  |                "arguments": {...} } }           |
  | ----------------------------------------------> |
  |                                                 |
  |  HTTP 200                                       |
  |  Content-Type: text/event-stream                |  (server chose to stream)
  |  data: { "jsonrpc": "2.0", "id": 1,             |
  |          "result": {...} }                      |
  | <---------------------------------------------- |
  |                                                 |
```

For a quick tool like a database lookup, the server might return Content-Type: application/json with the result directly. For a tool that takes 30 seconds (generating a report, running a complex query), it returns Content-Type: text/event-stream and sends progress events followed by the final result.
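Because this per-request choice is plain HTTP, it can be sketched without the SDK at all. Below is a minimal framework-free sketch; the slowTools set, the decision rule, and the placeholder result payloads are illustrative assumptions, not part of the MCP SDK, which makes this decision internally:

```typescript
import { createServer } from "node:http";

// Hypothetical rule for this sketch: only "generate_report" streams
const slowTools = new Set(["generate_report"]);

// Pure decision: which content type does this request get?
function chooseResponseMode(body: { method?: string; params?: { name?: string } }): string {
  return body.method === "tools/call" && slowTools.has(body.params?.name ?? "")
    ? "text/event-stream"
    : "application/json";
}

const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/mcp") {
    res.writeHead(404).end();
    return;
  }
  let raw = "";
  req.on("data", (chunk) => (raw += chunk));
  req.on("end", () => {
    const body = JSON.parse(raw);
    if (chooseResponseMode(body) === "application/json") {
      // Quick tool: complete the request synchronously and close
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ jsonrpc: "2.0", id: body.id, result: {} }));
    } else {
      // Slow tool: upgrade the same response to an SSE stream
      res.writeHead(200, { "Content-Type": "text/event-stream" });
      res.write(`data: ${JSON.stringify({ jsonrpc: "2.0", method: "notifications/progress", params: { progress: 1 } })}\n\n`);
      res.write(`data: ${JSON.stringify({ jsonrpc: "2.0", id: body.id, result: {} })}\n\n`);
      res.end();
    }
  });
});
```

Calling server.listen(3002) would serve it. The point is that both branches start from the same ordinary HTTP request; streaming is a per-response upgrade, not a connection mode.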
What Changed Between SSE and Streamable HTTP?
The shift from SSE to Streamable HTTP is not just a technical tweak. It is a fundamental change in how MCP sessions are managed, and the differences affect server architecture, client implementation, and deployment topology.
| Aspect | SSE Transport (deprecated) | Streamable HTTP Transport |
|---|---|---|
| Endpoints | Two: POST for messages, GET for SSE stream | One: /mcp handles both POST and GET |
| Connection model | Persistent SSE connection required for entire session | Stateless by default, optional SSE upgrade |
| Serverless compatible | No. Requires long-lived connections | Yes. Each request can complete independently |
| Session management | Implicit via SSE connection lifetime | Explicit via Mcp-Session-Id header |
| Resumability | None. Dropped connection = lost messages | Built-in via Last-Event-Id replay |
| Server-to-client push | Always on (SSE stream always open) | Opt-in (client opens GET stream only if needed) |
| Load balancer friendly | Poor. Many proxies buffer or timeout SSE | Excellent. Standard HTTP request-response |
| Bidirectional communication | Partial. Client POST + server SSE | Full. Both directions through single endpoint |
| Content negotiation | Server always streams | Server chooses JSON or SSE per response |
The most impactful change for production systems is session management. With SSE, the session was the connection. Drop the connection, lose the session. With Streamable HTTP, the session is identified by a header (Mcp-Session-Id), and the connection is just a transport mechanism that can be established, dropped, and re-established without losing session state.
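On the wire, that header exchange is simple enough to show with plain fetch. A sketch follows; the SDK handles all of this automatically, the buildMcpHeaders helper is our own illustrative name, and the initialize params are elided for brevity:

```typescript
// Build the headers for an MCP request, echoing the session ID once assigned
function buildMcpHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP clients must accept both response modes
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}

async function initializeSession(baseUrl: string): Promise<string | null> {
  const res = await fetch(`${baseUrl}/mcp`, {
    method: "POST",
    headers: buildMcpHeaders(),
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "initialize", params: {} }),
  });
  // The server assigns the session in a response header; every subsequent
  // request passes it back via buildMcpHeaders(sessionId)
  return res.headers.get("mcp-session-id");
}
```

Because the session lives in a header rather than a socket, the client can drop every connection between these calls and nothing is lost.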
How Do You Implement a Streamable HTTP Server in TypeScript?
A Streamable HTTP MCP server is an Express (or any HTTP framework) application that handles POST and GET requests on a single /mcp endpoint, routing JSON-RPC messages to the MCP SDK's transport layer.
Here is a complete, working implementation:
```typescript
import express from "express";
import { randomUUID } from "crypto";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { isInitializeRequest } from "@modelcontextprotocol/sdk/types.js";
import { z } from "zod";

// Placeholders for your own data layer -- not part of the MCP SDK
declare const db: { customers: { findByEmail(email: string): Promise<unknown> } };
declare const analytics: { generateReport(opts: object): Promise<unknown> };

const app = express();
app.use(express.json());

// Store active transports by session ID
const sessions = new Map<string, StreamableHTTPServerTransport>();

function createMcpServer(): McpServer {
  const server = new McpServer({
    name: "customer-tools",
    version: "1.0.0",
  });

  // Register a tool that returns data directly
  server.tool(
    "lookup_customer",
    "Find a customer by email address",
    { email: z.string().email() },
    async ({ email }) => {
      const customer = await db.customers.findByEmail(email);
      return {
        content: [{ type: "text" as const, text: JSON.stringify(customer, null, 2) }],
      };
    }
  );

  // Register a tool that benefits from streaming
  server.tool(
    "generate_report",
    "Generate a usage report for a date range",
    {
      startDate: z.string(),
      endDate: z.string(),
      format: z.enum(["summary", "detailed"]),
    },
    async ({ startDate, endDate, format }) => {
      const report = await analytics.generateReport({ startDate, endDate, format });
      return {
        content: [{ type: "text" as const, text: JSON.stringify(report, null, 2) }],
      };
    }
  );

  return server;
}

// Handle client messages (POST) and session initialization
app.post("/mcp", async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string | undefined;

  // Existing session: route the message to its transport. Because
  // express.json() has already consumed the request stream, the parsed
  // body must be passed as the third argument.
  if (sessionId && sessions.has(sessionId)) {
    const transport = sessions.get(sessionId)!;
    await transport.handleRequest(req, res, req.body);
    return;
  }

  // New session: only an initialize request may create one
  if (!sessionId && isInitializeRequest(req.body)) {
    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(),
      onsessioninitialized: (newSessionId) => {
        sessions.set(newSessionId, transport);
      },
    });

    // Clean up on close
    transport.onclose = () => {
      if (transport.sessionId) sessions.delete(transport.sessionId);
    };

    const server = createMcpServer();
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
    return;
  }

  // Session ID unknown (expired or invalid), or a non-initialize
  // request arrived without one
  res.status(400).json({
    jsonrpc: "2.0",
    error: {
      code: -32000,
      message: "Invalid or expired session",
    },
    id: null,
  });
});

// Handle server-to-client notifications (GET)
app.get("/mcp", async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string;
  const transport = sessions.get(sessionId);
  if (!transport) {
    res.status(400).json({
      jsonrpc: "2.0",
      error: { code: -32000, message: "No active session" },
      id: null,
    });
    return;
  }
  // Opens an SSE stream for server-initiated messages
  await transport.handleRequest(req, res);
});

// Handle session termination (DELETE)
app.delete("/mcp", async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string;
  const transport = sessions.get(sessionId);
  if (transport) {
    await transport.close();
    sessions.delete(sessionId);
  }
  res.status(204).end();
});

app.listen(3002, () => {
  console.log("MCP Streamable HTTP server running on port 3002");
});
```

The key architectural points in this implementation:
- Session-per-transport: Each session gets its own StreamableHTTPServerTransport instance, which manages the JSON-RPC message routing.
- Stateless entry point: The POST handler checks for an existing session or creates a new one. No persistent connection is required.
- Optional GET stream: The GET handler lets clients subscribe to server-initiated notifications, but the server works without it.
- Clean teardown: The DELETE handler lets clients explicitly end sessions, and the onclose callback cleans up the session map.
How Do You Build a Streamable HTTP Client?
The client side is simpler because the MCP SDK handles most of the transport negotiation. You create a StreamableHTTPClientTransport, connect it to your MCP client, and the SDK manages the request lifecycle.
```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function connectToMcpServer(serverUrl: string) {
  const transport = new StreamableHTTPClientTransport(new URL(serverUrl));

  const client = new Client({
    name: "my-agent",
    version: "1.0.0",
  });

  // Connect establishes the session (sends the initialize request)
  await client.connect(transport);

  // Discover available tools
  const { tools } = await client.listTools();
  console.log(
    "Available tools:",
    tools.map((t) => t.name)
  );

  // Invoke a tool
  const result = await client.callTool({
    name: "lookup_customer",
    arguments: { email: "customer@example.com" },
  });
  console.log("Result:", result.content);

  // Clean up
  await client.close();
}

connectToMcpServer("http://localhost:3002/mcp");
```

The client transport automatically:
- Sends the Mcp-Session-Id header on subsequent requests after initialization
- Handles content negotiation (JSON vs SSE responses)
- Reconnects with Last-Event-Id if the notification stream drops
Why Does Streamable HTTP Matter for Production Agent Deployments?
Streamable HTTP turns MCP from a protocol that works in demos into one that works in real infrastructure, and the difference comes down to three production requirements: serverless deployment, graceful scaling, and network resilience.
Serverless deployment
The original SSE transport could not run on serverless platforms at all. The connection model required the server process to stay alive for the entire session, which contradicts the fundamental serverless model of ephemeral function execution.
Streamable HTTP solves this completely. A serverless function handles one POST, returns the result, and terminates. If the tool invocation takes longer than the function timeout, the server can return a task handle (using the experimental Tasks primitive from the November 2025 spec update) and the client polls for the result.
This is how Chanl's MCP server operates in production on Vercel: each tool invocation is a single HTTP request that completes within the platform's execution budget.
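The polling half of that pattern can be sketched generically. Everything below is a hypothetical shape: Tasks is experimental, so the TaskStatus fields and whatever method backs getStatus are assumptions, not the finalized spec.

```typescript
// Hypothetical poll-for-result loop. The shape of TaskStatus and how
// getStatus is implemented (e.g. some task-status query) are assumptions,
// since the Tasks primitive is still experimental.
type TaskStatus<T> = { status: "pending" } | { status: "done"; result: T };

async function pollUntilDone<T>(
  getStatus: () => Promise<TaskStatus<T>>,
  intervalMs = 1000,
  maxAttempts = 60
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status.status === "done") return status.result;
    // Each poll is an independent, serverless-friendly HTTP request
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Task did not complete within the polling budget");
}
```

In practice, getStatus would POST a status query for the task handle returned by the original tools/call, so no single request ever exceeds the platform's execution budget.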
Graceful scaling
When a server instance is added or removed during auto-scaling, existing sessions need to survive. With SSE transport, losing the server instance meant losing every connected client session.
Streamable HTTP sessions are identified by a header, not a connection. If session state is stored externally (Redis, a database), any server instance can handle any session. Load balancers do not need sticky sessions. Blue-green deployments work without draining connections for minutes.
A sketch of that pattern follows. Note that the transport object itself is not serializable, so what you persist externally is session metadata (ID, tenant, negotiated capabilities); each instance recreates a transport locally when it receives a request for a session it has not seen.

```typescript
// External session store for horizontal scaling
import { createClient } from "redis";

const redis = createClient();
await redis.connect();

// What you persist: lightweight session metadata, not the transport
interface SessionRecord {
  tenantId: string;
  createdAt: number;
}

const sessionStore = {
  async get(id: string): Promise<SessionRecord | null> {
    const data = await redis.get(`mcp-session:${id}`);
    return data ? JSON.parse(data) : null;
  },
  async set(id: string, record: SessionRecord) {
    await redis.set(`mcp-session:${id}`, JSON.stringify(record), {
      EX: 3600, // 1 hour TTL
    });
  },
  async delete(id: string) {
    await redis.del(`mcp-session:${id}`);
  },
};
```

Network resilience
Mobile clients, IoT devices, and agents running over unreliable networks will lose connectivity. With SSE transport, every disconnection was a full session reset.
Streamable HTTP's Last-Event-Id mechanism means the client can reconnect and pick up where it left off. The server replays events the client missed, and the session continues without the agent losing context or tool results.
This resilience is especially important for AI agents handling multi-step tasks. If an agent is midway through a workflow that involves three tool calls and the network blips after the second, the session resumes at the third step rather than restarting from the beginning.
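Replay only works because the server keeps a bounded buffer of recent events per session. A minimal sketch, assuming numeric event IDs (the SDK exposes a hook for exactly this kind of store; the class name here is ours):

```typescript
// Minimal per-session replay buffer with monotonically increasing event IDs
class ReplayBuffer {
  private events: { id: number; data: string }[] = [];
  private nextId = 1;

  constructor(private maxEvents = 100) {}

  // Record an outgoing event and return the ID sent to the client
  add(data: string): number {
    const id = this.nextId++;
    this.events.push({ id, data });
    if (this.events.length > this.maxEvents) this.events.shift(); // bound memory
    return id;
  }

  // Events the client missed, given the Last-Event-Id it sends on reconnect
  replayAfter(lastEventId: number): { id: number; data: string }[] {
    return this.events.filter((e) => e.id > lastEventId);
  }
}
```

On reconnect, the server reads the client's Last-Event-Id header, calls replayAfter, and flushes the missed events down the new stream before resuming live delivery.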
How Do You Migrate an Existing SSE Server to Streamable HTTP?
Migration is straightforward because the MCP SDK handles most of the transport abstraction. The core changes are: replace the transport class, consolidate endpoints, and add session management.
Step 1: Replace the transport import
```typescript
// Before (SSE)
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

// After (Streamable HTTP)
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
```

Step 2: Consolidate endpoints
SSE used two endpoints (one for POST messages, one for the SSE stream). Streamable HTTP uses one:
```typescript
// Before: two separate endpoints
app.post("/messages", handleMessage);
app.get("/sse", handleSseStream);

// After: single endpoint, three methods
app.post("/mcp", handlePost);
app.get("/mcp", handleGet);       // optional, for server push
app.delete("/mcp", handleDelete); // optional, for clean shutdown
```

Step 3: Add session management
The StreamableHTTPServerTransport constructor accepts a sessionIdGenerator option; the SDK calls it when a client initializes and hands the resulting ID to the onsessioninitialized callback:

```typescript
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(),
  onsessioninitialized: (sessionId) => {
    // Store the transport under sessionId for subsequent requests
  },
});
```

Step 4: Update your client connections
If you control the client code, swap the transport class:
```typescript
// Before
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";
const transport = new SSEClientTransport(new URL("http://server/sse"));

// After
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const transport = new StreamableHTTPClientTransport(new URL("http://server/mcp"));
```

Your MCP server definitions (tools, resources, prompts) do not change at all. The transport is a separate layer.
How Does Streamable HTTP Handle Authentication?
The MCP specification pairs Streamable HTTP with OAuth 2.1 and PKCE for authentication. The transport itself does not define auth, but the June 2025 spec update formalized how authentication integrates with the HTTP layer.
The authentication flow works like this:
- Client sends an initialize POST to /mcp
- Server returns 401 Unauthorized with a WWW-Authenticate header pointing to the OAuth authorization server
- Client performs the OAuth 2.1 flow (PKCE with the S256 challenge method)
- Client retries with the access token in the Authorization: Bearer header
- Server validates the token and proceeds with session initialization
For tool-based agent systems, this means each MCP server connection can be authenticated independently. An agent connecting to three different MCP servers (CRM, database, calendar) authenticates with each server's own identity provider. There is no shared credential or single point of compromise.
```typescript
// Server-side auth middleware for the MCP endpoint.
// verifyOAuthToken is a placeholder for your own token validation
// (e.g. JWT verification against your identity provider's JWKS).
declare function verifyOAuthToken(token: string): Promise<{ tenantId: string }>;

app.use("/mcp", async (req, res, next) => {
  // Skip auth for initialization probes and let the transport
  // handle the initialize handshake
  if (req.method === "POST" && req.body?.method === "initialize") {
    return next();
  }

  const token = req.headers.authorization?.replace("Bearer ", "");
  if (!token) {
    res.status(401).json({
      jsonrpc: "2.0",
      error: { code: -32001, message: "Authentication required" },
      id: null,
    });
    return;
  }

  try {
    const claims = await verifyOAuthToken(token);
    // Attach the tenant to the request for downstream handlers
    (req as express.Request & { tenantId?: string }).tenantId = claims.tenantId;
    next();
  } catch {
    res.status(401).json({
      jsonrpc: "2.0",
      error: { code: -32001, message: "Invalid token" },
      id: null,
    });
  }
});
```

The November 2025 spec update added Resource Indicators (RFC 8707), which bind tokens to specific MCP servers. This prevents a token issued for Server A from being replayed against Server B, a real attack vector in multi-server agent deployments.
When Should You Use Streamable HTTP vs. Stdio?
Stdio is for local development and single-user desktop integrations. Streamable HTTP is for everything else. The decision is almost always determined by whether the MCP server runs as a separate process that multiple clients can reach.
| Use case | Transport | Why |
|---|---|---|
| Local development and testing | stdio | Client spawns server as child process, zero network config |
| Claude Desktop / VS Code integration | stdio | Desktop app manages the server process lifecycle |
| Production agent backend | Streamable HTTP | Server runs independently, serves multiple agents |
| Serverless deployment (Vercel, Lambda) | Streamable HTTP | Only option that works with execution timeouts |
| Multi-tenant SaaS platform | Streamable HTTP | Session isolation, OAuth per tenant, horizontal scaling |
| IoT or mobile clients | Streamable HTTP | Network resilience with session resumption |
| CI/CD testing pipelines | stdio | Simple, no network dependencies, deterministic |
For teams building AI agents that use tools in production, Streamable HTTP is the default choice. The only scenario where stdio makes sense in production is when the client and server are always co-located on the same machine and there is exactly one client per server instance.
What Performance Characteristics Should You Expect?
Streamable HTTP adds minimal overhead compared to direct HTTP calls because it is, at its core, standard HTTP with a JSON-RPC payload format. The protocol-level overhead is the JSON-RPC envelope (roughly 50-100 bytes per message) and the session header.
Benchmarks from real deployments show these characteristics:
Latency: A Streamable HTTP tool invocation adds 2-5ms of protocol overhead on top of the actual tool execution time. For a tool that takes 200ms to run, total round-trip is 202-205ms. This is comparable to any REST API call.
Throughput: Because each request is independent, throughput scales linearly with server instances. There is no per-connection state on the server (unless you opt into sessions), so a single server process can handle thousands of concurrent tool invocations.
Connection efficiency: Unlike SSE, which held one connection per client for the entire session, Streamable HTTP connections are short-lived. A server handling 1,000 concurrent agent sessions with SSE needed 1,000 open connections. With Streamable HTTP, it needs connections only for active requests, typically 50-100 at any given moment.
Memory usage: Session state (when used) is a lightweight data structure containing the session ID, capabilities negotiated during initialization, and a message replay buffer. Typical memory per session is 2-10 KB, compared to the 50-200 KB per open SSE connection.
What Are Common Mistakes When Implementing Streamable HTTP?
Most implementation issues come from applying SSE-era assumptions to the new transport model. Here are the patterns that cause the most production incidents.
Assuming all responses are streamed. The server can return application/json for synchronous results or text/event-stream for streaming results. Your client must handle both content types. The MCP SDK client does this automatically, but custom clients often check only for SSE.
Skipping session cleanup. Without the DELETE handler, abandoned sessions accumulate in memory. Set a TTL on your session store and implement the DELETE endpoint so clients can clean up explicitly.
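A TTL sweep for the in-memory session map can be sketched in a few lines. The tracked map, touch, and sweepExpired are illustrative names layered on top of whatever transport type you store; only close() is assumed of it:

```typescript
// Evict sessions idle longer than a TTL. touch() must be called on
// every request that carries the session's Mcp-Session-Id.
interface TrackedSession {
  transport: { close(): Promise<void> };
  lastSeen: number;
}

const tracked = new Map<string, TrackedSession>();

function touch(id: string): void {
  const session = tracked.get(id);
  if (session) session.lastSeen = Date.now();
}

async function sweepExpired(ttlMs: number, now = Date.now()): Promise<void> {
  for (const [id, session] of tracked) {
    if (now - session.lastSeen > ttlMs) {
      await session.transport.close(); // free the transport's resources
      tracked.delete(id);
    }
  }
}

// Run alongside the DELETE handler, e.g. once a minute:
// setInterval(() => sweepExpired(30 * 60 * 1000), 60_000);
```

The DELETE endpoint handles well-behaved clients; the sweep handles the ones that disappear without saying goodbye.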
Ignoring the Mcp-Session-Id on subsequent requests. After initialization, every request must include the session header. If a client sends a tools/call without the session ID, the server cannot route it to the correct transport instance. The SDK handles this automatically, but middleware or proxy layers sometimes strip custom headers.
Buffering SSE in reverse proxies. If your server upgrades a response to SSE, make sure your reverse proxy (Nginx, Cloudflare, etc.) passes the text/event-stream content type without buffering. This is the same issue as with the old SSE transport, but it only affects the subset of responses that actually use streaming.
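If you hand-roll an SSE response rather than letting the SDK write it, a few response headers reduce the chance of intermediary buffering. A sketch (X-Accel-Buffering is an Nginx-specific hint and is harmless elsewhere; you may still need proxy_buffering off in the Nginx config itself):

```typescript
// Headers for a hand-rolled SSE response that discourage proxy buffering
function sseHeaders(): Record<string, string> {
  return {
    "Content-Type": "text/event-stream",
    // no-transform also discourages proxies from rewriting the body
    "Cache-Control": "no-cache, no-transform",
    Connection: "keep-alive",
    // Tells Nginx not to buffer this particular response
    "X-Accel-Buffering": "no",
  };
}
```

Because only the streaming subset of responses needs this, the blast radius of a misconfigured proxy is far smaller than it was with the always-on SSE transport.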
Not implementing Last-Event-Id replay. Session resumption only works if your server stores recent events and can replay them on reconnect. If you skip this, clients that reconnect after a network blip will miss events. For short-lived tool calls this might be acceptable, but for long-running operations it causes silent data loss.
Where Does Streamable HTTP Fit in the MCP Roadmap?
Streamable HTTP is the foundation for several upcoming MCP features, including Tasks (asynchronous long-running operations), enhanced sampling, and multi-server agent orchestration.
The experimental Tasks primitive, introduced in the November 2025 spec update, depends directly on Streamable HTTP. Tasks let a tool invocation return a handle instead of a result, with the client polling for completion. This pattern only works cleanly with HTTP request-response semantics. With SSE, there was no natural way to return a "pending" result and follow up later.
Multi-agent architectures also benefit from Streamable HTTP. When multiple agents coordinate through a shared set of MCP servers, each agent maintains its own session independently. There is no shared connection state, so agents can be added, removed, or restarted without affecting other agents in the system.
The MCP specification moved to the Linux Foundation in December 2025, and the transport layer is now governed by a working group with representatives from Anthropic, OpenAI, Google, and Microsoft. Streamable HTTP was the first specification produced by this working group, which signals that it represents consensus across the major AI platform providers.
For teams evaluating MCP for production use, the transport question is settled. Streamable HTTP is the standard for remote MCP connections. Building on it today means your infrastructure is aligned with where the entire ecosystem is heading.
If you are building an MCP server for the first time, the MCP from scratch tutorial walks through the full setup with both TypeScript and Python. For teams that already have a working MCP server and need to scale it, the MCP deep dive covers OAuth 2.1, gateways, sampling, and multi-tenant patterns.