What is WebMCP and how does it relate to MCP?

WebMCP is a W3C Draft Community Group Report that brings MCP-inspired tool registration into the browser via the navigator.modelContext API. Unlike server-side MCP (which uses JSON-RPC between backend services), WebMCP runs entirely client-side. The browser translates registered tools into MCP format when communicating with agents, but WebMCP itself uses a simpler JavaScript API rather than the full MCP protocol.

Which browsers support WebMCP today?

As of March 2026, only Chrome 146 Canary supports WebMCP behind the 'WebMCP for testing' flag at chrome://flags. Microsoft Edge support is expected but not formally announced. Firefox and Safari have not indicated plans. Broader rollout is expected by mid-to-late 2026.

Does WebMCP replace server-side MCP?

No. They are complementary. Server-side MCP handles backend integrations, service-to-service automation, and headless agent workflows. WebMCP handles browser-based interactions where a user is present and the agent operates within the user's authenticated session. Most production systems will use both.

How does WebMCP handle security and user consent?

WebMCP uses a permission-first model. The browser mediates all tool invocations, and tools call requestUserInteraction() to obtain explicit user consent before executing sensitive actions. Tools inherit the page's origin security boundary, require HTTPS, and respect Content Security Policy. The user is always in the loop.

What performance improvements does WebMCP offer over screenshot-based agents?

Early benchmarks show 89% token reduction versus screenshot approaches, 67% computational overhead reduction, and 98% task accuracy on structured tool calls. These gains come from replacing vision model inference on 500KB+ images with structured JSON schemas that are a few hundred bytes.

Can I use WebMCP with existing HTML forms?

Yes. The Declarative API requires zero JavaScript. Add toolname, tooldescription, and optional toolparamdescription attributes to your existing form elements, and Chrome auto-converts them into callable tools for agents.

Why Browser Agents Waste 89% of Their Tokens

Our agent was screenshotting web pages. Every interaction started with a 500KB PNG, 300ms of vision model processing, and an 85% chance the agent would click the right button. On a good day.

The screenshots ate tokens. A single page capture consumed 1,500-2,000 tokens just to describe what a human could see at a glance. Multiply that by every step in a checkout flow, a flight search, a form submission. Our agent burned through context windows like they were free.

Then navigator.modelContext shipped in Chrome 146 Canary.

Instead of sending a screenshot and asking "what do you see?", the website now tells the agent: "Here are my tools, here are their parameters, here is how to call them." Structured JSON. A few hundred bytes. 89% fewer tokens. 98% task accuracy.

This is WebMCP, and it changes the economics of every browser-based AI agent.

What WebMCP Actually Is

WebMCP is a browser-native JavaScript API (navigator.modelContext) that lets websites expose structured, callable tools to AI agents. Instead of agents reverse-engineering a page from its DOM or pixels, the page declares what it can do. Jointly developed by Google and Microsoft through the W3C Web Machine Learning Community Group, it shipped as an early preview on February 10, 2026.

Figure: WebMCP tool registration and invocation flow

The browser acts as a secure proxy. When an agent needs to act, it discovers available tools, picks the right one, and invokes it with structured parameters. No pixel guessing. No DOM parsing. No fragile CSS selectors.

Patrick Brosset, who helped shape the proposal, clarified that the API naming evolved from window.agent to navigator.modelContext. The spec includes requestUserInteraction() for explicit user confirmation before sensitive actions.

Screenshot vs. WebMCP

WebMCP cuts browser agent token usage by about 89% and raises task accuracy from roughly 85% to 98%. The difference comes down to sending a JSON tool schema of a few hundred bytes instead of pushing a 500KB screenshot through a vision model. Here is the full comparison.

Metric	Screenshot Approach	WebMCP
Tokens per interaction	1,500-2,000	~150-200
Token reduction	Baseline	89% fewer
Task accuracy	~85% (best case)	~98%
Computational overhead	Full vision model inference	67% reduction
Latency	300-800ms (screenshot + vision)	<50ms (JSON schema)
Breaks on UI change	Yes, constantly	No (schema-driven)
Handles dynamic content	Poorly	Natively
Auth session access	Requires cookie injection	Inherits user session
Infrastructure needed	Headless browser + vision API	None (browser-native)

The 89% token reduction alone changes the economics. An agent processing 1,000 web interactions daily goes from burning ~1.8M tokens on screenshots to ~180K on structured schemas. At current API pricing, that is real money.
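
The arithmetic behind that estimate is easy to reproduce. The sketch below plugs in the per-interaction token ranges from the table; the function name and the use of range midpoints are our own assumptions for illustration, not part of any benchmark.

```typescript
// Estimate daily token usage from a per-interaction token range.
// The ranges come from the comparison table; using the midpoint is an assumption.
function dailyTokens(interactions: number, minTokens: number, maxTokens: number): number {
  const midpoint = (minTokens + maxTokens) / 2;
  return interactions * midpoint;
}

const screenshotTokens = dailyTokens(1000, 1500, 2000); // 1,750,000 tokens per day
const webmcpTokens = dailyTokens(1000, 150, 200);       // 175,000 tokens per day
const reduction = 1 - webmcpTokens / screenshotTokens;  // fraction of tokens saved

console.log({ screenshotTokens, webmcpTokens, reduction });
```

With midpoints the saving works out to about 90%, in line with the 89% figure; multiply the token counts by your provider's per-token price to turn them into a budget number.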

Screenshot agents fail when a button moves, when content loads dynamically, when a modal overlays the target element. WebMCP tools are schema contracts: the website says "I accept these parameters and return these results." UI changes don't break the contract. That is why structured schemas hit 98% task accuracy versus 85% for screenshots.

The Declarative API

You can make existing HTML forms agent-callable with zero JavaScript. Add toolname, tooldescription, and toolparamdescription attributes to your form elements, and Chrome auto-generates a tool schema from the fields. Add toolautosubmit if the agent should be able to submit on its own. A handful of attributes, and your forms become tools.

html

<!-- Adding three attributes turns an existing form into a WebMCP tool.
     The browser auto-generates a tool schema from the form fields.
     toolautosubmit lets the agent submit without user clicking. -->
<form
  toolname="searchFlights"
  tooldescription="Search for available flights between two airports"
  toolautosubmit="true"
  action="/api/flights/search"
  method="GET"
>
  <!-- toolparamdescription gives the agent context about each field.
       Without it, the agent only sees the field name "origin". -->
  <!-- Each input needs an id matching its label's "for" attribute;
       "for" points at an id, not at a name. -->
  <label for="origin">Origin Airport</label>
  <input
    id="origin"
    name="origin"
    type="text"
    required
    pattern="[A-Z]{3}"
    toolparamdescription="Three-letter IATA airport code (e.g., SFO, JFK, LAX)"
  />

  <label for="destination">Destination Airport</label>
  <input
    id="destination"
    name="destination"
    type="text"
    required
    pattern="[A-Z]{3}"
    toolparamdescription="Three-letter IATA destination airport code"
  />

  <label for="date">Travel Date</label>
  <input id="date" name="date" type="date" required />

  <!-- min/max constraints become schema validation rules automatically -->
  <label for="passengers">Passengers</label>
  <input id="passengers" name="passengers" type="number" min="1" max="9" value="1" />
 
  <button type="submit">Search Flights</button>
</form>

The browser infers input parameters from form field names and types, then registers a tool that agents can discover and invoke. required becomes a required field in the JSON schema. pattern becomes a validation constraint. min/max on number inputs become numeric bounds.
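
To make that mapping concrete, here is a minimal sketch of how form constraints could translate into a JSON schema. It is not Chrome's actual algorithm: FieldSpec, fieldsToSchema, and the choice to map date inputs to a string with format "date" are assumptions made for illustration.

```typescript
// Plain-data description of a form field (hypothetical type, not a browser API).
interface FieldSpec {
  name: string;
  type: "text" | "number" | "date";
  required?: boolean;
  pattern?: string;
  min?: number;
  max?: number;
  description?: string; // would come from toolparamdescription
}

interface SchemaProperty {
  type: "string" | "number";
  description?: string;
  pattern?: string;
  format?: string;
  minimum?: number;
  maximum?: number;
}

interface ToolInputSchema {
  type: "object";
  properties: Record<string, SchemaProperty>;
  required: string[];
}

// Translate field constraints into JSON Schema keywords, one field at a time.
function fieldsToSchema(fields: FieldSpec[]): ToolInputSchema {
  const schema: ToolInputSchema = { type: "object", properties: {}, required: [] };
  for (const field of fields) {
    const prop: SchemaProperty = { type: field.type === "number" ? "number" : "string" };
    if (field.description !== undefined) prop.description = field.description;
    if (field.pattern !== undefined) prop.pattern = field.pattern;
    if (field.type === "date") prop.format = "date"; // assumed mapping
    if (field.min !== undefined) prop.minimum = field.min;
    if (field.max !== undefined) prop.maximum = field.max;
    schema.properties[field.name] = prop;
    if (field.required) schema.required.push(field.name);
  }
  return schema;
}

// The searchFlights form from above, expressed as data.
const searchFlightsSchema = fieldsToSchema([
  { name: "origin", type: "text", required: true, pattern: "[A-Z]{3}",
    description: "Three-letter IATA airport code (e.g., SFO, JFK, LAX)" },
  { name: "destination", type: "text", required: true, pattern: "[A-Z]{3}",
    description: "Three-letter IATA destination airport code" },
  { name: "date", type: "date", required: true },
  { name: "passengers", type: "number", min: 1, max: 9 },
]);

console.log(JSON.stringify(searchFlightsSchema, null, 2));
```

Use the Model Context Tool Inspector to see what the browser really generates for your form; the point here is only that every constraint you already wrote in HTML has an obvious schema counterpart.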

When an agent invokes searchFlights, the browser fills in the form fields with the provided values. Without toolautosubmit, it waits for the user to click submit. With it, the form submits automatically. This covers the 80% of web interactions that are already form-based: contact forms, search bars, checkout flows, booking systems.

The Imperative API

For dynamic workflows, multi-step processes, or anything requiring JavaScript execution, the Imperative API gives you full control. You register tools with navigator.modelContext.registerTool(), providing a name, JSON schema, and an async execute handler that runs in the page's JavaScript context.

typescript

// Registration lives in a function so the early return below is legal.
// A bare `return` at the top level of a script is a syntax error.
function registerCartTool() {
  // Guard against browsers without WebMCP support.
  // This check is essential. The API only exists in Chrome 146+ with the flag enabled.
  if (!('modelContext' in navigator)) {
    console.log('WebMCP not available in this browser');
    return;
  }

  // navigator.modelContext is not in TypeScript's DOM typings,
  // so this example reaches it through a loosely typed reference.
  const modelContext = (navigator as any).modelContext;

  // Register a tool that adds items to a shopping cart.
  // The execute handler runs in the browser context with full access
  // to the page's state, DOM, and the user's authenticated session.
  modelContext.registerTool({
    name: "addToCart",

    // This description is what the agent reads to decide when to use the tool.
    // Be specific: "Add item" is too vague. The agent won't know what "item" means.
    description: "Add a product to the user's shopping cart by SKU and quantity",

    // JSON Schema defines the contract. The agent sends structured params,
    // not free-text. This is why accuracy hits 98%.
    inputSchema: {
      type: "object",
      properties: {
        sku: {
          type: "string",
          description: "Product SKU identifier (e.g., 'SHOE-RED-42')"
        },
        quantity: {
          type: "integer",
          minimum: 1,
          maximum: 10,
          description: "Number of items to add (1-10)"
        }
      },
      required: ["sku", "quantity"]
    },

    // The execute handler is an async function.
    // It receives validated params and a client object for user interaction
    // (unused here; see the next example).
    execute: async (params: { sku: string; quantity: number }, client: unknown) => {
      const { sku, quantity } = params;

      // Call your existing cart API. No new backend needed.
      const response = await fetch('/api/cart/add', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ sku, quantity })
      });

      // Report failures honestly instead of always claiming success.
      if (!response.ok) {
        return {
          type: "text",
          text: JSON.stringify({ success: false, status: response.status })
        };
      }

      const result = await response.json();

      // Return structured data the agent can reason about.
      return {
        type: "text",
        text: JSON.stringify({
          success: true,
          cartTotal: result.total,
          itemCount: result.itemCount
        })
      };
    }
  });
}

registerCartTool();

The handler has access to everything your frontend code already has: the DOM, fetch, localStorage, cookies and session. No separate backend. No API proxy. No headless browser infrastructure. That 500KB screenshot pipeline from the opening? Replaced with a function call.
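
Because a tool is just a name, a schema, and a function, you can exercise it without a browser. The sketch below is a tiny stand-in for the mediation step: it checks params against the schema, then calls the handler. ToolDefinition, validateParams, and invokeTool are hypothetical names, the validator covers only the string and integer keywords used above, and an in-memory Map replaces the cart API.

```typescript
type ToolResult = { type: "text"; text: string };
interface StringParam { type: "string" }
interface IntegerParam { type: "integer"; minimum?: number; maximum?: number }

interface ToolDefinition {
  name: string;
  inputSchema: {
    type: "object";
    properties: Record<string, StringParam | IntegerParam>;
    required: string[];
  };
  execute: (params: Record<string, unknown>) => Promise<ToolResult>;
}

// Returns a list of problems; an empty list means the params are acceptable.
function validateParams(tool: ToolDefinition, params: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of tool.inputSchema.required) {
    if (!(key in params)) errors.push(`missing required parameter "${key}"`);
  }
  for (const [key, value] of Object.entries(params)) {
    const spec = tool.inputSchema.properties[key];
    if (spec === undefined) { errors.push(`unknown parameter "${key}"`); continue; }
    if (spec.type === "string") {
      if (typeof value !== "string") errors.push(`"${key}" must be a string`);
      continue;
    }
    if (typeof value !== "number" || !Number.isInteger(value)) {
      errors.push(`"${key}" must be an integer`);
      continue;
    }
    if (spec.minimum !== undefined && value < spec.minimum) errors.push(`"${key}" is below ${spec.minimum}`);
    if (spec.maximum !== undefined && value > spec.maximum) errors.push(`"${key}" is above ${spec.maximum}`);
  }
  return errors;
}

// Validate first; only call the handler when the params satisfy the schema.
async function invokeTool(tool: ToolDefinition, params: Record<string, unknown>): Promise<ToolResult> {
  const errors = validateParams(tool, params);
  if (errors.length > 0) return { type: "text", text: JSON.stringify({ success: false, errors }) };
  return tool.execute(params);
}

// In-memory stand-in for the cart API used in the article's example.
const cart = new Map<string, number>();
const addToCart: ToolDefinition = {
  name: "addToCart",
  inputSchema: {
    type: "object",
    properties: { sku: { type: "string" }, quantity: { type: "integer", minimum: 1, maximum: 10 } },
    required: ["sku", "quantity"],
  },
  execute: async (params) => {
    const sku = params["sku"] as string;
    const quantity = params["quantity"] as number;
    cart.set(sku, (cart.get(sku) ?? 0) + quantity);
    return { type: "text", text: JSON.stringify({ success: true, itemCount: cart.get(sku) }) };
  },
};

async function demo(): Promise<{ ok: ToolResult; rejected: ToolResult }> {
  const ok = await invokeTool(addToCart, { sku: "SHOE-RED-42", quantity: 2 });
  const rejected = await invokeTool(addToCart, { sku: "SHOE-RED-42", quantity: 50 });
  return { ok, rejected };
}

demo().then((outcome) => console.log(outcome));
```

The out-of-range call never reaches the handler, which is the property that makes schema contracts more predictable than clicking on pixels.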

For tools that modify data or cost money, the client parameter provides requestUserInteraction():

typescript

// For sensitive actions, pause and ask the user before proceeding.
// The browser shows a native confirmation dialog, not a custom modal
// that an agent could dismiss programmatically.
execute: async (params, client) => {
  // requestUserInteraction() pauses agent execution and shows
  // a browser-native prompt. The agent cannot bypass this.
  const confirmed = await client.requestUserInteraction({
    message: `Complete purchase of ${params.itemName} for $${params.price}?`
  });
 
  if (!confirmed) {
    return { type: "text", text: JSON.stringify({ cancelled: true }) };
  }
 
  // Only proceeds after explicit user consent
  const result = await processPayment(params);
  return { type: "text", text: JSON.stringify(result) };
}
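
The consent branch is worth covering in unit tests, since it is the one place where a bug costs real money. Here is a minimal sketch that swaps the browser's client for a fake. It assumes the requestUserInteraction({ message }) shape used in the snippet above; makePurchaseHandler, makeFakeClient, and the payment stub are hypothetical.

```typescript
type ToolResult = { type: "text"; text: string };

// Assumed client shape, mirroring the snippet above.
interface InteractionClient {
  requestUserInteraction(request: { message: string }): Promise<boolean>;
}

interface PurchaseParams { itemName: string; price: number }
type PaymentFn = (params: PurchaseParams) => Promise<{ orderId: string }>;

// Build the execute handler around an injectable payment function.
function makePurchaseHandler(processPayment: PaymentFn) {
  return async (params: PurchaseParams, client: InteractionClient): Promise<ToolResult> => {
    const confirmed = await client.requestUserInteraction({
      message: `Complete purchase of ${params.itemName} for $${params.price}?`,
    });
    if (!confirmed) {
      return { type: "text", text: JSON.stringify({ cancelled: true }) };
    }
    const result = await processPayment(params);
    return { type: "text", text: JSON.stringify(result) };
  };
}

// Fake client that records each prompt and answers with a fixed decision.
function makeFakeClient(answer: boolean) {
  const prompts: string[] = [];
  const client: InteractionClient = {
    requestUserInteraction: async (request) => {
      prompts.push(request.message);
      return answer;
    },
  };
  return { client, prompts };
}

async function demo(): Promise<{ declined: ToolResult; accepted: ToolResult; charges: number }> {
  let charges = 0;
  const handler = makePurchaseHandler(async () => {
    charges += 1; // count how often the payment stub runs
    return { orderId: "order-1" };
  });
  const params: PurchaseParams = { itemName: "Red shoes", price: 59 };
  const declined = await handler(params, makeFakeClient(false).client);
  const accepted = await handler(params, makeFakeClient(true).client);
  return { declined, accepted, charges };
}

demo().then((outcome) => console.log(outcome));
```

Injecting the payment function keeps the handler identical in tests and production; only the client and the payment stub change.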

You can also provide ambient context without registering a tool. This is useful for giving agents background information about the current page state:

typescript

// provideContext() shares read-only information with agents.
// No tool invocation, no user prompt. Just structured context.
navigator.modelContext.provideContext({
  name: "currentUserProfile",
  description: "The logged-in user's profile and preferences",
  data: {
    name: "Jane Smith",
    tier: "premium",
    preferredLanguage: "en",
    recentOrders: 12
  }
});

Security and Consent

WebMCP uses the browser itself as the trust boundary. The browser mediates every tool invocation, requires HTTPS, enforces same-origin isolation, and demands explicit user consent for sensitive actions. The agent never gets more access than the user already has.

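HTTPS required. The API only works in secure contexts, so plain HTTP pages cannot use it. As with other secure-context APIs, browsers generally treat http://localhost as secure, which is what makes local development possible.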

Same-origin policy. Tools inherit the page's origin boundary. A tool registered on flights.example.com cannot access data from bank.example.com.

User consent is mandatory. For sensitive actions, requestUserInteraction() shows a native browser prompt that the agent cannot dismiss or bypass. The user sees exactly what the agent wants to do and decides whether to allow it.

Session inheritance, not injection. Tools run within the user's existing authenticated session. No cookie injection, no credential passing, but also no access beyond what the user already has.

Per-invocation permissions. The Permission and Consent Manager ensures only tools with explicit user approval can execute. Same pattern as geolocation, camera, and microphone access.

This design mitigates what researchers call the "deadly triad" scenario, where an agent has simultaneous access to multiple sensitive tabs. WebMCP's domain-level isolation means a tool on one origin cannot reach another, even if the same agent interacts with both.
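
An origin is the combination of scheme, host, and port, which is why two subdomains of the same site do not share tools. The helper below is a small illustration using the standard URL class; sameOrigin is our own name, not a WebMCP API.

```typescript
// Two URLs share an origin only if scheme, host, and port all match.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// Same scheme, host, and port: different paths do not matter.
const samePage = sameOrigin("https://flights.example.com/search", "https://flights.example.com/checkout");

// Different subdomain: a tool on the flights site cannot reach the bank site.
const crossSubdomain = sameOrigin("https://flights.example.com/", "https://bank.example.com/");

// Different scheme: http and https are separate origins even on the same host.
const crossScheme = sameOrigin("https://flights.example.com/", "http://flights.example.com/");

console.log({ samePage, crossSubdomain, crossScheme });
```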

WebMCP vs. Server-Side MCP

WebMCP does not replace server-side MCP. They are complementary. Server-side MCP handles headless backend automation where no browser is present. WebMCP handles browser interactions where the user is logged in and the agent operates within their session. Most production systems will use both.

Dimension	Server-Side MCP	WebMCP
Runs where	Backend server	Browser (client-side)
Protocol	JSON-RPC 2.0	JavaScript API
Transport	Streamable HTTP, stdio	Browser internal
Auth model	OAuth 2.1, API keys	User's browser session
User present	No (headless)	Yes (always)
Primitives	Tools, Resources, Prompts	Tools, Context
Infrastructure	MCP server deployment	Zero (browser-native)
Use case	Service-to-service, backend automation	Browser-based, user-facing

Consider a travel company. It maintains a server-side MCP server for direct API integrations with Claude, ChatGPT, and other platforms. Simultaneously, it implements WebMCP tools on its consumer website so browser-based agents can interact with the booking flow in the user's authenticated session.

The layering looks like this:

Figure: WebMCP and server-side MCP complement each other

Server-side MCP for headless automation. WebMCP for user-present browser interactions. Same agent, same tool concepts, different execution environments.
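
One way to picture the split is a single tool catalog whose entries carry a transport. In the sketch below, server-side tools become MCP JSON-RPC 2.0 tools/call requests, while WebMCP tools are handed to the browser. The catalog, the tool names, and planCall are hypothetical; only the tools/call request shape comes from the MCP protocol.

```typescript
type Transport = "server-mcp" | "webmcp";
interface CatalogEntry { name: string; transport: Transport }

// Shape of an MCP tool invocation over JSON-RPC 2.0.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

type PlannedCall =
  | { via: "server-mcp"; request: JsonRpcRequest }
  | { via: "webmcp"; toolName: string; args: Record<string, unknown> };

let nextId = 1;

// Decide how a tool call should travel based on its catalog entry.
function planCall(catalog: CatalogEntry[], name: string, args: Record<string, unknown>): PlannedCall {
  const entry = catalog.find((candidate) => candidate.name === name);
  if (entry === undefined) throw new Error(`Unknown tool: ${name}`);
  if (entry.transport === "server-mcp") {
    return {
      via: "server-mcp",
      request: { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } },
    };
  }
  // WebMCP tools are invoked by the browser inside the user's session.
  return { via: "webmcp", toolName: name, args };
}

// Hypothetical catalog: inventory lives on the backend, the cart lives in the page.
const catalog: CatalogEntry[] = [
  { name: "checkInventory", transport: "server-mcp" },
  { name: "addToCart", transport: "webmcp" },
];

const inventoryCall = planCall(catalog, "checkInventory", { sku: "SHOE-RED-42" });
const cartCall = planCall(catalog, "addToCart", { sku: "SHOE-RED-42", quantity: 1 });

console.log(JSON.stringify(inventoryCall), JSON.stringify(cartCall));
```

The agent's planning logic stays the same either way; only the last hop differs.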

What This Means for Agents

WebMCP shifts the agent-website relationship from adversarial to cooperative. Today, agents scrape, guess, and break. Tomorrow, websites publish tool contracts and agents call them reliably. The screenshot-and-pray approach from our opening is about to become optional.

For agent builders: Your browser automation stack gets simpler. No more maintaining screenshot pipelines, vision model integrations, or brittle CSS selectors. If the target website supports WebMCP, you get structured tool access with guaranteed parameter validation and typed responses. The tool management patterns you already use for server-side MCP translate directly. Your prompt engineering gets simpler too, because you no longer need to describe UI elements in system prompts.

For website owners: WebMCP is the new structured data. Just as Schema.org markup made content machine-readable for search engines, WebMCP makes functionality machine-callable for agents. Agent-ready websites get preferential treatment from agent platforms because they are cheaper (89% fewer tokens) and more reliable (98% accuracy) to interact with.

For platform teams: WebMCP tools should be first-class citizens alongside server-side MCP tools. An agent orchestrating a customer workflow might call a backend MCP server to check inventory, then invoke a WebMCP tool to complete the purchase in the user's session. Your monitoring and analytics need to track both. Scenario testing should cover both tool types.

For the agentic web: Dan Petrovic called WebMCP "the biggest shift in technical SEO since structured data." Agents will prefer sites that publish tool contracts over sites that require screen scraping. The cost differential alone (89% token savings) makes WebMCP-enabled sites the rational choice for every agent platform.

Getting Started Today

WebMCP is available now in Chrome 146 Canary behind a flag. The spec is a draft and breaking changes are expected, but the developer experience is ready for prototyping. Here is the five-step path from zero to a working tool.

Step 1: Enable the flag. Open chrome://flags in Chrome Canary (146+), search for "WebMCP for testing," and enable it.

Step 2: Start with the Declarative API. Pick one form on your site. Add toolname, tooldescription, and toolparamdescription attributes. Verify with the Model Context Tool Inspector Chrome extension.

Step 3: Graduate to the Imperative API. For dynamic interactions, register tools with navigator.modelContext.registerTool(). Always guard with if ('modelContext' in navigator) for graceful fallback.

Step 4: Test with real agents. Connect a browser-based agent (or use Chrome's built-in Gemini integration) and verify that tool discovery, invocation, and result handling work end-to-end.

Step 5: Layer with server-side MCP. Your existing MCP server handles headless workflows. WebMCP handles browser interactions. Same tool catalog, different transports.
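
The guard from Step 3 is easier to test if registration goes through a small helper that accepts any navigator-like object. This is a sketch under that assumption; NavigatorLike, ModelContextLike, and registerIfSupported are hypothetical names, and the interfaces model only the one method the helper needs.

```typescript
interface ToolLike { name: string; description: string }
interface ModelContextLike { registerTool(tool: ToolLike): void }
interface NavigatorLike { modelContext?: ModelContextLike }

// Register the tool when the API exists; report false so callers can fall back.
function registerIfSupported(nav: NavigatorLike, tool: ToolLike): boolean {
  if (nav.modelContext === undefined) return false;
  nav.modelContext.registerTool(tool);
  return true;
}

// Stand-ins for a flag-enabled Chrome Canary and for any other browser.
const registeredNames: string[] = [];
const supportedBrowser: NavigatorLike = {
  modelContext: { registerTool: (tool) => { registeredNames.push(tool.name); } },
};
const unsupportedBrowser: NavigatorLike = {};

const searchTool: ToolLike = {
  name: "searchFlights",
  description: "Search for available flights between two airports",
};

const registeredInCanary = registerIfSupported(supportedBrowser, searchTool);
const registeredElsewhere = registerIfSupported(unsupportedBrowser, searchTool);

console.log({ registeredInCanary, registeredElsewhere, registeredNames });
```

In a real page you would pass the global navigator, cast to the helper's type, and keep your regular UI as the fallback when the helper returns false.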

The spec lives at github.com/webmachinelearning/webmcp. The W3C Community Group is active and accepting feedback.

WebMCP turns the web from something agents scrape into something agents call. Structured tool contracts are better in every dimension: cost, accuracy, reliability, security. With Google and Microsoft jointly pushing the spec through W3C, adoption is moving fast.

Our agent stopped screenshotting web pages. Yours should too.