
Why Browser Agents Waste 89% of Their Tokens

Browser agents burn 1,500-2,000 tokens per screenshot. Chrome 146's navigator.modelContext API lets websites expose structured tools instead, cutting token usage by 89% and raising task accuracy to 98%. Here's how WebMCP works.

Dean Grover, Co-founder
March 20, 2026
13 min read

Our agent was screenshotting web pages. Every interaction started with a 500KB PNG, 300ms of vision model processing, and an 85% chance the agent would click the right button. On a good day.

The screenshots ate tokens. A single page capture consumed 1,500-2,000 tokens just to describe what a human could see in a glance. Multiply that by every step in a checkout flow, a flight search, a form submission. Our agent burned through context windows like they were free.

Then navigator.modelContext shipped in Chrome 146 Canary.

Instead of sending a screenshot and asking "what do you see?", the website now tells the agent: "Here are my tools, here are their parameters, here is how to call them." Structured JSON. A few hundred bytes. 89% fewer tokens. 98% task accuracy.

This is WebMCP, and it changes the economics of every browser-based AI agent.


What WebMCP Actually Is

WebMCP is a browser-native JavaScript API (navigator.modelContext) that lets websites expose structured, callable tools to AI agents. Instead of agents reverse-engineering a page from its DOM or pixels, the page declares what it can do. Jointly developed by Google and Microsoft through the W3C Web Machine Learning Community Group, it shipped as an early preview on February 10, 2026.

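Figure: WebMCP tool registration and invocation flow. The website registers a tool with registerTool({ name, schema, execute }). When asked to search flights from SFO to JFK, the AI agent gets the available tools ([searchFlights]) from the browser and invokes searchFlights({ origin: "SFO", dest: "JFK" }). The browser then executes the website's handler with those params and returns the structured result ({ flights: [...] }) to the agent.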

The browser acts as a secure proxy. When an agent needs to act, it discovers available tools, picks the right one, and invokes it with structured parameters. No pixel guessing. No DOM parsing. No fragile CSS selectors.
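The discover, pick, invoke loop can be modeled in a few lines. The sketch below is a hypothetical in-memory stand-in for the browser's role, not its implementation. ToolRegistry, listTools, and invoke are illustrative names, and the searchFlights handler returns stub data. What it shows is the division of labor: the agent sees only names, descriptions, and schemas, while the handler stays on the page.

```typescript
// Hypothetical in-memory model of the WebMCP discover/invoke loop.
// ToolRegistry, listTools, and invoke are illustrative names, not browser APIs.

type JsonSchema = Record<string, unknown>;

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: JsonSchema;
  execute: (params: Record<string, unknown>) => Promise<unknown>;
}

class ToolRegistry {
  private tools = new Map<string, ToolDefinition>();

  // Website side: declare a tool.
  register(tool: ToolDefinition): void {
    this.tools.set(tool.name, tool);
  }

  // Agent side: discover what the page offers (metadata only, no handlers).
  listTools(): Array<Omit<ToolDefinition, "execute">> {
    return [...this.tools.values()].map(({ name, description, inputSchema }) => ({
      name,
      description,
      inputSchema,
    }));
  }

  // Agent side: invoke by name with structured params; the page's handler runs.
  async invoke(name: string, params: Record<string, unknown>): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.execute(params);
  }
}

// Usage: the page registers searchFlights; the agent discovers and calls it.
const registry = new ToolRegistry();
registry.register({
  name: "searchFlights",
  description: "Search for available flights between two airports",
  inputSchema: {
    type: "object",
    properties: { origin: { type: "string" }, dest: { type: "string" } },
    required: ["origin", "dest"],
  },
  // Stub handler: a real page would call its own flight-search API here.
  execute: async (params) => ({
    flights: [{ from: params.origin, to: params.dest, flight: "XX100" }],
  }),
});

console.log(registry.listTools().map((t) => t.name)); // [ 'searchFlights' ]
registry.invoke("searchFlights", { origin: "SFO", dest: "JFK" }).then(console.log);
```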

Patrick Brosset, who helped shape the proposal, clarified that the API naming evolved from window.agent to navigator.modelContext. The spec includes requestUserInteraction() for explicit user confirmation before sensitive actions.

Screenshot vs. WebMCP

WebMCP cuts browser agent token usage by 89% and raises task accuracy from 85% to 98%. The difference comes down to sending a ~200-byte JSON tool schema instead of pushing a 500KB screenshot through a vision model. Here is the full comparison.

| Metric | Screenshot Approach | WebMCP |
|---|---|---|
| Tokens per interaction | 1,500-2,000 | ~150-200 |
| Token reduction | Baseline | 89% fewer |
| Task accuracy | ~85% (best case) | ~98% |
| Computational overhead | Full vision model inference | 67% reduction |
| Latency | 300-800 ms (screenshot + vision) | <50 ms (JSON schema) |
| Breaks on UI change | Yes, constantly | No (schema-driven) |
| Handles dynamic content | Poorly | Natively |
| Auth session access | Requires cookie injection | Inherits user session |
| Infrastructure needed | Headless browser + vision API | None (browser-native) |

The 89% token reduction alone changes the economics. An agent processing 1,000 web interactions daily goes from burning ~1.8M tokens on screenshots to ~180K on structured schemas. At current API pricing, that is real money.
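The arithmetic is easy to reproduce. The sketch below uses the article's ~1,800 and ~180 tokens-per-interaction figures. The $3 per million input tokens price is an assumption for illustration only, so substitute your provider's actual rate.

```typescript
// Back-of-the-envelope daily token and cost comparison.
// Token counts follow the article's figures; the price is a placeholder assumption.

const SCREENSHOT_TOKENS_PER_INTERACTION = 1_800; // ~1,500-2,000 per screenshot
const WEBMCP_TOKENS_PER_INTERACTION = 180;       // ~150-200 per tool schema
const ASSUMED_USD_PER_MILLION_TOKENS = 3;        // hypothetical rate, not a quote

function dailyTokens(interactions: number, tokensPerInteraction: number): number {
  return interactions * tokensPerInteraction;
}

function dailyCostUsd(tokens: number, usdPerMillion: number): number {
  // Multiply first and divide once to keep floating-point error minimal.
  return (tokens * usdPerMillion) / 1_000_000;
}

const interactions = 1_000;
const screenshotTokens = dailyTokens(interactions, SCREENSHOT_TOKENS_PER_INTERACTION);
const webmcpTokens = dailyTokens(interactions, WEBMCP_TOKENS_PER_INTERACTION);

console.log(screenshotTokens); // 1800000
console.log(webmcpTokens);     // 180000
console.log(dailyCostUsd(screenshotTokens, ASSUMED_USD_PER_MILLION_TOKENS)); // 5.4
console.log(dailyCostUsd(webmcpTokens, ASSUMED_USD_PER_MILLION_TOKENS));     // 0.54
```

At these figures the reduction works out to 90%, close to the 89% in the table. Both token count and cost scale linearly with daily interactions, so the absolute gap widens as volume grows.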

Screenshot agents fail when a button moves, when content loads dynamically, when a modal overlays the target element. WebMCP tools are schema contracts: the website says "I accept these parameters and return these results." UI changes don't break the contract. That is why structured schemas hit 98% task accuracy versus 85% for screenshots.
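One way to picture a schema contract: the call is checked against the declared schema before any handler runs, so a malformed call fails with a precise error instead of a misclick. The validator below covers only a small subset of JSON Schema (required, type, minimum, maximum). It illustrates the idea and does not describe the browser's own validation. Production code should use a complete JSON Schema validator.

```typescript
// Minimal checker for a small subset of JSON Schema, for illustration only:
// required, type (string/integer/number), minimum, and maximum.

interface PropertySchema {
  type: "string" | "integer" | "number";
  minimum?: number;
  maximum?: number;
}

interface ObjectSchema {
  type: "object";
  properties: Record<string, PropertySchema>;
  required?: string[];
}

function validate(schema: ObjectSchema, params: Record<string, unknown>): string[] {
  const errors: string[] = [];

  for (const key of schema.required ?? []) {
    if (!(key in params)) errors.push(`missing required field "${key}"`);
  }

  for (const [key, value] of Object.entries(params)) {
    const prop = schema.properties[key];
    if (!prop) continue; // JSON Schema allows extra properties unless told otherwise

    if (prop.type === "string") {
      if (typeof value !== "string") errors.push(`"${key}" must be a string`);
      continue;
    }

    // Numeric types: check the type first, then the bounds.
    if (typeof value !== "number" || (prop.type === "integer" && !Number.isInteger(value))) {
      errors.push(`"${key}" must be ${prop.type === "integer" ? "an integer" : "a number"}`);
      continue;
    }
    if (prop.minimum !== undefined && value < prop.minimum) errors.push(`"${key}" below ${prop.minimum}`);
    if (prop.maximum !== undefined && value > prop.maximum) errors.push(`"${key}" above ${prop.maximum}`);
  }
  return errors;
}

// An addToCart-style contract: a SKU plus a quantity from 1 to 10.
const addToCartSchema: ObjectSchema = {
  type: "object",
  properties: {
    sku: { type: "string" },
    quantity: { type: "integer", minimum: 1, maximum: 10 },
  },
  required: ["sku", "quantity"],
};

console.log(validate(addToCartSchema, { sku: "SHOE-RED-42", quantity: 2 })); // []
console.log(validate(addToCartSchema, { sku: "SHOE-RED-42", quantity: 12 }).join("; ")); // "quantity" above 10
```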

The Declarative API

You can make existing HTML forms agent-callable with zero JavaScript. Add toolname and tooldescription to the form element and toolparamdescription to its inputs (plus toolautosubmit if the agent may submit on its own), and Chrome auto-generates a tool schema from the fields. A handful of attributes, and your forms become tools.

```html
<!-- A few attributes turn an existing form into a WebMCP tool.
     The browser auto-generates a tool schema from the form fields.
     toolautosubmit lets the agent submit without the user clicking. -->
<form
  toolname="searchFlights"
  tooldescription="Search for available flights between two airports"
  toolautosubmit="true"
  action="/api/flights/search"
  method="GET"
>
  <!-- toolparamdescription gives the agent context about each field.
       Without it, the agent only sees the field name "origin".
       Each label's for attribute must match an input id to be associated with it. -->
  <label for="origin">Origin Airport</label>
  <input
    id="origin"
    name="origin"
    type="text"
    required
    pattern="[A-Z]{3}"
    toolparamdescription="Three-letter IATA airport code (e.g., SFO, JFK, LAX)"
  />

  <label for="destination">Destination Airport</label>
  <input
    id="destination"
    name="destination"
    type="text"
    required
    pattern="[A-Z]{3}"
    toolparamdescription="Three-letter IATA destination airport code"
  />

  <label for="date">Travel Date</label>
  <input id="date" name="date" type="date" required />

  <!-- min/max constraints become schema validation rules automatically -->
  <label for="passengers">Passengers</label>
  <input id="passengers" name="passengers" type="number" min="1" max="9" value="1" />

  <button type="submit">Search Flights</button>
</form>
```

The browser infers input parameters from form field names and types, then registers a tool that agents can discover and invoke. The required attribute becomes a required field in the JSON schema, pattern becomes a validation constraint, and min/max on number inputs become numeric bounds.
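To make that mapping concrete, the sketch below converts a plain description of the flight form's fields into a JSON Schema object. It approximates the inference described above; it does not reproduce the exact schema Chrome generates, and the FormField shape is a hypothetical input format, not a DOM type. One detail is worth noting: an HTML pattern attribute must match the whole value, while a JSON Schema pattern is unanchored, so the sketch wraps it as ^(?:pattern)$.

```typescript
// Hypothetical description of a form field; not a DOM type.
interface FormField {
  name: string;
  type: "text" | "date" | "number";
  required?: boolean;
  pattern?: string;
  min?: number;
  max?: number;
  description?: string; // taken from toolparamdescription
}

function formToSchema(fields: FormField[]) {
  const properties: Record<string, Record<string, unknown>> = {};
  const required: string[] = [];

  for (const f of fields) {
    const prop: Record<string, unknown> = {};
    if (f.type === "number") {
      prop.type = "number";
      if (f.min !== undefined) prop.minimum = f.min; // min -> numeric lower bound
      if (f.max !== undefined) prop.maximum = f.max; // max -> numeric upper bound
    } else {
      prop.type = "string";
      // type="date" inputs submit their value as a yyyy-mm-dd string.
      if (f.type === "date") prop.format = "date";
      // HTML patterns match the whole value; JSON Schema needs explicit anchors.
      if (f.pattern) prop.pattern = `^(?:${f.pattern})$`;
    }
    if (f.description) prop.description = f.description;
    properties[f.name] = prop;
    if (f.required) required.push(f.name);
  }

  return { type: "object", properties, required };
}

const flightForm: FormField[] = [
  { name: "origin", type: "text", required: true, pattern: "[A-Z]{3}",
    description: "Three-letter IATA airport code (e.g., SFO, JFK, LAX)" },
  { name: "destination", type: "text", required: true, pattern: "[A-Z]{3}",
    description: "Three-letter IATA destination airport code" },
  { name: "date", type: "date", required: true },
  { name: "passengers", type: "number", min: 1, max: 9 },
];

console.log(JSON.stringify(formToSchema(flightForm), null, 2));
```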

When an agent invokes searchFlights, the browser fills in the form fields with the provided values. Without toolautosubmit, it waits for the user to click submit. With it, the form submits automatically. This covers the 80% of web interactions that are already form-based: contact forms, search bars, checkout flows, booking systems.
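A declarative tool that submits ends in an ordinary form submission. For a method="GET" form like the flight search above, the browser serializes the fields as application/x-www-form-urlencoded pairs and uses them as the action URL's query string. The sketch below reproduces that with the standard URL and URLSearchParams APIs. The origin https://example.com and the field values are hypothetical.

```typescript
// Reproduce how a GET form submission encodes its fields into the action URL.
// The page origin and the values below are hypothetical examples.
function getSubmissionUrl(
  action: string,
  fields: Record<string, string | number>,
  pageOrigin: string,
): string {
  // Resolve the form's relative action against the page, as the browser does.
  const url = new URL(action, pageOrigin);
  // A GET submission replaces the action URL's query with the encoded form data.
  const query = new URLSearchParams();
  for (const [field, value] of Object.entries(fields)) {
    query.append(field, String(value));
  }
  url.search = query.toString();
  return url.toString();
}

const url = getSubmissionUrl(
  "/api/flights/search",
  { origin: "SFO", destination: "JFK", date: "2026-04-01", passengers: 1 },
  "https://example.com",
);
console.log(url);
// https://example.com/api/flights/search?origin=SFO&destination=JFK&date=2026-04-01&passengers=1
```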

The Imperative API

For dynamic workflows, multi-step processes, or anything requiring JavaScript execution, the Imperative API gives you full control. You register tools with navigator.modelContext.registerTool(), providing a name, a description, a JSON input schema, and an async execute handler that runs in the page's JavaScript context.

```typescript
// Guard against browsers without WebMCP support.
// This check is essential: the API only exists in Chrome 146+ with the flag enabled.
// Wrapping the guard in a function makes the early return legal.
function registerCartTool(): void {
  if (!('modelContext' in navigator)) {
    console.log('WebMCP not available in this browser');
    return; // graceful fallback: the page keeps working for human users
  }

  // Register a tool that adds items to a shopping cart.
  // The execute handler runs in the browser context with full access
  // to the page's state, DOM, and the user's authenticated session.
  navigator.modelContext.registerTool({
    name: "addToCart",

    // This description is what the agent reads to decide when to use the tool.
    // Be specific: "Add item" is too vague. The agent won't know what "item" means.
    description: "Add a product to the user's shopping cart by SKU and quantity",

    // JSON Schema defines the contract. The agent sends structured params,
    // not free text, which is what drives the accuracy gain.
    inputSchema: {
      type: "object",
      properties: {
        sku: {
          type: "string",
          description: "Product SKU identifier (e.g., 'SHOE-RED-42')"
        },
        quantity: {
          type: "integer",
          minimum: 1,
          maximum: 10,
          description: "Number of items to add (1-10)"
        }
      },
      required: ["sku", "quantity"]
    },

    // The execute handler is an async function.
    // It receives validated params and a client object for user interaction.
    execute: async (params, client) => {
      const { sku, quantity } = params;

      // Call your existing cart API. No new backend needed.
      const response = await fetch('/api/cart/add', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ sku, quantity })
      });

      // Report failures to the agent instead of claiming success.
      if (!response.ok) {
        return {
          type: "text",
          text: JSON.stringify({ success: false, status: response.status })
        };
      }

      const result = await response.json();

      // Return structured data the agent can reason about.
      return {
        type: "text",
        text: JSON.stringify({
          success: true,
          cartTotal: result.total,
          itemCount: result.itemCount
        })
      };
    }
  });
}

registerCartTool();
```

The handler has access to everything your frontend code already has: the DOM, fetch, localStorage, cookies and session. No separate backend. No API proxy. No headless browser infrastructure. That 500KB screenshot pipeline from the opening? Replaced with a function call.

For tools that modify data or cost money, the client parameter provides requestUserInteraction():

```typescript
// For sensitive actions, pause and ask the user before proceeding.
// The browser shows a native confirmation dialog, not a custom modal
// that an agent could dismiss programmatically.
// This is the execute property of a registerTool() definition;
// processPayment() is a placeholder for your app's own payment logic.
execute: async (params, client) => {
  // requestUserInteraction() pauses agent execution and shows
  // a browser-native prompt. The agent cannot bypass this.
  const confirmed = await client.requestUserInteraction({
    message: `Complete purchase of ${params.itemName} for $${params.price}?`
  });

  if (!confirmed) {
    return { type: "text", text: JSON.stringify({ cancelled: true }) };
  }

  // Only proceeds after explicit user consent
  const result = await processPayment(params);
  return { type: "text", text: JSON.stringify(result) };
}
```

You can also provide ambient context without registering a tool. This is useful for giving agents background information about the current page state:

```typescript
// provideContext() shares read-only information with agents.
// No tool invocation, no user prompt. Just structured context.
navigator.modelContext.provideContext({
  name: "currentUserProfile",
  description: "The logged-in user's profile and preferences",
  data: {
    name: "Jane Smith",
    tier: "premium",
    preferredLanguage: "en",
    recentOrders: 12
  }
});
```

WebMCP uses the browser itself as the trust boundary. The browser mediates every tool invocation, requires HTTPS, enforces same-origin isolation, and demands explicit user consent for sensitive actions. The agent never gets more access than the user already has.

HTTPS required. The API only works in secure contexts. No HTTP, no file://, no exceptions.

Same-origin policy. Tools inherit the page's origin boundary. A tool registered on flights.example.com cannot access data from bank.example.com.

User consent is mandatory. For sensitive actions, requestUserInteraction() shows a native browser prompt that the agent cannot dismiss or bypass. The user sees exactly what the agent wants to do and decides whether to allow it.

Session inheritance, not injection. Tools run within the user's existing authenticated session. No cookie injection, no credential passing, but also no access beyond what the user already has.

Per-invocation permissions. The Permission and Consent Manager ensures only tools with explicit user approval can execute. Same pattern as geolocation, camera, and microphone access.

This design mitigates what researchers call the "deadly triad" scenario, where an agent has simultaneous access to multiple sensitive tabs. WebMCP's domain-level isolation means a tool on one origin cannot reach another, even if the same agent interacts with both.
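Both the same-origin rule and this domain-level isolation hinge on what counts as an origin: the combination of scheme, host, and port. The short sketch below compares origins with the standard URL API to show why flights.example.com and bank.example.com are different origins even though they share a parent domain. It illustrates the concept only; the browser, not page code, enforces the actual isolation.

```typescript
// Two URLs share an origin only if scheme, host, and port all match.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("https://flights.example.com/search", "https://flights.example.com/book")); // true
console.log(sameOrigin("https://flights.example.com/", "https://bank.example.com/"));              // false
console.log(sameOrigin("https://flights.example.com/", "http://flights.example.com/"));            // false
console.log(sameOrigin("https://flights.example.com/", "https://flights.example.com:8443/"));      // false
```

The last two cases matter as much as the first mismatch: a change of scheme or port alone is enough to cross the boundary.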

WebMCP vs. Server-Side MCP

WebMCP does not replace server-side MCP. They are complementary. Server-side MCP handles headless backend automation where no browser is present. WebMCP handles browser interactions where the user is logged in and the agent operates within their session. Most production systems will use both.

| Dimension | Server-Side MCP | WebMCP |
|---|---|---|
| Runs where | Backend server | Browser (client-side) |
| Protocol | JSON-RPC 2.0 | JavaScript API |
| Transport | Streamable HTTP, stdio | Browser internal |
| Auth model | OAuth 2.1, API keys | User's browser session |
| User present | No (headless) | Yes (always) |
| Primitives | Tools, Resources, Prompts | Tools, Context |
| Infrastructure | MCP server deployment | Zero (browser-native) |
| Use case | Service-to-service, backend automation | Browser-based, user-facing |
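The Protocol row is easiest to see side by side. Server-side MCP invokes a tool with a JSON-RPC 2.0 request whose method is tools/call, sent over its transport. The sketch below builds that message for the searchFlights example; the id and argument values are arbitrary. The WebMCP side appears only as a comment, because the page never builds a wire message: it registers the tool, and the browser routes the agent's call to the handler.

```typescript
// Shape of a server-side MCP tool invocation: a JSON-RPC 2.0 request whose
// method is "tools/call" and whose params carry the tool name and arguments.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
}

function toolsCallRequest(id: number, toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

const request = toolsCallRequest(1, "searchFlights", { origin: "SFO", destination: "JFK" });
console.log(JSON.stringify(request));
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"searchFlights","arguments":{"origin":"SFO","destination":"JFK"}}}

// WebMCP counterpart: no JSON-RPC envelope is built by the page. The page only
// registers the tool; the browser delivers the agent's call to its execute handler.
```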

Consider a travel company. It maintains a server-side MCP server for direct API integrations with Claude, ChatGPT, and other platforms. Simultaneously, it implements WebMCP tools on its consumer website so browser-based agents can interact with the booking flow in the user's authenticated session.

The layering looks like this:

Figure: WebMCP and server-side MCP complement each other. On the backend (server-side MCP), the AI agent calls an MCP server that fronts the Flight Search, Booking, and Payment APIs. In the browser (WebMCP), it reaches the user session through navigator.modelContext, which exposes a Search Form tool, an Add to Cart tool, and a Checkout tool.

Server-side MCP for headless automation. WebMCP for user-present browser interactions. Same agent, same tool concepts, different execution environments.

What This Means for Agents

WebMCP shifts the agent-website relationship from adversarial to cooperative. Today, agents scrape, guess, and break. Tomorrow, websites publish tool contracts and agents call them reliably. The screenshot-and-pray approach from our opening is about to become optional.

For agent builders: Your browser automation stack gets simpler. No more maintaining screenshot pipelines, vision model integrations, or brittle CSS selectors. If the target website supports WebMCP, you get structured tool access with guaranteed parameter validation and typed responses. The tool management patterns you already use for server-side MCP translate directly. Your prompt engineering gets simpler too, because you no longer need to describe UI elements in system prompts.

For website owners: WebMCP is the new structured data. Just as Schema.org markup made content machine-readable for search engines, WebMCP makes functionality machine-callable for agents. Agent-ready websites get preferential treatment from agent platforms because they are cheaper (89% fewer tokens) and more reliable (98% accuracy) to interact with.

For platform teams: WebMCP tools should be first-class citizens alongside server-side MCP tools. An agent orchestrating a customer workflow might call a backend MCP server to check inventory, then invoke a WebMCP tool to complete the purchase in the user's session. Your monitoring and analytics need to track both. Scenario testing should cover both tool types.
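A minimal way to treat both kinds of tools as first-class citizens is a single catalog where each entry records its transport, so routing and monitoring happen in one place. Everything here is hypothetical: the CatalogEntry shape, the checkInventory and completePurchase tool names, and the stub invokers that stand in for a real MCP client and a WebMCP-capable browser agent.

```typescript
// Hypothetical unified catalog: every tool is tagged with its transport so one
// code path handles routing and per-transport metrics.
type Transport = "server-mcp" | "webmcp";

interface CatalogEntry {
  name: string;
  transport: Transport;
  invoke: (args: Record<string, unknown>) => Promise<unknown>;
}

class ToolCatalog {
  private entries = new Map<string, CatalogEntry>();
  private callCounts: Record<Transport, number> = { "server-mcp": 0, webmcp: 0 };

  add(entry: CatalogEntry): void {
    this.entries.set(entry.name, entry);
  }

  async call(toolName: string, args: Record<string, unknown>): Promise<unknown> {
    const entry = this.entries.get(toolName);
    if (!entry) throw new Error(`Unknown tool: ${toolName}`);
    this.callCounts[entry.transport] += 1; // one metric stream covering both tool types
    return entry.invoke(args);
  }

  stats(): Record<Transport, number> {
    return { ...this.callCounts };
  }
}

// Stubs standing in for a backend MCP client and an in-browser WebMCP call.
const catalog = new ToolCatalog();
catalog.add({ name: "checkInventory", transport: "server-mcp", invoke: async () => ({ inStock: true }) });
catalog.add({ name: "completePurchase", transport: "webmcp", invoke: async () => ({ orderId: "A-1" }) });

async function workflow(): Promise<void> {
  const stock = (await catalog.call("checkInventory", { sku: "SHOE-RED-42" })) as { inStock: boolean };
  if (stock.inStock) await catalog.call("completePurchase", { sku: "SHOE-RED-42", quantity: 1 });
  console.log(catalog.stats()); // { 'server-mcp': 1, webmcp: 1 }
}

workflow();
```

Keeping the transport as data rather than as separate code paths means dashboards and scenario tests can cover both tool types without special cases.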

For the agentic web: Dan Petrovic called WebMCP "the biggest shift in technical SEO since structured data." Agents will prefer sites that publish tool contracts over sites that require screen scraping. The cost differential alone (89% token savings) makes WebMCP-enabled sites the rational choice for every agent platform.

Getting Started Today

WebMCP is available now in Chrome 146 Canary behind a flag. The spec is a draft and breaking changes are expected, but the developer experience is ready for prototyping. Here is the five-step path from zero to a working tool.

Step 1: Enable the flag. Open chrome://flags in Chrome Canary (146+), search for "WebMCP for testing," and enable it.

Step 2: Start with the Declarative API. Pick one form on your site. Add toolname, tooldescription, and toolparamdescription attributes. Verify with the Model Context Tool Inspector Chrome extension.

Step 3: Graduate to the Imperative API. For dynamic interactions, register tools with navigator.modelContext.registerTool(). Always guard with if ('modelContext' in navigator) for graceful fallback.

Step 4: Test with real agents. Connect a browser-based agent (or use Chrome's built-in Gemini integration) and verify that tool discovery, invocation, and result handling work end-to-end.

Step 5: Layer with server-side MCP. Your existing MCP server handles headless workflows. WebMCP handles browser interactions. Same tool catalog, different transports.
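Before the end-to-end run in Step 4, the execute handler can be unit-tested on its own, because it is just an async function. The sketch below is a hypothetical refactor of the addToCart handler from the Imperative API section that takes fetch as a parameter so it runs in Node without a browser. makeAddToCartHandler, FetchLike, and the canned cart response are illustrative assumptions, and the result shape follows that example rather than any required format.

```typescript
// Hypothetical refactor: the addToCart handler receives its fetch function,
// so tests can substitute a stub for the real cart API.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<any>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<MinimalResponse>;

function makeAddToCartHandler(fetchImpl: FetchLike) {
  return async (params: { sku: string; quantity: number }) => {
    const response = await fetchImpl("/api/cart/add", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ sku: params.sku, quantity: params.quantity }),
    });
    if (!response.ok) {
      return { type: "text", text: JSON.stringify({ success: false, status: response.status }) };
    }
    const result = await response.json();
    return {
      type: "text",
      text: JSON.stringify({ success: true, cartTotal: result.total, itemCount: result.itemCount }),
    };
  };
}

// Stub fetch that records each request and returns a canned cart.
const calls: Array<{ url: string; body: string }> = [];
const stubFetch: FetchLike = async (url, init) => {
  calls.push({ url, body: init.body });
  return { ok: true, status: 200, json: async () => ({ total: 89.99, itemCount: 3 }) };
};

const handler = makeAddToCartHandler(stubFetch);
handler({ sku: "SHOE-RED-42", quantity: 2 }).then((res) => {
  console.log(res.text);      // {"success":true,"cartTotal":89.99,"itemCount":3}
  console.log(calls[0].body); // {"sku":"SHOE-RED-42","quantity":2}
});
```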

The spec lives at github.com/webmachinelearning/webmcp. The W3C Community Group is active and accepting feedback.


WebMCP turns the web from something agents scrape into something agents call. Structured tool contracts are better in every dimension: cost, accuracy, reliability, security. With Google and Microsoft jointly pushing the spec through W3C, adoption is moving fast.

Our agent stopped screenshotting web pages. Yours should too.



Frequently Asked Questions