Your agent just pulled up eight available booking slots. The user needs to pick one. So your agent returns a numbered list, the user types "option 3," the agent parses their intent, and if you're lucky, nothing goes wrong in that chain.
That's the current state of tool responses in AI chat: data that could be structured ends up as prose, which the user then has to re-encode as more prose. MCP Apps, launched January 26, 2026 as the first official MCP extension, fix this. Tools can now return interactive HTML that renders directly in the conversation window.
It's not experimental. It shipped with Claude, Goose, and VS Code on day one, with nine production launch partners, and ChatGPT rolled out support shortly after. Here's how to build your first one.
What the problem actually is
Text is a lossy channel for structured interactions. When a CX agent needs to collect a specific piece of information from a user -- a date, a choice from a defined set, a confirmation of proposed changes -- turning that into a natural language exchange forces the model to parse intent instead of receiving data.
The failure modes are familiar. The user says "morning" and the agent picks 9am when they meant 11am. The user confirms an order change but their phrasing is ambiguous and the agent asks for clarification. The user types a number but forgets the agent's list was zero-indexed. Each of these is a friction point that erodes trust in the agent and generates extra conversational turns.
MCP Apps give you a second return channel alongside the normal tool response. Instead of describing structured data as text, you attach a UI resource that the host renders as an interactive interface. The user interacts with it directly, and their input flows back into the model's context as clean, structured data. No parsing. No ambiguity.
The two primitives you need to understand
MCP Apps add two things to a standard MCP server: a _meta.ui.resourceUri field on tool responses, and a ui:// resource handler that serves your HTML. That's it. Your existing tool logic, schemas, and transport don't change. The new behavior is purely additive -- you opt in per tool, per response.
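Because the UI metadata sits next to the normal content array rather than replacing it, opting in can be as small as a helper that takes a finished text result and attaches a URI. This is a minimal sketch that assumes the result shape described here (content plus _meta.ui.resourceUri). withUiResource and the local types are illustrative names, not part of any SDK.

```typescript
// Minimal local types for a tool result (assumed shape, not SDK types)
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextContent[];
  _meta?: { ui?: { resourceUri: string }; [key: string]: unknown };
}

// Hypothetical helper: returns a copy of `result` with a UI resource attached.
// The text content is reused untouched, so clients that ignore _meta still
// receive a complete answer.
function withUiResource(result: ToolResult, resourceUri: string): ToolResult {
  if (!resourceUri.startsWith("ui://")) {
    throw new Error(`UI resource URIs must use the ui:// scheme: ${resourceUri}`);
  }
  return {
    ...result,
    _meta: { ...result._meta, ui: { resourceUri } },
  };
}

const plain: ToolResult = {
  content: [{ type: "text", text: "Found 3 available slots for cust-123." }],
};
const enhanced = withUiResource(plain, "ui://booking-slots/cust-123");
```

The copy-instead-of-mutate choice keeps the opt-in per response: the same text result can be returned with or without UI metadata.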
Tools with UI metadata. Your existing tool definitions gain an optional _meta.ui.resourceUri field on the response object. This is a URI pointing to a UI resource your server knows how to serve. The standard tool response still goes in content -- that's what the LLM reasons from. The UI resource is for the human.
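import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { db } from "./db"; // your data layer (hypothetical module)
export const server = new McpServer({ name: "booking-server", version: "1.0.0" });
server.tool(
  "get_booking_slots",
  "Returns available booking slots for a customer",
  { customerId: z.string() },
  async ({ customerId }) => {
    const slots = await db.getAvailableSlots(customerId);
    // The text result must stand on its own for clients without MCP Apps support
    const text =
      slots.length === 0
        ? `There are no available slots for ${customerId}.`
        : `Found ${slots.length} available slots for ${customerId}. Rendering slot picker for user selection.`;
    return {
      content: [{ type: "text", text }],
      _meta: {
        ui: {
          resourceUri: `ui://booking-slots/${customerId}`,
        },
      },
    };
  }
);
UI resources via the ui:// scheme. Your server registers a resource handler for the ui:// scheme that returns bundled HTML and JavaScript. When the host sees a _meta.ui.resourceUri, it fetches that resource from your server and renders it in a sandboxed iframe.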
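import { ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { buildBookingUI } from "./booking-ui"; // hypothetical module, shown below
server.resource(
  "booking-slots-ui",
  // list: undefined means concrete URIs for this template aren't enumerated
  new ResourceTemplate("ui://booking-slots/{customerId}", { list: undefined }),
  async (uri, { customerId }) => {
    // Template variables may be string | string[]; this template yields one string
    const slots = await db.getAvailableSlots(customerId as string);
    const html = buildBookingUI(slots);
    return {
      contents: [
        {
          uri: uri.href,
          mimeType: "text/html",
          text: html,
        },
      ],
    };
  }
);
The host handles fetching, sandboxing, and rendering. Your job is to serve valid HTML for the ui:// URI your tool pointed at.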
How the message flow works
When a tool returns a UI resource, the host fetches the HTML, renders it in a sandboxed iframe in the conversation, and establishes a bidirectional JSON-RPC channel between the UI and the host.
Two things here worth paying attention to. First, the LLM gets the text result immediately without waiting for the UI to load. Rendering is async and non-blocking. Second, when the user interacts with the UI, their selection doesn't come back as a chat message. It flows through updateModelContext, which injects structured content directly into the model's context window. The model receives a JSON object with the selected slot ID and timestamp, not the string "Tuesday 2pm" that it would then have to parse.
This is the real win. The agent's next step gets clean data to work with.
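Clean data still deserves a check before the agent acts on it. The following is a minimal sketch of validating the slot_selected payload that the client code in the next section pushes through updateModelContext. SlotSelection and parseSlotSelection are hypothetical names, and the field set simply mirrors that payload.

```typescript
// Assumed payload shape, mirroring the JSON the UI pushes via updateModelContext
interface SlotSelection {
  action: "slot_selected";
  slotId: string;
  startTime: string;
  durationMinutes: number;
}

// Hypothetical validator: returns a typed selection, or null if the text is
// not a well-formed slot_selected event.
function parseSlotSelection(text: string): SlotSelection | null {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return null; // not JSON at all, e.g. free text like "Tuesday 2pm"
  }
  if (typeof data !== "object" || data === null) return null;
  const { action, slotId, startTime, durationMinutes } = data as Record<string, unknown>;
  if (
    action !== "slot_selected" ||
    typeof slotId !== "string" ||
    typeof startTime !== "string" ||
    typeof durationMinutes !== "number"
  ) {
    return null;
  }
  // Reject timestamps that don't parse as dates
  if (Number.isNaN(Date.parse(startTime))) return null;
  return { action: "slot_selected", slotId, startTime, durationMinutes };
}
```

Returning null rather than throwing lets the agent fall back to asking the user instead of acting on a half-parsed selection.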
Building the client-side UI
Inside your bundled HTML, the @modelcontextprotocol/ext-apps package gives you the communication layer. You don't manage postMessage directly.
import { App } from "@modelcontextprotocol/ext-apps";
interface Slot {
  id: string;
  startTime: string;
  durationMinutes: number;
  label: string;
}
// Rendering helpers specific to your UI (implementations not shown)
declare function renderSlotPicker(slots: Slot[]): void;
declare function renderConfirmationPrompt(slot: Slot): void;
declare function renderToolResult(result: unknown): void;
declare function renderConfirmation(result: unknown): void;
declare function getInitialSlots(): Slot[];
const app = new App({ name: "booking-slot-picker", version: "1.0.0" });
async function initialize() {
  // Register handlers before connecting so no early host notification is missed
  app.ontoolresult = (result) => {
    // Tool results the host forwards to this UI arrive as standard MCP tool results
    renderToolResult(result);
  };
  // Establish the postMessage channel to the host
  await app.connect();
  renderSlotPicker(getInitialSlots());
}
function handleSlotSelection(slot: Slot) {
  // Show a confirmation step before committing
  renderConfirmationPrompt(slot);
}
async function confirmSelection(slot: Slot) {
  // Push the structured selection into the model's context
  await app.updateModelContext({
    content: [
      {
        type: "text",
        text: JSON.stringify({
          action: "slot_selected",
          slotId: slot.id,
          startTime: slot.startTime,
          durationMinutes: slot.durationMinutes,
        }),
      },
    ],
  });
}
async function reserveSlot(slotId: string) {
  // Call a server-side tool through the host (the user will see an approval prompt)
  const result = await app.callServerTool({
    name: "reserve_slot",
    arguments: { slotId },
  });
  renderConfirmation(result);
  return result;
}
initialize();
A few patterns worth noting. app.connect() establishes the postMessage channel to the host. app.updateModelContext() is how you feed structured data back to the LLM without a user message. app.callServerTool() lets you call server-side tools from within the UI, and the host will show the user an approval prompt for tools that have side effects.
Your UI is responsible for making this approval flow clear. Don't call app.callServerTool() silently. Show the user what you're about to do, let them confirm, and handle rejection gracefully.
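That flow (describe, confirm, call, handle rejection) can be sketched independently of any SDK. confirmThenCall, Decision, and ActionOutcome are hypothetical names; askUser stands in for whatever confirmation UI you render, and call would wrap app.callServerTool(...) in the client above.

```typescript
// The user's answer to your own confirmation UI
type Decision = "approved" | "declined";

interface ActionOutcome<T> {
  status: "done" | "declined" | "failed";
  value?: T;
  error?: string;
}

// Hypothetical wrapper: show the description, wait for the user's decision,
// and only then run the side-effecting call. If the call rejects (for example
// because the user denied the host's own approval prompt), the error is
// returned as a "failed" outcome instead of escaping as an exception.
async function confirmThenCall<T>(
  description: string,
  askUser: (description: string) => Promise<Decision>,
  call: () => Promise<T>,
): Promise<ActionOutcome<T>> {
  const decision = await askUser(description);
  if (decision === "declined") {
    return { status: "declined" };
  }
  try {
    return { status: "done", value: await call() };
  } catch (err) {
    return { status: "failed", error: err instanceof Error ? err.message : String(err) };
  }
}
```

Mapping every path to an explicit status gives the UI one place to render "declined" and "failed" states instead of leaving the user staring at a frozen button.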
The buildBookingUI function
The UI resource is plain HTML. You generate it server-side with your slot data already embedded. Here's a minimal implementation:
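interface Slot {
  id: string;
  startTime: string;
  durationMinutes: number;
  label: string;
}
// Escape text before interpolating it into HTML markup
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
export function buildBookingUI(slots: Slot[]): string {
  // Buttons refer to slots by index: JSON.stringify output inside a
  // double-quoted onclick attribute would break on its own double quotes
  const slotButtons = slots
    .map(
      (slot, index) => `
        <button class="slot-btn" onclick="selectSlot(this, ${index})">
          ${escapeHtml(slot.label)}
        </button>
      `
    )
    .join("");
  // Embed the slot data once; escaping "<" keeps it from closing the <script> early
  const slotsJson = JSON.stringify(slots).replace(/</g, "\\u003c");
  return `<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <!-- External SDK script: its origin must be pre-declared to the host, or the script inlined -->
  <script src="https://cdn.mcp-apps.io/ext-apps.js"></script>
  <style>
    body { font-family: system-ui; margin: 0; padding: 16px; }
    .slot-grid { display: grid; grid-template-columns: repeat(2, 1fr); gap: 8px; }
    .slot-btn {
      padding: 10px 14px; border: 1px solid #ddd; border-radius: 6px;
      background: white; cursor: pointer; text-align: left;
      transition: background 0.15s;
    }
    .slot-btn:hover { background: #f5f5f5; }
    .slot-btn.selected { background: #c2724a; color: white; border-color: #c2724a; }
  </style>
</head>
<body>
  <p style="margin-top:0; font-size:14px; color:#666;">Select a time slot:</p>
  <div class="slot-grid">${slotButtons}</div>
  <script>
    const SLOTS = ${slotsJson};
    const app = new MCPApps.App({ name: "booking-slots", version: "1.0.0" });
    app.connect();
    function selectSlot(button, index) {
      document.querySelectorAll('.slot-btn').forEach(b => b.classList.remove('selected'));
      button.classList.add('selected');
      app.updateModelContext({
        content: [{
          type: 'text',
          text: JSON.stringify({ action: 'slot_selected', ...SLOTS[index] })
        }]
      });
    }
  </script>
</body>
</html>`;
}
This is minimal but functional. In production you'd bundle your UI separately and serve it as a static asset rather than generating HTML strings server-side. The important constraint is that the UI must be self-contained -- no external resources that aren't pre-declared -- to pass the host's security review.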
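One cheap pre-ship check for that constraint is to scan the generated HTML for src or href URLs on origins you haven't declared. This is a rough, regex-based sketch rather than an HTML parser; findUndeclaredResources is a hypothetical name, and the allowlist stands in for whatever origins you actually pre-declare to the host.

```typescript
// Hypothetical lint: list http(s) URLs in quoted src/href attributes whose
// origin is not in the allowlist. Relative and non-http URLs are skipped.
function findUndeclaredResources(html: string, allowedOrigins: string[]): string[] {
  const allowed = new Set(allowedOrigins);
  const found: string[] = [];
  const attr = /\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(attr)) {
    const value = match[1];
    let url: URL;
    try {
      url = new URL(value); // relative URLs have no base here, so they throw
    } catch {
      continue;
    }
    if ((url.protocol === "http:" || url.protocol === "https:") && !allowed.has(url.origin)) {
      found.push(value);
    }
  }
  return found;
}
```

Run against buildBookingUI's output, this would flag the CDN script tag unless that origin is on your declared list.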
Security: what the sandbox actually means
MCP Apps are secure by design: your UI runs in a sandboxed iframe with restricted permissions, all UI-to-host communication goes through auditable JSON-RPC over postMessage, and your server pre-declares every template it might render so hosts can inspect them before connecting. User approval gates any tool call with side effects. Here's what each of those guarantees actually means in practice.
Sandboxed iframe. The iframe runs with strict sandbox attributes. Your UI code can't access the parent window's DOM, can't read cookies or local storage from the host origin, and can't make arbitrary fetch calls to third-party URLs. All communication goes through the JSON-RPC postMessage channel.
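To make the sandbox concrete, here is roughly what a host might emit. This is a hypothetical host-side sketch, not how any particular host is implemented. Granting allow-scripts without allow-same-origin gives the frame an opaque origin, which is what keeps it away from the host's DOM, cookies, and storage. Blocking third-party fetches is a separate job for a Content Security Policy, which this sketch doesn't cover.

```typescript
// Escape a string for use inside a double-quoted HTML attribute
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical host helper: wrap a UI resource's HTML in a sandboxed iframe.
// Only "allow-scripts" is granted; omitting "allow-same-origin" gives the
// frame an opaque origin, so it cannot reach the parent's DOM or storage.
function sandboxedFrame(uiHtml: string, title: string): string {
  return (
    `<iframe sandbox="allow-scripts" title="${escapeAttr(title)}" ` +
    `srcdoc="${escapeAttr(uiHtml)}"></iframe>`
  );
}
```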
Pre-declared templates. Your server registers templates -- URI patterns like ui://booking-slots/{customerId} -- that it might serve. Hosts (and enterprise policy engines, and eventually marketplace reviews) can inspect these before the user connects. There are no surprise UI resources.
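A policy check over pre-declared templates can be as simple as matching the requested URI against each declared pattern. The sketch below handles only simple {name} segments, a small subset of RFC 6570 URI templates, and matchTemplate and isDeclared are illustrative names rather than MCP SDK functions.

```typescript
// Escape regex metacharacters in a literal piece of the template
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match a URI against a template like "ui://booking-slots/{customerId}".
// Each {name} matches exactly one path segment (no "/"). Returns the captured
// variables, or null if the URI doesn't fit the template.
function matchTemplate(template: string, uri: string): Record<string, string> | null {
  const names: string[] = [];
  const pattern = template
    .split(/(\{\w+\})/)
    .map((part) => {
      const variable = /^\{(\w+)\}$/.exec(part);
      if (variable) {
        names.push(variable[1]);
        return "([^/]+)"; // one path segment
      }
      return escapeRegex(part);
    })
    .join("");
  const match = new RegExp(`^${pattern}$`).exec(uri);
  if (!match) return null;
  const vars: Record<string, string> = {};
  try {
    for (let i = 0; i < names.length; i++) {
      vars[names[i]] = decodeURIComponent(match[i + 1]);
    }
  } catch {
    return null; // malformed percent-encoding
  }
  return vars;
}

// A requested URI is acceptable only if some declared template matches it
function isDeclared(uri: string, declaredTemplates: string[]): boolean {
  return declaredTemplates.some((t) => matchTemplate(t, uri) !== null);
}
```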
Approval for side effects. When your UI calls app.callServerTool(), the host shows the user an explicit approval prompt for tools tagged as having side effects. Your UI should be designed with this in mind. Show the user what action they're about to approve before you call the tool, not after.
Auditable messages. Every JSON-RPC message between the UI and the host passes through the host, so the host can log it. If you're running an enterprise deployment, that log lets you audit which UI-initiated tool calls were made and which context updates were pushed, alongside the normal conversation transcript.
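If your host gives you access to those logs, separating UI-initiated activity from the rest of a session is a small aggregation. The AuditEntry shape below is an assumption for illustration; real hosts will record their own fields, and summarizeUiActivity is a hypothetical name.

```typescript
// Assumed audit entry shape; real hosts define their own log format
interface AuditEntry {
  sessionId: string;
  origin: "ui" | "model"; // who initiated the message
  kind: "tool_call" | "context_update";
  name?: string; // tool name, for tool_call entries
  timestamp: string;
}

interface UiAuditSummary {
  toolCalls: Record<string, number>; // UI-initiated calls per tool
  contextUpdates: number;
}

// Summarize what the UI did in one session, ignoring model-initiated activity
function summarizeUiActivity(entries: AuditEntry[], sessionId: string): UiAuditSummary {
  const summary: UiAuditSummary = { toolCalls: {}, contextUpdates: 0 };
  for (const e of entries) {
    if (e.sessionId !== sessionId || e.origin !== "ui") continue;
    if (e.kind === "context_update") {
      summary.contextUpdates += 1;
    } else {
      const name = e.name ?? "(unknown)";
      summary.toolCalls[name] = (summary.toolCalls[name] ?? 0) + 1;
    }
  }
  return summary;
}
```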
For CX deployments specifically, this matters because the trust surface has expanded. You're not just trusting the agent's text responses now -- you're trusting rendered code from an MCP server. Make sure the servers you're connecting to are ones you've vetted.
Real CX use cases that benefit most
The highest-value targets for MCP Apps are CX interactions where the user is currently encoding structured intent as natural language -- selecting from a defined set, confirming proposed changes, or choosing from a fixed list. These are the spots where text is a workaround for a UI, and MCP Apps let you ship the UI directly.
Here are the clearest wins.
Scheduling and booking. The classic case. Render a calendar or time slot grid. The user clicks a slot, and the agent receives a structured timestamp with the slot ID. No parsing of "Wednesday morning if possible."
Order status and returns. A returns agent can surface a compact order card -- line items, shipping status, estimated refund -- with a "Confirm Return" button. The customer doesn't need to navigate away from the chat to see the order details they're discussing.
Policy confirmation. When an agent proposes an account change (updating a shipping address, applying a discount, processing a credit), render a review card showing the before and after state. The user confirms with a button. The agent receives a structured confirmation event, not the user typing "yes that's right."
Dynamic knowledge cards. When your support agent pulls product documentation, it can render an interactive card with expandable sections, copyable configuration snippets, and direct links to the relevant settings in your product. The conversation becomes the UI surface.
Each of these replaces a multi-turn text exchange with a single purposeful interface. The agent gets clean structured data. The user gets something they can interact with rather than describe.
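For the policy-confirmation case, the review card needs the fields that actually change between the current and proposed state. A minimal sketch for flat records; diffForReview and FieldChange are hypothetical names, and comparing values with JSON.stringify is a simplification that works for plain data but is sensitive to key order in nested objects.

```typescript
interface FieldChange {
  field: string;
  before: unknown;
  after: unknown;
}

// List top-level fields whose values differ between current and proposed state.
// Fields present on only one side show up with undefined on the other.
function diffForReview(
  before: Record<string, unknown>,
  after: Record<string, unknown>,
): FieldChange[] {
  const fields = new Set([...Object.keys(before), ...Object.keys(after)]);
  const changes: FieldChange[] = [];
  for (const field of fields) {
    if (JSON.stringify(before[field]) !== JSON.stringify(after[field])) {
      changes.push({ field, before: before[field], after: after[field] });
    }
  }
  return changes;
}
```

The same list can go back to the agent inside the structured confirmation event, so the model sees exactly what the user approved.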
Testing your MCP App tools
MCP Apps introduce a new testing dimension: you need to verify that the UI response, not just the text response, behaves correctly. Here's what to add to your test suite.
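import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
import { server } from "../server";
import { db } from "../db"; // your data layer; mockSlots is a test helper you provide
// Narrow views of the SDK result types, enough for these assertions
type TextItem = { type: string; text?: string };
type UiMeta = { ui?: { resourceUri?: string } } | undefined;
describe("get_booking_slots", () => {
  let client: Client;
  beforeAll(async () => {
    // Connect a client to the server in-process, without a network transport
    const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
    client = new Client({ name: "test-client", version: "1.0.0" });
    await Promise.all([server.connect(serverTransport), client.connect(clientTransport)]);
  });
  afterAll(async () => {
    await client.close();
  });
  it("returns a text result and a UI resource URI", async () => {
    const result = await client.callTool({
      name: "get_booking_slots",
      arguments: { customerId: "cust-123" },
    });
    const content = result.content as TextItem[];
    expect(content[0].type).toBe("text");
    expect(content[0].text).toContain("slot");
    expect((result._meta as UiMeta)?.ui?.resourceUri).toMatch(/^ui:\/\/booking-slots\//);
  });
  it("serves valid HTML for the UI resource", async () => {
    const resource = await client.readResource({ uri: "ui://booking-slots/cust-123" });
    const html = (resource.contents[0] as { text?: string }).text;
    expect(html).toContain("MCPApps.App");
    expect(html).toContain("updateModelContext");
    expect(html).toContain("slot-btn");
  });
  it("degrades gracefully with no slots available", async () => {
    // Hypothetical test helper that makes the data layer return no slots
    db.mockSlots("cust-no-slots", []);
    const result = await client.callTool({
      name: "get_booking_slots",
      arguments: { customerId: "cust-no-slots" },
    });
    // Text result should still be useful on its own
    expect((result.content as TextItem[])[0].text).toContain("no available slots");
    // UI resource can still be returned (renders an empty state)
    expect((result._meta as UiMeta)?.ui?.resourceUri).toBeDefined();
  });
});
The key things to test: that the text result is useful on its own (for non-supporting clients), that the UI resource URI is present, that the UI HTML is valid and includes the necessary MCP Apps SDK calls, and that the empty/error states render something sensible.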
For teams using Chanl's scenario testing alongside MCP tool validation, you can define expected tool call patterns that include UI resource assertions. If a scenario expects get_booking_slots to return a UI resource and your server stops returning one, that's a regression worth catching before it reaches production.
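Whatever test harness you use, the underlying regression check is simple: for every recorded call to a tool that should carry a UI, confirm a ui:// resource URI came back. A generic sketch; RecordedCall and findMissingUi are hypothetical names, not part of Chanl's or the MCP SDK's API.

```typescript
// Assumed shape of a recorded tool call from a test run or scenario
interface RecordedCall {
  tool: string;
  result: { _meta?: { ui?: { resourceUri?: string } } };
}

// Return the tools in `expectUi` that were called without returning a ui:// resource
function findMissingUi(calls: RecordedCall[], expectUi: string[]): string[] {
  const missing = new Set<string>();
  for (const call of calls) {
    if (!expectUi.includes(call.tool)) continue;
    const uri = call.result._meta?.ui?.resourceUri;
    if (!uri || !uri.startsWith("ui://")) missing.add(call.tool);
  }
  return [...missing];
}
```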
Connecting UI interactions to production monitoring
Most conversation monitoring tools track what the agent said and what the user typed. MCP Apps create interaction events that currently fall through the gap: did the user open the UI, make a selection, abandon it, or never engage with it at all?
These are high-signal events. A user who opens a booking UI and closes it without selecting anything is telling you something is wrong with the slots you're offering. A user who clicks the same slot button three times before it registers may be hitting a timing bug in your updateModelContext call.
You can capture part of this in your MCP server. Log every ui:// resource fetch, since those requests reach your server directly. updateModelContext calls are different: they travel from the UI to the host, not to your server, so they won't appear in your server's logs unless you report them explicitly, for example by having the UI call a dedicated logging tool on your server. Pair these events with the session ID from the conversation, and you have UI interaction data attached to conversation traces.
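Once those events are in one place, a per-session summary surfaces both abandonment and repeated clicks. A minimal sketch under stated assumptions: the UiEvent shape is hypothetical, with ui_fetched coming from your resource-fetch logs and slot_selected from however you record selections. It can't tell abandoning apart from never engaging; that needs an extra "rendered" or "opened" event from the UI.

```typescript
// Assumed event shape; both event types are ones you log yourself
interface UiEvent {
  sessionId: string;
  type: "ui_fetched" | "slot_selected";
  timestamp: number; // ms since epoch
}

interface SessionSummary {
  outcome: "selected" | "abandoned";
  selections: number; // more than 1 may point at unregistered clicks
}

// Summarize each session that fetched a UI: did a selection follow, and how many?
function summarizeSessions(events: UiEvent[]): Map<string, SessionSummary> {
  const sessions = new Map<string, SessionSummary>();
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);
  for (const e of sorted) {
    const s = sessions.get(e.sessionId);
    if (e.type === "ui_fetched") {
      if (!s) sessions.set(e.sessionId, { outcome: "abandoned", selections: 0 });
    } else if (s) {
      // A selection only counts if the UI was fetched earlier in the session
      s.outcome = "selected";
      s.selections += 1;
    }
  }
  return sessions;
}
```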
For teams using Chanl analytics or monitoring dashboards to track agent performance, UI interaction events are worth adding to your event pipeline alongside conversation data. They surface patterns that pure text analysis misses -- especially abandonment, hesitation, and interaction failure rates.
This is a new signal that most teams don't have yet. The teams that instrument it first will see things about their agent's behavior that their competitors can't.
What happens in clients that don't support MCP Apps yet
Your existing MCP clients keep working exactly as they did. Non-supporting clients see only the content array of your tool response and skip _meta.ui.resourceUri entirely. Your tool should always return a useful text response regardless of whether the UI renders.
This is the right constraint to design with from the start. Write the text response as if the UI doesn't exist -- it should be complete and actionable on its own. Then add the UI resource as an enhancement that makes the experience better in supporting clients.
The practical result is that you don't need two separate tool definitions. One tool, one implementation. Clients that support MCP Apps get the interface. Clients that don't get the text. Your server doesn't need to know which type of client it's talking to.
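From the client's side, the decision reduces to one branch. This is a hypothetical sketch of that logic, not any real client's implementation: planRender and its types are illustrative, and supportsApps stands in for however a client knows its own capabilities.

```typescript
interface ToolResultLike {
  content: Array<{ type: string; text?: string }>;
  _meta?: { ui?: { resourceUri?: string } };
}

type RenderPlan =
  | { kind: "text"; text: string }
  | { kind: "ui"; resourceUri: string; fallbackText: string };

// Use the UI only when the client supports MCP Apps and the result carries a
// resource URI; otherwise show the text content, which must stand alone.
function planRender(result: ToolResultLike, supportsApps: boolean): RenderPlan {
  const text = result.content
    .filter((c) => c.type === "text" && typeof c.text === "string")
    .map((c) => c.text as string)
    .join("\n");
  const uri = result._meta?.ui?.resourceUri;
  if (supportsApps && uri) {
    return { kind: "ui", resourceUri: uri, fallbackText: text };
  }
  return { kind: "text", text };
}
```

Nothing in the server changes between the two branches, which is why one tool definition is enough.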
The building blocks for this are the same as for any MCP server. If you're earlier in the MCP journey, the article on building your first MCP server covers the foundation. The article on advanced tool integration patterns covers production concerns like caching, error handling, and schema validation that you'll want solid before adding UI resources on top.
Where to start
Pick one tool in your current MCP server where users are typing responses that the agent has to parse. Scheduling, confirmation, option selection -- any of these works. Build a minimal UI resource for that tool, test that it degrades gracefully for non-supporting clients, and ship it to supporting clients.
The underlying framework is the same whether you're using MCP Apps or plain tool calls: you build the tool and its UI resource, connect it to your agent's runtime, and monitor what users actually do with it in production. Build, connect, and monitor -- the three steps don't change, but MCP Apps make the "connect" step richer for any interaction that benefits from a purpose-built interface.
The MCP tools feature for Chanl-connected agents works on top of standard MCP server implementations, so MCP App-capable servers plug in alongside your existing tool setup. You get production monitoring for tool call patterns and can layer UI interaction events into the same pipeline.
MCP Apps don't change what your agent can do. They change how well it communicates with the humans it's helping. The agent logic stays the same. The user interaction gets a lot better.
- MCP Apps announcement -- Model Context Protocol Blog, January 26 2026
- MCP Apps in VS Code Insiders -- Visual Studio Code Blog, January 2026
- MCP 2026 Roadmap -- Model Context Protocol
- MCP Apps now in Copilot Chat -- Microsoft 365 Developer Blog
- The MCP 2026 Roadmap: Everything Changing for Developers -- MCP Playground
Co-founder
Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.



