Managed Agents in 2026: Three Runtimes, Three Trade-Offs

Google, Anthropic, and OpenAI all shipped 'managed agents' in May 2026, and they mean completely different things. Here's what each runtime trades away for CX teams.

Dean Grover, Co-founder
May 26, 2026

Three announcements landed in the same stretch of May 2026. Google shipped Managed Agents in the Gemini API at Google I/O. Within days, Anthropic announced what it also called "managed agents," a different product, a different architecture, different trade-offs. OpenAI's Agents SDK had meanwhile pushed a 0.17.x release with its own take on what managed orchestration means.

Same term. Three different things.

If you're building a customer support agent, a sales assistant, or any production CX workflow, you need to understand this split before you commit to a runtime. The choice isn't just about developer experience. It determines what you can observe, what you can control, and what you do when something goes wrong with hundreds of conversations in flight.

Let's walk through all three, compare what you actually gain and lose, and lay out a framework for picking the right one.

What "Managed" Actually Means in Agent Infrastructure

"Managed" means someone else handles the runtime infrastructure: the execution environment, scaling, state management, and potentially the model itself. What varies between Google, Anthropic, and OpenAI is who manages what, and that variation has real downstream consequences for your CX stack.

Think of it on a spectrum from fully hosted to fully self-run.

  • Google: manages the model, the execution sandbox, the built-in tools, and the infrastructure. You send a message; you get a result.
  • Anthropic: manages the connectivity layer (secure tunnels, protocol) while you run execution in your own environment. You own the data path.
  • OpenAI SDK: manages the orchestration loop as a library in your process. You own everything from the execution environment down. The "managed" part is just that you don't write the agent loop from scratch.

Each position on that spectrum has consequences. The further you sit from "fully hosted," the more operational responsibility you carry, and the more visibility you get into what your agent is actually doing.

[Diagram: Your Application → Which Runtime?]
  • Google Managed Agents: Google Cloud, Gemini 3.5 Flash, ephemeral sandbox. Outcomes only, limited observability.
  • Anthropic MCP Tunnels: your infra, any Claude model, your sandbox. Full traces, full observability.
  • OpenAI Agents SDK: your process, 100 plus LLMs, you own the loop. Full traces, full observability.
The three managed agent runtime models in 2026: from fully hosted to fully in-process

Google's Managed Agents: One Call, Cloud Runtime

Google Managed Agents makes a simple promise: a working agent in a single API call. You pass your system instructions and the user's message to the Interactions API; Gemini 3.5 Flash runs the agent in an ephemeral Linux sandbox with pre-loaded tools; you get back the completed result.

google-managed-agent.ts·typescript
import { GoogleGenAI } from "@google/genai";
 
const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
 
const response = await genai.agentic.interact({
  model: "gemini-3.5-flash",
  system: `You are a customer support agent for Acme Corp.
You can look up order status, process refunds, and answer product questions.`,
  contents: [{
    role: "user",
    parts: [{ text: "Where is my order #12345 and when will it arrive?" }]
  }]
});
 
console.log(response.finalOutput);
// "Your order #12345 shipped May 24 and is expected May 27..."
console.log(response.toolCalls);
// [{ name: "code_execution", input: "...", output: "..." }]

That's genuinely fast to ship. For an internal tool or a low-stakes prototype, this gets you from idea to running agent in an afternoon.

The trade-offs show up when you move to production.

Your data runs on Google's infrastructure. Every message and tool call executes inside Google's cloud. For a healthcare scheduling agent or a financial services chatbot, this may be a compliance blocker before you even talk to legal.

You see outcomes, not execution. Google returns the final output and a log of tool calls. The reasoning steps, the intermediate state, and the decision path that led from user input to tool selection aren't exposed at the granularity a quality analysis requires. When your agent gives the wrong answer to 3% of queries, "task completed" doesn't help you find why.

You're on Gemini 3.5 Flash for now. For many CX use cases that's fine. It's fast and capable. But if you need Claude for specific reasoning quality, or a different model for multilingual support, you're waiting for Google to expose model choice.

Custom tools are coming. At launch, the pre-loaded tools are code execution, web search, and URL fetch. A customer support agent needs tools like get_order_status, process_refund, and update_shipping_address. The tool extension API is on the roadmap, not in the product yet.

None of these are fatal for a proof of concept. All of them matter if you're targeting production CX at volume.

Anthropic's Managed Agents: Your Infrastructure, Managed Connectivity

With Anthropic's managed agents, the agent runs in your environment and Anthropic handles the connectivity. In practice that means two pieces: MCP tunnels, which expose your local servers to Claude over a secure, persistent connection, and self-hosted sandboxes, which run tool execution in your own infrastructure. Tool execution and the systems behind it stay inside your perimeter rather than moving to a third-party compute layer.

MCP tunnels let you expose a locally running MCP server through a secure, persistent tunnel without a public IP or complex ingress configuration. If you've built an MCP server with your CRM tools, order management APIs, and knowledge base connectors, an MCP tunnel makes that server reachable from Claude without moving any of it to a vendor's cloud.

Self-hosted sandboxes give you isolated execution environments for tool calls that you run in your own infrastructure. You define the sandbox spec; Anthropic's runtime manages execution isolation.

The result: your data never leaves. Every tool call, every transcript, every piece of customer context stays in your environment. You can instrument the entire execution path with OpenTelemetry or any tracing pipeline you already run, score against your CX rubric, and plug into your existing compliance stack.

anthropic-mcp-tunnel.ts·typescript
import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
// Your MCP server is running locally or in your private network.
// The MCP tunnel makes it accessible to Claude without a public IP.
const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  // MCP tunnel config points Claude at your local server
  mcp_servers: [{
    type: "url",
    url: "https://your-tunnel-id.anthropic-tunnel.com",
    name: "support-tools"
  }],
  messages: [{
    role: "user",
    content: "Process a refund for order #12345"
  }]
});

For regulated industries, this is often the only acceptable path. Healthcare, finance, and legal use cases typically require data to stay within a defined perimeter. Anthropic's self-hosted model meets that bar while still giving you Claude's reasoning quality.

The trade-off is operational surface area. You're running your own MCP server, managing your own execution environment, and handling your own scaling. None of this is hard, but it has more moving parts than a single API call.

OpenAI's Agents SDK: In-Process, Provider-Agnostic

The OpenAI Agents SDK runs as a library in your own application process, handling the orchestration loop (tool dispatch, context threading, conversation state) while you own the execution environment and the observability. You don't write the agent's reasoning cycle manually, but you control everything below it.
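
To make "the orchestration loop" concrete, here is a minimal sketch of the three jobs named above: tool dispatch, context threading, and conversation state. It is not the Agents SDK's implementation. Every name in it (`runAgentLoop`, `ModelTurn`, `scriptedModel`, `demoTools`) is hypothetical, and the model is a scripted stand-in so the example runs without credentials.

```typescript
// Minimal sketch of an agent orchestration loop.
// All names are hypothetical; this is not the OpenAI Agents SDK API.

type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; name: string; content: string };

type ModelTurn =
  | { kind: "tool_call"; name: string; args: Record<string, string> }
  | { kind: "final"; text: string };

type Model = (history: Message[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, string>) => Promise<string>;

async function runAgentLoop(
  model: Model,
  tools: Record<string, Tool>,
  userInput: string,
  maxTurns = 5
): Promise<{ output: string; history: Message[] }> {
  // Conversation state: every turn is appended so the model sees full context.
  const history: Message[] = [{ role: "user", content: userInput }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const next = await model(history);

    if (next.kind === "final") {
      history.push({ role: "assistant", content: next.text });
      return { output: next.text, history };
    }

    // Tool dispatch: look up the requested tool and feed its result back.
    const tool = tools[next.name];
    const result = tool
      ? await tool(next.args)
      : `error: unknown tool ${next.name}`;
    history.push({ role: "tool", name: next.name, content: result });
  }

  throw new Error(`agent did not finish within ${maxTurns} turns`);
}

// Scripted stand-in for a real model: request one tool call, then answer.
const scriptedModel: Model = async (history) => {
  const last = history[history.length - 1];
  if (last !== undefined && last.role === "tool") {
    return { kind: "final", text: `Order status: ${last.content}` };
  }
  return { kind: "tool_call", name: "getOrderStatus", args: { orderId: "12345" } };
};

const demoTools: Record<string, Tool> = {
  getOrderStatus: async ({ orderId }) => `order ${orderId} shipped`,
};
```

Calling `runAgentLoop(scriptedModel, demoTools, "Where is my order #12345?")` resolves with the final text plus the full message history. That history is the raw material for any trace you decide to keep, which is why owning the loop and owning observability tend to go together.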

What makes it stand out from the other two: it's provider-agnostic. The same SDK connects to Anthropic, Google, Mistral, Cohere, and 100+ other model providers. You can build your agent once and swap the underlying model without rewriting your tool definitions or your orchestration logic.

openai-agents-sdk-cx.ts·typescript
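import OpenAI from "openai";
// The TypeScript Agents SDK is published on npm as "@openai/agents".
import { Agent, Runner } from "@openai/agents";
 
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
 
// getOrderStatus, processRefund, updateShippingAddress, escalateToHuman, and
// logger are assumed to be defined elsewhere in your application.
const supportAgent = new Agent({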
  name: "CX Support Agent",
  instructions: "You are a helpful customer support agent for Acme Corp...",
  tools: [
    getOrderStatus,
    processRefund,
    updateShippingAddress,
    escalateToHuman
  ],
  // Swap to "claude-opus-4-7" or "gemini-3.5-flash" without other changes
  model: "gpt-5"
});
 
const runner = new Runner({ client });
 
// Add a callback to log every tool call
runner.on("tool_call", (event) => {
  logger.info("tool_called", { tool: event.tool, input: event.input });
});
 
const result = await runner.run(supportAgent, "Where is my order #12345?");

Full observability comes for free here. Everything happens in your process, so you add callbacks or middleware wherever you need logs. Tool calls, model requests, completion events: all accessible.
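
One way to get that visibility without depending on any particular SDK hook is to wrap each tool function before handing it to the agent. The sketch below assumes tools are plain async functions. `traced`, `TraceEvent`, and the in-memory `traceLog` are illustrative names; in production the sink would more likely forward to your logger or tracing pipeline.

```typescript
// Sketch: wrap tool functions so every call lands in a trace you own.
// `TraceEvent`, `traced`, and `traceLog` are illustrative names.

interface TraceEvent {
  tool: string;
  input: unknown;
  output?: unknown;
  error?: string;
  durationMs: number;
}

type ToolFn<I, O> = (input: I) => Promise<O>;

function traced<I, O>(
  name: string,
  fn: ToolFn<I, O>,
  sink: (event: TraceEvent) => void
): ToolFn<I, O> {
  return async (input: I): Promise<O> => {
    const startedAt = Date.now();
    try {
      const output = await fn(input);
      sink({ tool: name, input, output, durationMs: Date.now() - startedAt });
      return output;
    } catch (err) {
      // Record the failure, then rethrow so the agent loop still sees it.
      const message = err instanceof Error ? err.message : String(err);
      sink({ tool: name, input, error: message, durationMs: Date.now() - startedAt });
      throw err;
    }
  };
}

// Example: an order lookup whose calls are appended to an in-memory log.
const traceLog: TraceEvent[] = [];
const lookupOrder = traced(
  "getOrderStatus",
  async (input: { orderId: string }) => ({ orderId: input.orderId, status: "shipped" }),
  (event) => traceLog.push(event)
);
```

Because the wrapper keeps the original function signature, the traced version can be registered wherever the untraced one was, and failures are recorded without being swallowed.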

The trade-off: you manage the infrastructure. Hosting, scaling, failover. Those are yours to build and operate. For teams already running application servers, this is a non-issue. For teams that want zero infrastructure overhead, it's the wrong choice.

The Trade-Off Matrix

Here's how the three runtimes compare on the dimensions that matter most for production CX.

Dimension | Google Managed | Anthropic MCP Tunnels | OpenAI Agents SDK
--- | --- | --- | ---
Data location | Google cloud | Your infrastructure | Your process
Model choice | Gemini 3.5 Flash | Any Claude model | 100+ providers
Custom tools | Coming soon | Full (via MCP) | Full (function calling)
Observability | Outcomes only | Full execution traces | Full execution traces
Setup complexity | One API call | MCP server + tunnel | SDK dependency
Scaling | Google handles | You handle | You handle
Regulated industries | Check compliance | Typically yes | Typically yes
Cost model | Per-interaction | Per-token | Per-token

The observability row catches most teams off guard. The runtime that abstracts the most (Google) gives you the least insight. The runtimes that require more setup give you full access to the execution loop.

What "Managed" Means for CX Agent Monitoring

Managed runtimes don't ship with quality monitoring. Whatever runtime you use, you build that layer yourself, and the runtime you pick determines how much execution data you have to work with.

A CX agent handling a few thousand calls a day needs per-call transcripts with tool call sequences, quality scores against your CX rubric, alerts when quality drops, and the ability to replay failed conversations to find root cause. Those aren't features of any of the three runtimes. They're things you build on top of whichever runtime you pick.
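
As an illustration of the "alerts when quality drops" piece, here is a rolling-window check over per-call scores. This is a sketch under assumptions: the window size, passing score, and failure-rate limit are made-up parameters you would tune, and `QualityWindow` is a hypothetical class, not part of any runtime or SDK.

```typescript
// Sketch of a rolling-window quality alert.
// Thresholds and names are illustrative assumptions.

class QualityWindow {
  private readonly passed: boolean[] = [];
  private readonly windowSize: number;
  private readonly passingScore: number;
  private readonly maxFailureRate: number;

  constructor(windowSize: number, passingScore: number, maxFailureRate: number) {
    this.windowSize = windowSize;
    this.passingScore = passingScore;
    this.maxFailureRate = maxFailureRate;
  }

  // Record one scored call; returns true when the window breaches the limit.
  record(score: number): boolean {
    this.passed.push(score >= this.passingScore);
    if (this.passed.length > this.windowSize) this.passed.shift();
    return this.shouldAlert();
  }

  // Share of calls in the current window that scored below the passing score.
  failureRate(): number {
    if (this.passed.length === 0) return 0;
    const failures = this.passed.filter((ok) => !ok).length;
    return failures / this.passed.length;
  }

  shouldAlert(): boolean {
    // Wait for a full window so a single early failure does not page anyone.
    return (
      this.passed.length === this.windowSize &&
      this.failureRate() > this.maxFailureRate
    );
  }
}
```

With `new QualityWindow(10, 0.7, 0.1)`, one failing call in the last ten stays under the limit and a second failing call trips the alert. The same structure works at larger window sizes, where a limit near 0.03 would correspond to the 3% failure rate discussed earlier.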

With Google Managed Agents, the gap is largest. You get task completion status, the final output, and a log of tool calls. Internal execution traces aren't exposed at launch. If your agent starts handing out wrong refund amounts on 3-5% of queries, you need an external layer to catch that. The pattern that works: wrap the API call, capture inputs and outputs, and score quality from the transcript.

monitor-google-managed-agent.ts·typescript
import { GoogleGenAI } from "@google/genai";
import { Chanl } from "@chanl/sdk";
 
const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const chanl = new Chanl({ apiKey: process.env.CHANL_API_KEY });
 
export async function runCXAgent(callId: string, userMessage: string) {
  const startMs = Date.now();
 
  const response = await genai.agentic.interact({
    model: "gemini-3.5-flash",
    system: "You are a customer support agent for Acme Corp...",
    contents: [{ role: "user", parts: [{ text: userMessage }] }]
  });
 
  const durationMs = Date.now() - startMs;
 
  // The Google call already happened. This sketch assumes you have stored the
  // input and output as a Call record under callId in your own system (that
  // step is not shown here). chanl.scorecards.evaluate then runs your
  // scorecard over the stored call.
  const evaluation = await chanl.scorecards.evaluate(callId, {
    scorecardId: "customer-support-v2"
  });
 
  console.log({ durationMs, score: evaluation.overall });
  return response.finalOutput;
}

This pattern works with any runtime. The point is: "managed" doesn't mean "monitored." You build the quality loop separately regardless of which runtime handles execution.
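
To show what "any runtime" could look like in code, the sketch below puts the capture step behind a single function type so the same layer can sit in front of a hosted API or an in-process agent. `RunAgent`, `CallRecord`, `CallStore`, and `replayCall` are hypothetical names, the store is an in-memory map, and `echoRuntime` is a stand-in so the example runs without credentials.

```typescript
// Sketch: one capture layer for any runtime.
// `RunAgent`, `CallRecord`, and `CallStore` are hypothetical names.

type RunAgent = (userMessage: string) => Promise<string>;

interface CallRecord {
  callId: string;
  input: string;
  output: string | null;
  error: string | null;
  durationMs: number;
}

class CallStore {
  private readonly records = new Map<string, CallRecord>();

  save(record: CallRecord): void {
    this.records.set(record.callId, record);
  }

  get(callId: string): CallRecord | undefined {
    return this.records.get(callId);
  }
}

function withCallCapture(run: RunAgent, store: CallStore) {
  return async (callId: string, userMessage: string): Promise<string> => {
    const startedAt = Date.now();
    try {
      const output = await run(userMessage);
      store.save({ callId, input: userMessage, output, error: null, durationMs: Date.now() - startedAt });
      return output;
    } catch (err) {
      // Failed calls are stored too; they are the ones you will want to replay.
      const error = err instanceof Error ? err.message : String(err);
      store.save({ callId, input: userMessage, output: null, error, durationMs: Date.now() - startedAt });
      throw err;
    }
  };
}

// Replay re-sends the stored input through a runtime for root-cause work.
async function replayCall(callId: string, store: CallStore, run: RunAgent): Promise<string> {
  const record = store.get(callId);
  if (!record) throw new Error(`no record for call ${callId}`);
  return run(record.input);
}

// Stand-in runtime so the sketch runs without credentials.
const echoRuntime: RunAgent = async (message) => `echo: ${message}`;
const demoStore = new CallStore();
const capturedRun = withCallCapture(echoRuntime, demoStore);
```

Swapping runtimes then means writing a new `RunAgent` adapter while the capture, storage, and replay code stays the same. Note that replaying against a hosted runtime only reproduces the input, not the original execution path.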

For teams using Anthropic's self-hosted approach or the OpenAI SDK, conversation analytics plugs directly into the execution traces you already have, no wrapping layer needed. You get per-call monitoring and alerts at the level the transcript naturally supports.

Choosing the Right Runtime for Your CX Stack

Use Google Managed Agents if you're building a prototype and want results fast, you don't need custom tools yet, your data doesn't have regulatory restrictions, and you're fine with Gemini 3.5 Flash quality for your use case.

Use Anthropic's MCP tunnels and self-hosted sandboxes if your data must stay in your infrastructure, you need full execution traces for quality analysis and compliance, you're already building with MCP and have an MCP server, or you need Claude's specific reasoning quality for complex CX scenarios.

Use the OpenAI Agents SDK in-process if you want to evaluate multiple models without switching SDKs, you need provider portability, you're already running application servers, or you need the most control over the execution loop and tool dispatch.
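
The guidance above can be read as a filter over hard requirements. The function below is one possible encoding, offered as a sketch: the requirement fields and the `Runtime` labels are assumptions made for illustration, and the rules only reflect the launch-time limits described in this article.

```typescript
// Sketch: the selection guidance expressed as a filter.
// Field names and runtime labels are illustrative assumptions.

type Runtime = "google-managed" | "anthropic-self-hosted" | "openai-agents-sdk";

interface Requirements {
  dataMustStayInHouse: boolean;
  needsCustomTools: boolean;
  needsFullTraces: boolean;
  needsModelPortability: boolean;
  wantsZeroInfrastructure: boolean;
}

function candidateRuntimes(req: Requirements): Runtime[] {
  const candidates: Runtime[] = [];

  // Hosted option: ruled out by data residency, custom tools, full traces,
  // or a need for models other than the one it ships with.
  const hostedOk =
    !req.dataMustStayInHouse &&
    !req.needsCustomTools &&
    !req.needsFullTraces &&
    !req.needsModelPortability;
  if (hostedOk) candidates.push("google-managed");

  // Both self-run options require you to operate infrastructure.
  if (!req.wantsZeroInfrastructure) {
    // The self-hosted Claude path does not cover cross-provider portability.
    if (!req.needsModelPortability) candidates.push("anthropic-self-hosted");
    candidates.push("openai-agents-sdk");
  }

  return candidates;
}
```

An empty result is informative too: wanting zero infrastructure while requiring data to stay in house leaves no candidate, which means one of the two requirements has to give.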

For most CX teams, the practical path is this: use Google Managed Agents for a working demo in hours, then migrate to Anthropic or the OpenAI SDK when you're ready to instrument properly and move to production. The prototype teaches you what your agent needs. The migration is the point where you build for observability.

Before you commit to any runtime, ask yourself one question: "Will I be able to know, a month from now, why my agent made this specific decision on this specific call?" If the answer is no, you're building blind.

The Bigger Picture

The "managed agents" moment of May 2026 matters because all three major labs shipped runtime abstractions in the same stretch of weeks. That signals that agent infrastructure is moving from "interesting experiment" to "production category."

But the naming convergence masks a real architectural split. Google is betting on fully hosted convenience. Anthropic is betting on data sovereignty with managed connectivity. OpenAI is betting on in-process flexibility with provider agnosticism. These aren't versions of the same product. They're different answers to who should own agent execution.

For CX teams, the right answer almost always comes down to observability and data control. You can read more about what production agent observability looks like in our guide to monitoring AI agents, and if you're evaluating which orchestration pattern fits your stack, multi-agent orchestration patterns for 2026 covers the next layer of complexity.

Production agents need to be monitored, scored, and improved over time. The runtime that gives you the clearest path to doing that is the one worth the setup cost.
