What Are Managed Agents in 2026?

Managed agents are agent runtimes where some or all of the infrastructure is handled by a provider. In 2026, Google, Anthropic, and OpenAI each launched a managed agent product, but the term means something different at each: Google hosts everything in its cloud, Anthropic manages the connectivity while you keep your infra, and OpenAI manages the orchestration loop as a library running in your own process.

What Is Google's Managed Agents API and How Does It Work?

Google's Managed Agents API, announced around Google I/O 2026, lets you deploy a Gemini 3.5 Flash agent with a single Interactions API call. The agent runs in Google's cloud with pre-loaded tools (code execution, web search, URL fetch) in an ephemeral Linux sandbox. You pass instructions and the user's message; you get back the completed task result. Execution runs entirely on Google's infrastructure.

How Does Anthropic's Managed Agents Approach Differ From Google's?

Anthropic's managed agents run on your own infrastructure. Anthropic provides MCP tunnels, which are secure persistent tunnels that expose local MCP servers without a public IP, plus self-hosted sandboxes for tool execution. Your data stays in your environment, you get full observability, and you can use any Claude model. The trade-off is more setup compared to a single API call.

Can I Monitor a Managed Agent With Third-Party Tools?

Yes, but the approach depends on the runtime. With Anthropic's self-hosted approach and the OpenAI Agents SDK, you have full access to execution traces and can push data to any monitoring pipeline. With Google Managed Agents, execution runs in Google's cloud, so you need to wrap the API call and score quality externally from inputs and outputs.

Which Managed Agent Runtime Is Best for Regulated Industries?

Anthropic's approach (self-hosted sandboxes and MCP tunnels) or the OpenAI Agents SDK running in your own environment are better choices for regulated industries like healthcare and finance. Both keep data in your infrastructure and support full audit trails. Google Managed Agents sends execution to Google's cloud, which may not meet compliance requirements for sensitive customer data.

What Are the Cost Differences Between Hosted and Self-Hosted Managed Agents?

Google Managed Agents charges per Interactions API call with inference included. Self-hosted approaches charge per token used through the model API. For high-volume CX workloads, self-hosted is often cheaper because you can optimize model selection, use prompt caching, and avoid per-call infrastructure overhead. For prototypes or low-volume use, hosted is simpler and the pricing is more predictable.
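To find where the crossover sits for your own workload, a back-of-envelope comparison is usually enough. The sketch below is a rough model, not a pricing calculator: every price, the cache hit share, and the cached-token multiplier are placeholders you supply, and names like `Workload` and `TokenPricing` are made up for illustration.

```typescript
// All prices are placeholders you supply; none are published rates.
interface Workload {
  callsPerMonth: number;
  inputTokensPerCall: number;
  outputTokensPerCall: number;
}

interface TokenPricing {
  inputPerMillion: number;  // dollars per 1M uncached input tokens
  outputPerMillion: number; // dollars per 1M output tokens
  cachedFraction: number;   // share of input tokens served from prompt cache (0..1)
  cachedMultiplier: number; // cached tokens cost this fraction of the input price
}

// Hosted: one flat price per call, inference included.
function hostedMonthlyCost(w: Workload, pricePerCall: number): number {
  return w.callsPerMonth * pricePerCall;
}

// Self-hosted: pay per token through the model API, with a discount on cached input.
function perTokenMonthlyCost(w: Workload, p: TokenPricing): number {
  const cachedTokens = w.inputTokensPerCall * p.cachedFraction;
  const uncachedTokens = w.inputTokensPerCall - cachedTokens;
  const billableInput = uncachedTokens + cachedTokens * p.cachedMultiplier;
  const inputCost = (billableInput * p.inputPerMillion) / 1_000_000;
  const outputCost = (w.outputTokensPerCall * p.outputPerMillion) / 1_000_000;
  return w.callsPerMonth * (inputCost + outputCost);
}

// Per-call price at which hosted and per-token billing cost the same.
function breakEvenPricePerCall(w: Workload, p: TokenPricing): number {
  return perTokenMonthlyCost(w, p) / w.callsPerMonth;
}
```

If the hosted per-call price is above the break-even value for your typical conversation, per-token billing wins on inference cost alone. The infrastructure you operate yourself is not included in this model.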

How Do Managed Agents Handle Multi-Turn CX Conversations?

Google Managed Agents supports multi-turn within a session via the Interactions API but doesn't provide built-in persistent memory across sessions. Self-hosted approaches give you full control over both session state and long-term memory. For CX agents that need to remember a customer's history across multiple contacts, a dedicated memory layer is essential regardless of which runtime you pick.
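What a "dedicated memory layer" means can be small to start with. A minimal sketch, assuming an in-memory `Map` as a stand-in for a real datastore; `CustomerMemory` and `instructionsWithMemory` are illustrative names, not part of any of the three runtimes:

```typescript
interface MemoryEntry {
  fact: string;
  recordedAt: number; // epoch milliseconds
}

// Long-term memory keyed by customer, independent of any one session.
class CustomerMemory {
  private readonly entries = new Map<string, MemoryEntry[]>();

  remember(customerId: string, fact: string, recordedAt: number): void {
    const list = this.entries.get(customerId) ?? [];
    list.push({ fact, recordedAt });
    this.entries.set(customerId, list);
  }

  // Most recent facts first, capped so the prompt stays small.
  recall(customerId: string, limit = 5): string[] {
    const list = this.entries.get(customerId) ?? [];
    return [...list]
      .sort((a, b) => b.recordedAt - a.recordedAt)
      .slice(0, limit)
      .map((entry) => entry.fact);
  }
}

// Builds the instructions for a NEW session from what earlier sessions recorded.
function instructionsWithMemory(
  base: string,
  memory: CustomerMemory,
  customerId: string,
): string {
  const facts = memory.recall(customerId);
  if (facts.length === 0) return base;
  const history = facts.map((fact) => `- ${fact}`).join("\n");
  return `${base}\n\nKnown customer history:\n${history}`;
}
```

Because the memory is injected through the instructions, the same layer works whether the session itself runs in a hosted runtime or in your own process.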

What Did Google Announce at I/O 2026 Related to AI Agents?

Google I/O 2026 included several major agent announcements: Gemini 3.5 Flash combining frontier-model quality with high speed, Managed Agents in the Gemini API for single-call deployment, Antigravity 2.0 as an agent-first desktop development platform that orchestrates multiple agents in parallel, and WebMCP as a proposed open web standard that lets browser-based AI agents invoke JavaScript functions and HTML forms as structured tools.

Managed Agents in 2026: Three Runtimes, Three Trade-Offs

Three announcements landed in the same stretch of May 2026. Google shipped Managed Agents in the Gemini API at Google I/O. Within days, Anthropic announced what it also called "managed agents," a different product, a different architecture, different trade-offs. OpenAI's Agents SDK had meanwhile pushed a 0.17.x release with its own take on what managed orchestration means.

Same term. Three different things.

If you're building a customer support agent, a sales assistant, or any production CX workflow, you need to understand this split before you commit to a runtime. The choice isn't just about developer experience. It determines what you can observe, what you can control, and what you do when something goes wrong with hundreds of conversations in flight.

Let's walk through all three, compare what you actually gain and lose, and lay out a framework for picking the right one.

What "Managed" Actually Means in Agent Infrastructure

"Managed" means someone else handles the runtime infrastructure: the execution environment, scaling, state management, and potentially the model itself. What varies between Google, Anthropic, and OpenAI is who manages what, and that variation has real downstream consequences for your CX stack.

Think of it on a spectrum from fully hosted to fully self-run.

Google: manages the model, the execution sandbox, the built-in tools, and the infrastructure. You send a message; you get a result.
Anthropic: manages the connectivity layer (secure tunnels, protocol) while you run execution in your own environment. You own the data path.
OpenAI SDK: manages the orchestration loop as a library in your process. You own everything from the execution environment down. The "managed" part is just that you don't write the agent loop from scratch.

Each position on that spectrum has consequences. The further you sit from "fully hosted," the more operational responsibility you carry, and the more visibility you get into what your agent is actually doing.

The three managed agent runtime models in 2026: from fully hosted to fully in-process

Google's Managed Agents: One Call, Cloud Runtime

Google Managed Agents makes a simple promise: a working agent in a single API call. You pass your system instructions and the user's message to the Interactions API; Gemini 3.5 Flash runs the agent in an ephemeral Linux sandbox with pre-loaded tools; you get back the completed result.

google-managed-agent.ts·typescript

import { GoogleGenAI } from "@google/genai";
 
const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
 
const response = await genai.agentic.interact({
  model: "gemini-3.5-flash",
  system: `You are a customer support agent for Acme Corp.
You can look up order status, process refunds, and answer product questions.`,
  contents: [{
    role: "user",
    parts: [{ text: "Where is my order #12345 and when will it arrive?" }]
  }]
});
 
console.log(response.finalOutput);
// "Your order #12345 shipped May 24 and is expected May 27..."
console.log(response.toolCalls);
// [{ name: "code_execution", input: "...", output: "..." }]

That's genuinely fast to ship. For an internal tool or a low-stakes prototype, this gets you from idea to running agent in an afternoon.

The trade-offs show up when you move to production.

Your data runs on Google's infrastructure. Every message and tool call executes inside Google's cloud. For a healthcare scheduling agent or a financial services chatbot, this may be a compliance blocker before you even talk to legal.

You see outcomes, not execution. Google returns the final output and a log of tool calls. The reasoning steps, the intermediate state, and the decision path that led from user input to tool selection aren't exposed at the granularity that quality analysis requires. When your agent gives the wrong answer to 3% of queries, "task completed" doesn't help you find out why.

You're on Gemini 3.5 Flash for now. For many CX use cases that's fine. It's fast and capable. But if you need Claude for specific reasoning quality, or a different model for multilingual support, you're waiting for Google to expose model choice.

Custom tools are coming. At launch, the pre-loaded tools are code execution, web search, and URL fetch. A customer support agent needs tools like get_order_status, process_refund, and update_shipping_address. The tool extension API is on the roadmap, not in the product yet.

None of these are fatal for a proof of concept. All of them matter if you're targeting production CX at volume.

Anthropic's Managed Agents: Your Infrastructure, Managed Connectivity

With Anthropic's managed agents, the agent runs in your environment and Anthropic handles the connectivity. In practice that means MCP tunnels that expose your local servers securely over a persistent connection, plus self-hosted sandboxes for tool execution that run in your own infrastructure. Tool execution and the systems behind it stay in your environment; only the messages and tool results you pass to the model cross over to the Claude API.

MCP tunnels let you expose a locally running MCP server through a secure, persistent tunnel without a public IP or complex ingress configuration. If you've built an MCP server with your CRM tools, order management APIs, and knowledge base connectors, an MCP tunnel makes that server reachable from Claude without moving any of it to Google's or Anthropic's cloud.

Self-hosted sandboxes give you isolated execution environments for tool calls that you run in your own infrastructure. You define the sandbox spec; Anthropic's runtime manages execution isolation.

The result: the data path stays yours. Tool calls execute in your environment, and transcripts and customer context are stored where you choose. You can instrument the entire execution path with OpenTelemetry or any tracing pipeline you already run, score against your CX rubric, and plug into your existing compliance stack.
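It helps to see what the server behind the tunnel actually answers. MCP speaks JSON-RPC 2.0, and tool use comes down to two methods, `tools/list` and `tools/call`. The sketch below handles just those two; the transport, the `initialize` handshake, and auth are left out, handlers are synchronous for brevity, and `get_order_status` with its fake response is a hypothetical tool. In practice you would likely use an MCP server SDK rather than hand-rolling this.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number | string; result: unknown }
  | { jsonrpc: "2.0"; id: number | string; error: { code: number; message: string } };

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface Tool {
  description: string;
  inputSchema: Record<string, unknown>;
  handler: (args: Record<string, unknown>) => string;
}

// Hypothetical tool with a canned answer; a real one would query your order system.
const tools = new Map<string, Tool>([
  ["get_order_status", {
    description: "Look up the shipping status of an order",
    inputSchema: {
      type: "object",
      properties: { orderId: { type: "string" } },
      required: ["orderId"],
    },
    handler: (args) => {
      if (typeof args.orderId !== "string") throw new Error("orderId must be a string");
      return `Order ${args.orderId}: shipped`;
    },
  }],
]);

function handleRequest(req: JsonRpcRequest): JsonRpcResponse {
  const base = { jsonrpc: "2.0" as const, id: req.id };

  if (req.method === "tools/list") {
    const list = Array.from(tools.entries()).map(([name, tool]) => ({
      name,
      description: tool.description,
      inputSchema: tool.inputSchema,
    }));
    return { ...base, result: { tools: list } };
  }

  if (req.method === "tools/call") {
    const name = req.params?.name ?? "";
    const tool = tools.get(name);
    if (tool === undefined) {
      return { ...base, error: { code: -32602, message: `Unknown tool: ${name}` } };
    }
    try {
      const text = tool.handler(req.params?.arguments ?? {});
      const result: ToolResult = { content: [{ type: "text", text }] };
      return { ...base, result };
    } catch (err) {
      // Tool failures go inside the result so the model can read and react to them.
      const message = err instanceof Error ? err.message : String(err);
      const result: ToolResult = { content: [{ type: "text", text: message }], isError: true };
      return { ...base, result };
    }
  }

  return { ...base, error: { code: -32601, message: `Method not found: ${req.method}` } };
}
```

Every request and response in this exchange passes through code you run, which is what makes the tool-level audit trail possible.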

anthropic-mcp-tunnel.ts·typescript

import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
// Your MCP server is running locally or in your private network.
// The MCP tunnel makes it accessible to Claude without a public IP.
const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  // MCP tunnel config points Claude at your local server
  mcp_servers: [{
    type: "url",
    url: "https://your-tunnel-id.anthropic-tunnel.com",
    name: "support-tools"
  }],
  messages: [{
    role: "user",
    content: "Process a refund for order #12345"
  }]
});

For regulated industries, this is often the only acceptable path. Healthcare, finance, and legal use cases typically require data to stay within a defined perimeter. Anthropic's self-hosted model meets that bar while still giving you Claude's reasoning quality.

The trade-off is operational surface area. You're running your own MCP server, managing your own execution environment, and handling your own scaling. None of this is hard, but it's more moving parts than a single API call.

OpenAI's Agents SDK: In-Process, Provider-Agnostic

The OpenAI Agents SDK runs as a library in your own application process, handling the orchestration loop (tool dispatch, context threading, conversation state) while you own the execution environment and the observability. You don't write the agent's reasoning cycle manually, but you control everything below it.

What makes it stand out from the other two: it's provider-agnostic. The same SDK connects to Anthropic, Google, Mistral, Cohere, and 100+ other model providers. You can build your agent once and swap the underlying model without rewriting your tool definitions or your orchestration logic.
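To make "the orchestration loop" concrete, here is roughly the shape of work a library like this takes off your hands. This is a minimal sketch, not the SDK's implementation: `ModelClient`, `ModelTurn`, and `runAgent` are hypothetical names, and the scripted model stands in for a real provider so the loop runs offline.

```typescript
// A model's reply is either a final answer or a request to call one tool.
type ModelTurn =
  | { kind: "final"; text: string }
  | { kind: "tool_call"; tool: string; input: string };

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// The only thing the loop needs from a provider. Each real provider would
// get a small adapter implementing this interface.
interface ModelClient {
  next(messages: Message[]): Promise<ModelTurn>;
}

type ToolFn = (input: string) => Promise<string>;

async function runAgent(
  model: ModelClient,
  tools: Map<string, ToolFn>,
  userMessage: string,
  maxTurns = 5,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userMessage }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model.next(messages);
    if (reply.kind === "final") return reply.text;

    // Tool dispatch: run the requested tool, then thread the result back
    // into the conversation so the next model call can see it.
    const tool = tools.get(reply.tool);
    const output = tool ? await tool(reply.input) : `Unknown tool: ${reply.tool}`;
    messages.push({ role: "assistant", content: `call ${reply.tool}(${reply.input})` });
    messages.push({ role: "tool", content: output });
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}

// Scripted stand-in for a provider: asks for one tool call, then answers.
function scriptedModel(): ModelClient {
  return {
    async next(messages: Message[]): Promise<ModelTurn> {
      const last = messages[messages.length - 1];
      if (last !== undefined && last.role === "tool") {
        return { kind: "final", text: `Status: ${last.content}` };
      }
      return { kind: "tool_call", tool: "get_order_status", input: "12345" };
    },
  };
}
```

Swapping providers in this picture means swapping the `ModelClient` adapter; the tools and the loop stay untouched, which is the portability argument in miniature.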

openai-agents-sdk-cx.ts·typescript

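import { Agent, Runner } from "@openai/agents";
// Tool definitions and a logger are assumed to live in your own modules;
// these two import paths are placeholders.
import {
  getOrderStatus,
  processRefund,
  updateShippingAddress,
  escalateToHuman
} from "./tools";
import { logger } from "./logger";
 
const supportAgent = new Agent({
  name: "CX Support Agent",
  instructions: "You are a helpful customer support agent for Acme Corp...",
  tools: [
    getOrderStatus,
    processRefund,
    updateShippingAddress,
    escalateToHuman
  ],
  // Swapping to another provider's model also requires configuring a model
  // provider for it; the tool definitions above stay the same.
  model: "gpt-5"
});
 
// By default the SDK reads OPENAI_API_KEY from the environment.
const runner = new Runner();
 
// Log every tool call as it starts. Check the event name against the SDK
// version you have installed.
runner.on("agent_tool_start", (_context, agent, tool) => {
  logger.info("tool_called", { agent: agent.name, tool: tool.name });
});
 
const result = await runner.run(supportAgent, "Where is my order #12345?");
console.log(result.finalOutput);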

Full observability comes for free here. Everything happens in your process, so you add callbacks or middleware wherever you need logs. Tool calls, model requests, completion events: all accessible.
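One way to get those logs without touching each tool's own logic is a thin wrapper around the tool handler. A minimal sketch under stated assumptions: `traced` and `ToolSpan` are illustrative names rather than SDK features, and the `sink` callback stands in for whatever tracing pipeline you already run (OpenTelemetry, a log shipper, a database).

```typescript
interface ToolSpan {
  tool: string;
  input: unknown;
  output?: unknown;
  error?: string;
  durationMs: number;
}

type ToolHandler<I, O> = (input: I) => Promise<O>;

// Wraps a tool so every call is recorded, whether it succeeds or throws.
function traced<I, O>(
  name: string,
  handler: ToolHandler<I, O>,
  sink: (span: ToolSpan) => void,
  now: () => number = Date.now, // injectable clock, handy in tests
): ToolHandler<I, O> {
  return async (input: I): Promise<O> => {
    const start = now();
    try {
      const output = await handler(input);
      sink({ tool: name, input, output, durationMs: now() - start });
      return output;
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      sink({ tool: name, input, error, durationMs: now() - start });
      throw err; // rethrow so the agent loop still sees the failure
    }
  };
}

// Usage: wrap a handler once, then register the wrapped version as the tool.
const spans: ToolSpan[] = [];
const getOrderStatusTraced = traced(
  "get_order_status",
  async (orderId: string) => `Order ${orderId}: shipped`,
  (span) => { spans.push(span); },
);
```

Because failures are recorded before being rethrown, the span log covers exactly the calls you most want to replay later.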

The trade-off: you manage the infrastructure. Hosting, scaling, and failover are yours to build and operate. For teams already running application servers, this is a non-issue. For teams that want zero infrastructure overhead, it's the wrong choice.

The Trade-Off Matrix

Here's how the three runtimes compare on the dimensions that matter most for production CX.

Dimension	Google Managed	Anthropic MCP Tunnels	OpenAI Agents SDK
Data location	Google cloud	Your infrastructure	Your process
Model choice	Gemini 3.5 Flash	Any Claude model	100+ providers
Custom tools	Coming soon	Full (via MCP)	Full (function calling)
Observability	Outcomes only	Full execution traces	Full execution traces
Setup complexity	One API call	MCP server + tunnel	SDK dependency
Scaling	Google handles	You handle	You handle
Regulated industries	Check compliance	Typically yes	Typically yes
Cost model	Per-interaction	Per-token	Per-token

The observability row catches most teams off guard. The runtime that abstracts the most (Google) gives you the least insight. The runtimes that require more setup give you full access to the execution loop.

What "Managed" Means for CX Agent Monitoring

Managed runtimes don't ship with quality monitoring. Whatever runtime you use, you build that layer yourself, and the runtime you pick determines how much execution data you have to work with.

A CX agent handling a few thousand calls a day needs per-call transcripts with tool call sequences, quality scores against your CX rubric, alerts when quality drops, and the ability to replay failed conversations to find root cause. Those aren't features of any of the three runtimes. They're things you build on top of whichever runtime you pick.
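Of those four pieces, "alerts when quality drops" is the easiest to underestimate and the simplest to prototype. A minimal sketch of a rolling-window alert, assuming each call already has a score between 0 and 1 from whatever scorer you use; the window size and threshold are illustrative and `QualityMonitor` is a made-up name:

```typescript
class QualityMonitor {
  private readonly scores: number[] = [];
  private readonly windowSize: number;
  private readonly minAverage: number;

  constructor(windowSize: number, minAverage: number) {
    this.windowSize = windowSize;
    this.minAverage = minAverage;
  }

  // Records one call's score and reports whether the rolling average
  // has dropped below the threshold.
  record(score: number): { average: number; alert: boolean } {
    this.scores.push(score);
    if (this.scores.length > this.windowSize) this.scores.shift();

    const total = this.scores.reduce((sum, s) => sum + s, 0);
    const average = total / this.scores.length;

    // Only alert on a full window, so one bad early call doesn't page anyone.
    const alert = this.scores.length === this.windowSize && average < this.minAverage;
    return { average, alert };
  }
}
```

A production version would add per-intent windows and alert deduplication, but even this catches a slow regression that per-call spot checks miss.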

With Google Managed Agents, the gap is largest. You get task completion status and final output. Internal execution traces aren't exposed at launch. If your agent starts handing out wrong refund amounts on 3-5% of queries, you need an external layer to catch that. The pattern that works: wrap the API call, capture inputs and outputs, and score quality from the transcript.

monitor-google-managed-agent.ts·typescript

import { GoogleGenAI } from "@google/genai";
import { Chanl } from "@chanl/sdk";
 
const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const chanl = new Chanl({ apiKey: process.env.CHANL_API_KEY });
 
export async function runCXAgent(callId: string, userMessage: string) {
  const startMs = Date.now();
 
  const response = await genai.agentic.interact({
    model: "gemini-3.5-flash",
    system: "You are a customer support agent for Acme Corp...",
    contents: [{ role: "user", parts: [{ text: userMessage }] }]
  });
 
  const durationMs = Date.now() - startMs;
 
  // The Google call has completed. Persist userMessage and response.finalOutput
  // to a call record keyed by callId in your own system (not shown here), then
  // score it: chanl.scorecards.evaluate runs your scorecard over the stored call.
  const evaluation = await chanl.scorecards.evaluate(callId, {
    scorecardId: "customer-support-v2"
  });
 
  console.log({ durationMs, score: evaluation.overall });
  return response.finalOutput;
}

This pattern works with any runtime. The point is: "managed" doesn't mean "monitored." You build the quality loop separately regardless of which runtime handles execution.
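The scoring half of that loop can start with plain deterministic checks over the captured input and output, before you bring in a full scorecard. A minimal sketch: the three checks are hypothetical examples, not a recommended rubric, and `scoreTranscript` is an illustrative name.

```typescript
interface Transcript {
  input: string;
  output: string;
}

interface RubricCheck {
  name: string;
  passes: (t: Transcript) => boolean;
}

// Hypothetical checks; a real rubric would encode your own CX policy.
const rubric: RubricCheck[] = [
  { name: "non_empty_answer", passes: (t) => t.output.trim().length > 0 },
  {
    name: "references_order_id",
    // If the customer named an order, the answer should name the same one.
    passes: (t) => {
      const asked = t.input.match(/#\d+/);
      return asked === null || t.output.includes(asked[0]);
    },
  },
  { name: "no_unapproved_promise", passes: (t) => !/guarantee/i.test(t.output) },
];

// Returns the share of checks passed plus the names of the ones that failed.
function scoreTranscript(
  t: Transcript,
  checks: RubricCheck[],
): { score: number; failed: string[] } {
  const failed = checks.filter((c) => !c.passes(t)).map((c) => c.name);
  const score = checks.length === 0 ? 1 : (checks.length - failed.length) / checks.length;
  return { score, failed };
}
```

Checks like these need only the transcript, so they run the same way against a fully hosted runtime as against one in your own process.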

For teams using Anthropic's self-hosted approach or the OpenAI SDK, conversation analytics plugs directly into the execution traces you already have, with no wrapping layer needed. You get per-call monitoring and alerts at tool-call granularity rather than only at the input/output boundary.

Choosing the Right Runtime for Your CX Stack

Use Google Managed Agents if you're building a prototype and want results fast, you don't need custom tools yet, your data doesn't have regulatory restrictions, and you're fine with Gemini 3.5 Flash quality for your use case.

Use Anthropic's MCP tunnels and self-hosted sandboxes if your data must stay in your infrastructure, you need full execution traces for quality analysis and compliance, you're already building with MCP and have an MCP server, or you need Claude's specific reasoning quality for complex CX scenarios.

Use the OpenAI Agents SDK in-process if you want to evaluate multiple models without switching SDKs, you need provider portability, you're already running application servers, or you need the most control over the execution loop and tool dispatch.
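The three rules above can be written down as an elimination filter, which makes it easy to see when requirements conflict. This is only a sketch of the guidance in this article as it stands at launch (Gemini-only hosted runtime, no custom tools yet); the field names in `Requirements` are made up for illustration.

```typescript
type Runtime = "google-managed" | "anthropic-self-hosted" | "openai-agents-sdk";

interface Requirements {
  dataMustStayInYourInfra?: boolean;
  needsCustomTools?: boolean;
  needsExecutionTraces?: boolean;
  needsProviderPortability?: boolean;
  wantsZeroInfrastructure?: boolean;
}

const ALL_RUNTIMES: Runtime[] = [
  "google-managed",
  "anthropic-self-hosted",
  "openai-agents-sdk",
];

// True when a requirement rules the runtime out.
function isExcluded(runtime: Runtime, r: Requirements): boolean {
  switch (runtime) {
    case "google-managed":
      // Hosted execution, a single model, no custom tools, outcomes-only visibility.
      return Boolean(
        r.dataMustStayInYourInfra ||
        r.needsCustomTools ||
        r.needsExecutionTraces ||
        r.needsProviderPortability,
      );
    case "anthropic-self-hosted":
      // Claude models only, and you operate the MCP server and sandboxes.
      return Boolean(r.needsProviderPortability || r.wantsZeroInfrastructure);
    case "openai-agents-sdk":
      // Runs in your process, so hosting and scaling are yours.
      return Boolean(r.wantsZeroInfrastructure);
  }
}

function candidateRuntimes(r: Requirements): Runtime[] {
  return ALL_RUNTIMES.filter((runtime) => !isExcluded(runtime, r));
}
```

An empty result, for example zero infrastructure plus data that must stay in your own environment, means the requirements themselves need renegotiating before any runtime will fit.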

For most CX teams, the practical path is this: use Google Managed Agents to get a working demo in hours, then migrate to Anthropic or the OpenAI SDK when you're ready to instrument properly and move to production. The prototype teaches you what your agent needs. The migration is the point where you build for observability.

Before you commit to any runtime, ask yourself one question: "Will I be able to know, a month from now, why my agent made this specific decision on this specific call?" If the answer is no, you're building blind.

The Bigger Picture

The "managed agents" moment of May 2026 matters because all three major labs shipped runtime abstractions in the same stretch of weeks. That signals that agent infrastructure is moving from "interesting experiment" to "production category."

But the naming convergence masks a real architectural split. Google is betting on fully hosted convenience. Anthropic is betting on data sovereignty with managed connectivity. OpenAI is betting on in-process flexibility with provider agnosticism. These aren't versions of the same product. They're different answers to who should own agent execution.

For CX teams, the right answer almost always comes down to observability and data control. You can read more about what production agent observability looks like in our guide to monitoring AI agents, and if you're evaluating which orchestration pattern fits your stack, multi-agent orchestration patterns for 2026 covers the next layer of complexity.

Production agents need to be monitored, scored, and improved over time. The runtime that gives you the clearest path to doing that is the one worth the setup cost.



Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.
