You shipped your MCP server. Agents are calling it. Everything looks fine.
Then at 2 AM, a support ticket lands: a customer's refund lookup returned the wrong account balance. You pull the logs. The MCP server returned HTTP 200. The agent's conversation shows it used the result confidently. But somewhere in the chain between the agent's request and the tool's response, something went wrong -- and you have no way to see what.
This is the observability gap that bites teams after their first MCP deployment. The server is running. The API is responding. But you're flying blind on what your agents are actually getting and doing with it.
This guide walks through instrumenting an MCP server for production from scratch: tracing tool calls end to end, detecting loops before they run up your bill, and building the dashboards that actually tell you what's happening.
Why MCP Needs Different Monitoring
Traditional API monitoring answers: did the request succeed, how fast was it, what was the HTTP status? That's enough for a REST API serving human users.
MCP servers don't serve humans. They serve AI agents at runtime -- and agents fail differently. An agent can call the same tool in a loop because its reasoning got stuck. It can receive a perfectly valid HTTP 200 with data that causes it to produce a wrong answer in the next step. It can make subtly malformed tool arguments that pass schema validation but return misleading results.
None of these show up in a standard APM dashboard. You need to see inside the tool calls.
The good news: OpenTelemetry's GenAI semantic conventions (stabilized in early 2026) give you the vocabulary to describe exactly what's happening in an MCP tool call. Once you're emitting the right spans and attributes, any standard observability backend -- Grafana, Datadog, New Relic, Honeycomb -- can query them.
Here's what matters at a glance:
| Signal | Standard API | MCP Server |
|---|---|---|
| Latency | P50, P99 per endpoint | P50, P99 per tool + per agent |
| Errors | HTTP 4xx/5xx | HTTP errors + semantic tool failures |
| Volume | Requests/sec | Calls/session (loop detection) |
| Content | Optional | Critical (what did the agent get?) |
| Cost | N/A | Tokens + downstream API costs |
Step 1: Set Up the OpenTelemetry SDK
Start with a minimal instrumentation setup. You'll add to it, but get the plumbing right first.
```typescript
// mcp-server/instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import {
  SEMRESATTRS_SERVICE_NAME,
  SEMRESATTRS_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions';

const resource = new Resource({
  [SEMRESATTRS_SERVICE_NAME]: 'mcp-tools-server',
  [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '0.0.0',
  'deployment.environment': process.env.NODE_ENV ?? 'development',
  'mcp.transport': process.env.MCP_TRANSPORT ?? 'sse',
});

const traceExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
});

const metricExporter = new OTLPMetricExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/metrics',
});

export const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 30_000,
  }),
});

// Call sdk.start() before importing anything else
sdk.start();
```

Initialize this before your MCP server starts:
```typescript
// mcp-server/index.ts (top of file, before other imports)
import './instrumentation';

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
// ... rest of your server
```

Step 2: Instrument Tool Calls
The core of MCP observability is tracing individual tool calls. Each call should produce a span with the tool name, input arguments, output, duration, and any error.
```typescript
// mcp-server/tracing.ts
import { trace, SpanStatusCode } from '@opentelemetry/api';
import { scrubPII } from './scrubbing';

const tracer = trace.getTracer('mcp-tools', '1.0.0');

interface ToolCallContext {
  toolName: string;
  sessionId: string;
  conversationId: string;
  callId: string;
  agentVersion?: string;
}

export async function traceToolCall<T>(
  ctx: ToolCallContext,
  args: Record<string, unknown>,
  handler: () => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(
    `mcp.tool.${ctx.toolName}`,
    {
      attributes: {
        // GenAI semantic conventions
        'gen_ai.tool.name': ctx.toolName,
        'gen_ai.tool.call.id': ctx.callId,
        'gen_ai.system': 'mcp',
        // MCP-specific
        'mcp.session.id': ctx.sessionId,
        'mcp.conversation.id': ctx.conversationId,
        'mcp.agent.version': ctx.agentVersion ?? 'unknown',
        // Sanitized input for debugging
        'mcp.tool.input': JSON.stringify(scrubPII(args)),
      },
    },
    async (span) => {
      const startTime = Date.now();
      try {
        const result = await handler();
        const duration = Date.now() - startTime;
        span.setAttributes({
          'mcp.tool.duration_ms': duration,
          'mcp.tool.success': true,
          // Truncated, scrubbed preview -- full outputs are too large for span attributes
          'mcp.tool.output_preview': JSON.stringify(scrubPII(result)).slice(0, 200),
        });
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (error) {
        const duration = Date.now() - startTime;
        const err = error as Error;
        span.setAttributes({
          'mcp.tool.duration_ms': duration,
          'mcp.tool.success': false,
          'mcp.tool.error_type': err.constructor.name,
          'mcp.tool.error_message': err.message,
        });
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: err.message,
        });
        span.recordException(err);
        throw error;
      } finally {
        span.end();
      }
    }
  );
}
```

Use this wrapper around every tool in your MCP server:
```typescript
// Example: account lookup tool
server.tool('get_account_balance', AccountBalanceSchema, async (args, callInfo) => {
  return traceToolCall(
    {
      toolName: 'get_account_balance',
      sessionId: callInfo.sessionId ?? 'unknown',
      conversationId: args._conversationId ?? 'unknown',
      callId: callInfo.callId ?? crypto.randomUUID(),
    },
    args,
    () => accountService.getBalance(args.accountId)
  );
});
```

Step 3: Track Metrics for Alerting
Spans give you detailed debugging data. Metrics give you the aggregates you need for alerting and dashboards.
```typescript
// mcp-server/metrics.ts
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('mcp-tools', '1.0.0');

// Counter: total calls per tool
const toolCallCounter = meter.createCounter('mcp.tool.calls', {
  description: 'Total MCP tool calls',
  unit: '1',
});

// Histogram: call duration distribution
const toolDurationHistogram = meter.createHistogram('mcp.tool.duration', {
  description: 'MCP tool call duration in milliseconds',
  unit: 'ms',
  advice: { explicitBucketBoundaries: [50, 100, 250, 500, 1000, 2500, 5000] },
});

// Gauge: calls per session in sliding window (for loop detection).
// Register a callback with addCallback() that reports counts from your session tracker.
const sessionCallGauge = meter.createObservableGauge('mcp.session.call_rate', {
  description: 'Tool calls per session in last 5 minutes',
  unit: '1',
});

// Counter: errors per tool
const toolErrorCounter = meter.createCounter('mcp.tool.errors', {
  description: 'Total MCP tool call errors',
  unit: '1',
});

interface ToolMetricLabels {
  tool: string;
  session: string;
  error_type?: string;
}

export function recordToolCall(
  labels: ToolMetricLabels,
  durationMs: number,
  success: boolean
): void {
  const dimensions = {
    'tool.name': labels.tool,
    // Note: session ID is a high-cardinality dimension; drop it here if your
    // metrics backend charges per time series
    'mcp.session.id': labels.session,
  };
  toolCallCounter.add(1, dimensions);
  toolDurationHistogram.record(durationMs, dimensions);
  if (!success) {
    toolErrorCounter.add(1, {
      ...dimensions,
      'error.type': labels.error_type ?? 'unknown',
    });
  }
}
```

Step 4: Loop Detection
This is the alerting feature you'll wish you had from day one. Agents can get stuck calling the same tool repeatedly -- usually because a tool returned unexpected data and the agent retried instead of escalating.
```typescript
// mcp-server/loop-detection.ts
interface SessionWindow {
  calls: Map<string, number[]>; // toolName -> timestamps
  lastAlert: number;
}

const sessionWindows = new Map<string, SessionWindow>();
const WINDOW_MS = 5 * 60 * 1000; // 5 minutes
const LOOP_THRESHOLD = 15; // calls to same tool in window

export function trackCallForLoops(
  sessionId: string,
  toolName: string,
  onLoopDetected: (sessionId: string, toolName: string, callCount: number) => void
): void {
  const now = Date.now();
  if (!sessionWindows.has(sessionId)) {
    sessionWindows.set(sessionId, { calls: new Map(), lastAlert: 0 });
  }
  const window = sessionWindows.get(sessionId)!;
  if (!window.calls.has(toolName)) {
    window.calls.set(toolName, []);
  }
  const timestamps = window.calls.get(toolName)!;

  // Add current call and prune old timestamps
  timestamps.push(now);
  const cutoff = now - WINDOW_MS;
  const recentCalls = timestamps.filter(t => t > cutoff);
  window.calls.set(toolName, recentCalls);

  // Alert if threshold exceeded (throttle to once per 5 minutes per session)
  if (recentCalls.length >= LOOP_THRESHOLD && now - window.lastAlert > WINDOW_MS) {
    window.lastAlert = now;
    onLoopDetected(sessionId, toolName, recentCalls.length);
  }
}

// Cleanup stale sessions (run on a timer, e.g. setInterval)
export function cleanupStaleSessions(maxAgeMs = 30 * 60 * 1000): void {
  const cutoff = Date.now() - maxAgeMs;
  for (const [sessionId, window] of sessionWindows.entries()) {
    const allTimestamps = Array.from(window.calls.values()).flat();
    const mostRecent = Math.max(...allTimestamps, 0);
    if (mostRecent < cutoff) {
      sessionWindows.delete(sessionId);
    }
  }
}
```

Wire this into your tool call handler:
```typescript
server.tool('get_account_balance', AccountBalanceSchema, async (args, callInfo) => {
  const sessionId = callInfo.sessionId ?? 'unknown';
  trackCallForLoops(sessionId, 'get_account_balance', async (sid, tool, count) => {
    // Send to your alerting system
    await alert.send({
      severity: 'warning',
      message: `Loop detected: session ${sid} called ${tool} ${count}x in 5 minutes`,
      metadata: { sessionId: sid, toolName: tool, callCount: count },
    });
  });
  return traceToolCall(/* ... */);
});
```

Step 5: PII Scrubbing
You're logging tool inputs and outputs. Those almost certainly contain PII. Do the scrubbing before spans are exported.
```typescript
// mcp-server/scrubbing.ts
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g, '[CARD_NUMBER]'],
  [/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi, '[EMAIL]'],
  [/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]'],
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],
];

export function scrubPII(data: unknown): unknown {
  if (typeof data === 'string') {
    return PII_PATTERNS.reduce(
      (str, [pattern, replacement]) => str.replace(pattern, replacement),
      data
    );
  }
  if (Array.isArray(data)) {
    return data.map(scrubPII);
  }
  if (data !== null && typeof data === 'object') {
    return Object.fromEntries(
      Object.entries(data as Record<string, unknown>).map(([k, v]) => [k, scrubPII(v)])
    );
  }
  return data;
}
```

This is minimal but covers the most common PII types. Extend with patterns for your specific domain.
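As one example of a domain extension -- the `ACCT-` account-ID format below is purely hypothetical, substitute your own identifier shapes -- you can layer extra patterns on top of the same reduce-over-patterns approach:

```typescript
// Hypothetical domain-specific patterns; adjust the regexes to your own ID formats.
const EXTRA_PATTERNS: Array<[RegExp, string]> = [
  [/\bACCT-\d{8}\b/g, '[ACCOUNT_ID]'], // assumed internal account-ID shape
  [/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, '[UUID]'],
];

// Same shape as scrubPII above, applied to a single string
function scrubExtra(value: string): string {
  return EXTRA_PATTERNS.reduce(
    (str, [pattern, replacement]) => str.replace(pattern, replacement),
    value
  );
}

console.log(scrubExtra('refund for ACCT-12345678 (ticket 550e8400-e29b-41d4-a716-446655440000)'));
// -> 'refund for [ACCOUNT_ID] (ticket [UUID])'
```

Ticket IDs, order numbers, and internal account references leak into tool arguments just as often as emails and card numbers, so it's worth auditing a sample of real traces before settling on the pattern list.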
How the Traces Connect
Here's how a single agent turn flows through the trace system, from the agent's reasoning step down to your MCP tool and back: the agent's turn is the root span, the model inference and the MCP tool call sit beneath it, and any downstream database or API queries your tool makes hang off the tool-call span.
Every span in that chain shares the same trace ID. When something goes wrong -- say the DB query returns stale data -- you can pull the full trace and see exactly what the agent received, what it inferred, and what it said.
This is what OpenTelemetry tracing for AI agents looks like from the MCP server's perspective. The linked article covers the agent side of the same trace.
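The spans end up in one trace because the agent forwards W3C trace context with each request. In practice the OpenTelemetry SDK's `propagation.extract()` handles this for you; the sketch below parses the `traceparent` header by hand purely to show what's inside it:

```typescript
// What a W3C traceparent header carries: version, trace ID, parent span ID, flags.
interface TraceParent {
  traceId: string;
  parentSpanId: string;
  sampled: boolean;
}

function parseTraceparent(header: string): TraceParent | null {
  // Format: 00-<32 hex chars>-<16 hex chars>-<2 hex flag chars>
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    traceId: m[1],
    parentSpanId: m[2],
    sampled: (parseInt(m[3], 16) & 1) === 1, // low bit = sampled flag
  };
}

console.log(parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'));
// -> { traceId: '4bf92f3577b34da6a3ce929d0e0e4736', parentSpanId: '00f067aa0ba902b7', sampled: true }
```

If your MCP server creates its tool spans without extracting this context from the incoming request, every tool call starts a fresh trace and the agent-to-tool chain breaks apart.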
Step 6: Connect to Chanl Monitoring
If you're running your MCP server through Chanl's MCP runtime, the telemetry above feeds directly into the monitoring dashboard. You get per-tool latency trends, call volume charts, and loop detection alerts out of the box.
If you're running your own MCP infrastructure, here's how to export traces to Chanl alongside your primary observability backend:
```typescript
// instrumentation.ts -- multi-backend export using SimpleSpanProcessor
import { NodeSDK } from '@opentelemetry/sdk-node';
import { SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';

const primaryExporter = new OTLPTraceExporter({
  url: process.env.PRIMARY_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
});

// Additional export to Chanl for conversation-quality correlation
const chanlExporter = new OTLPTraceExporter({
  url: 'https://telemetry.chanl.ai/v1/traces',
  headers: { 'x-chanl-api-key': process.env.CHANL_API_KEY ?? '' },
});

export const sdk = new NodeSDK({
  resource: new Resource({ 'service.name': 'mcp-tools-server' }),
  // SimpleSpanProcessor exports each span as it ends, which is easy to reason
  // about; swap in BatchSpanProcessor for production throughput
  spanProcessors: [
    new SimpleSpanProcessor(primaryExporter),
    new SimpleSpanProcessor(chanlExporter),
  ],
});

sdk.start();
```

This pattern -- using multiple SpanProcessor instances -- lets you keep your existing Grafana or Datadog setup while also feeding Chanl's analytics layer, which correlates tool performance against conversation quality scores.
What Your First Dashboard Should Show
When your traces start flowing, build this dashboard first. It answers the questions you'll actually ask during an incident.
Health overview (top row):
- Tool call success rate by tool name (last 24h)
- P99 latency by tool (last 1h)
- Active sessions with loop alerts
Volume signals (middle row):
- Calls per tool per hour (trend line)
- Top 10 sessions by call volume (detect outliers)
- Error rate by error type
Debugging helpers (bottom row):
- Recent failed traces (link to trace ID)
- Slowest 10 tool calls (last 1h)
- Authentication failures by client
Most of this is one or two queries in Grafana or Honeycomb once your spans have the right attributes. The hard part is deciding what to look at -- the dashboard above covers 80% of what matters.
Three Things That Will Bite You in the First Week
After shipping MCP observability across a few teams, these are the gotchas that consistently show up in the first week of production.
Baggage propagation breaks at async boundaries. OpenTelemetry baggage (where you store conversation ID, user ID, feature flags) relies on Node.js AsyncLocalStorage. If you're using a job queue, a worker pool, or any pattern that passes work across async contexts without explicit propagation, your downstream spans will lose the parent context. The fix: propagate trace context explicitly when crossing async boundaries.
```typescript
import { context, propagation } from '@opentelemetry/api';

// When pushing to a queue or starting a worker
const carrier: Record<string, string> = {};
propagation.inject(context.active(), carrier);
// Store carrier in the job payload
await queue.push({ ...jobData, _otelContext: carrier });

// When the worker picks up the job:
const parentContext = propagation.extract(context.active(), job._otelContext);
await context.with(parentContext, async () => {
  // All spans created here inherit the parent trace
  await processJob(job);
});
```

SSE transport drops spans on reconnect. If you're using Server-Sent Events as your MCP transport, the agent reconnects if the connection drops. Each reconnect creates a new session ID. Without handling this, you'll see fragmented traces that look like separate conversations when they're actually one session. Track reconnect events explicitly and store the original session ID in a persistent store so you can stitch them together.
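A minimal in-memory sketch of the stitching idea -- the function names and the resume signal are illustrative, and a production version would back the map with Redis or similar so it survives restarts:

```typescript
// Maps a reconnected session ID to the original (root) session ID.
const canonicalSessions = new Map<string, string>();

// Call when a reconnect arrives carrying a resume signal (e.g. a Last-Event-ID
// header or a resume token your transport provides).
function linkSession(newSessionId: string, resumedFromId: string): void {
  // Follow the chain so a second reconnect still maps back to the first session
  const root = canonicalSessions.get(resumedFromId) ?? resumedFromId;
  canonicalSessions.set(newSessionId, root);
}

// Use this instead of the raw transport session ID when tagging spans/metrics.
function canonicalSessionId(sessionId: string): string {
  return canonicalSessions.get(sessionId) ?? sessionId;
}

linkSession('sess-b', 'sess-a'); // first reconnect
linkSession('sess-c', 'sess-b'); // second reconnect
console.log(canonicalSessionId('sess-c'));
// -> 'sess-a'
```

Tagging spans with the canonical ID means the loop-detection window and your per-session dashboards keep counting across reconnects instead of resetting.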
Sampling drops your most important spans. Head-based sampling (deciding at the start of a trace whether to record it) is simple but wrong for AI agents. A conversation that starts normally can become your most important trace if the agent makes a bad tool call in turn 7. Use tail-based sampling: buffer all spans for a trace and make the sampling decision at the end, keeping any trace that contains errors, high latency, or flagged quality scores. Grafana Tempo and Honeycomb both support tail-based sampling. Datadog's adaptive sampling approximates it.
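To make the idea concrete, here's a library-agnostic sketch of the tail-based decision. The types and thresholds are illustrative; in practice the buffering happens in a collector or backend (Tempo, Honeycomb's Refinery), not in your server process:

```typescript
// Tail-based sampling sketch: buffer finished spans per trace, decide only
// once the whole trace is known.
interface FinishedSpan {
  traceId: string;
  durationMs: number;
  isError: boolean;
}

const LATENCY_KEEP_MS = 2_500; // assumed "interesting latency" threshold
const buffers = new Map<string, FinishedSpan[]>();

// Called for every span as it ends -- no keep/drop decision yet
function onSpanEnd(span: FinishedSpan): void {
  const spans = buffers.get(span.traceId) ?? [];
  spans.push(span);
  buffers.set(span.traceId, spans);
}

// Called when the trace's root span ends (or after an idle timeout):
// keep the whole trace if any span errored or was slow, else drop it all.
function decideTrace(traceId: string): FinishedSpan[] {
  const spans = buffers.get(traceId) ?? [];
  buffers.delete(traceId);
  const keep = spans.some(s => s.isError || s.durationMs > LATENCY_KEEP_MS);
  return keep ? spans : []; // export kept traces, discard the rest
}
```

The key property is that the decision sees turn 7's bad tool call even though turns 1 through 6 looked healthy -- exactly what head-based sampling cannot do.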
The tools management guide covers how to use Chanl's tool registry to avoid the SSE reconnect problem entirely when you're using Chanl's MCP runtime.
The Difference Between Running and Watching
An MCP server that's running isn't the same as one you can see. When something goes wrong at 2 AM, the difference between "running" and "watching" is how quickly you find the answer.
The setup in this guide takes about a day to instrument and another day to tune thresholds and build your initial dashboard. After that, you have a production system you can actually debug -- which is a much better place to be than parsing raw logs and guessing what your agent saw.
The MCP Explained guide covers building your first MCP server if you're still getting the basics in place. Once you have a server that works, come back here and make it observable.
Built-in observability for your MCP tools
Chanl's MCP runtime instruments your tool calls automatically -- traces, loop detection, and cost attribution with no extra setup. Connect your existing tools and start monitoring in minutes.