What Is an Interrupt Pattern in an AI Agent?

An interrupt pattern pauses agent execution before a specific action and waits for human input before continuing. Unlike a static approval rule, an interrupt captures the full agent state at the pause point so the workflow can resume exactly where it left off after approval, rejection, or modification.

How Is an Agent Interrupt Different From a Human Handoff?

A handoff transfers ownership to a human and typically ends the agent's involvement. An interrupt pauses the agent temporarily. A human reviews and approves one specific action, then the agent resumes autonomously. The agent stays in control of the workflow; the human gates one decision.

What State Needs to Be Persisted for a Reliable Interrupt?

You need to persist the full conversation history, the current tool call and its arguments, all prior tool results, the agent's system prompt, the agent version, and enough metadata to notify and resume. Without the full context, the resumed agent re-derives its plan from scratch and may make different decisions.

How Do I Prevent an Agent From Timing Out While Waiting for Approval?

Use an async execution model so the waiting state doesn't hold a live connection open. Persist state to durable storage (PostgreSQL, or Redis with persistence enabled) at the interrupt point, then resume via a callback webhook or polling. Set an SLA timer that escalates or auto-rejects if no approval arrives within your defined window.
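A minimal sketch of the SLA timer as a periodic sweep over pending approvals, assuming each record carries its creation time and timeout. The escalate-at-half-window policy and the `PendingApproval`, `SlaAction`, `evaluateSla`, and `sweep` names are hypothetical illustrations, not a prescribed design.

```typescript
// Hypothetical shape of a pending approval as read back from durable storage.
interface PendingApproval {
  checkpointId: string;
  createdAt: Date;
  timeoutMinutes: number;
  escalated: boolean;
}

type SlaAction =
  | { kind: "wait" }
  | { kind: "escalate"; checkpointId: string }
  | { kind: "auto_reject"; checkpointId: string };

// Assumed policy: escalate once half the window has elapsed,
// auto-reject once the full window has elapsed. Never auto-approve.
function evaluateSla(item: PendingApproval, now: Date): SlaAction {
  const elapsedMs = now.getTime() - item.createdAt.getTime();
  const windowMs = item.timeoutMinutes * 60_000;
  if (elapsedMs >= windowMs) {
    return { kind: "auto_reject", checkpointId: item.checkpointId };
  }
  if (!item.escalated && elapsedMs >= windowMs / 2) {
    return { kind: "escalate", checkpointId: item.checkpointId };
  }
  return { kind: "wait" };
}

// Run on a schedule (cron job, queue consumer) instead of holding a connection open.
function sweep(pending: PendingApproval[], now: Date): SlaAction[] {
  return pending
    .map((item) => evaluateSla(item, now))
    .filter((action) => action.kind !== "wait");
}
```

Whether expiry escalates or rejects is a policy choice per tool; for money-moving actions, expiring into a rejection keeps the failure mode on the safe side.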

What Does the EU AI Act Require for Human Oversight of AI Agents?

Article 14 of the EU AI Act requires high-risk AI systems to be designed with human-machine interface tools that let natural persons oversee them, intervene, and override their outputs. If a CX agent serving EU markets falls into a high-risk category (for example, creditworthiness assessment or access to essential services), those obligations apply from August 2026, and documented approval workflows with audit trails are a practical way to demonstrate compliance.

Can I Test Interrupt Flows Before Deploying to Production?

Yes. You can run simulated scenarios that deliberately trigger each interrupt condition, verify the agent pauses correctly, validate the state snapshot is complete, and confirm the resume path reconstructs the right context. Test the rejection path as carefully as the approval path.

How Do I Decide Which Actions Should Trigger an Interrupt?

Apply a risk matrix. Irreversible actions always get interrupted. Actions above a value threshold you set get interrupted. Ambiguous intent triggers an interrupt. Regulated actions in your jurisdiction require a documented human decision. Everything else runs autonomously. Start conservative and widen the autonomous window as confidence builds.
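The matrix translates into a short, ordered check. A minimal sketch; `ProposedAction`, its flags, and `classify` are hypothetical, and in practice each flag would come from your tool metadata or an upstream classifier.

```typescript
// Hypothetical description of an action the agent is about to take.
interface ProposedAction {
  irreversible: boolean;
  valueUsd: number;
  intentAmbiguous: boolean;
  regulated: boolean;
}

type RiskVerdict = { interrupt: true; reason: string } | { interrupt: false };

// Checks run in priority order; the first matching row of the matrix wins.
function classify(action: ProposedAction, valueThresholdUsd: number): RiskVerdict {
  if (action.irreversible) return { interrupt: true, reason: "irreversible" };
  if (action.regulated) return { interrupt: true, reason: "regulated" };
  if (action.valueUsd > valueThresholdUsd) {
    return { interrupt: true, reason: "above value threshold" };
  }
  if (action.intentAmbiguous) return { interrupt: true, reason: "ambiguous intent" };
  // Everything else runs autonomously.
  return { interrupt: false };
}
```

Starting conservative then means starting with a low `valueThresholdUsd` and raising it only once review data shows approvers rarely change or reject actions in that band.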

What Happens if an Approver Modifies the Action Before Approving?

The modified arguments replace the original tool call parameters before execution. Your interrupt handler needs to accept a modified payload, not just approve/reject booleans. Store the original parameters alongside the modified ones in the audit trail so post-incident reviews can see what the agent proposed versus what a human approved.
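A minimal sketch of the audit record for that case, assuming a hypothetical `recordModification` helper. It keeps both argument sets and lists which keys the approver changed, so a reviewer doesn't have to diff them by hand.

```typescript
// Hypothetical audit shape: what the agent proposed next to what a human approved.
interface ModificationRecord {
  checkpointId: string;
  proposedArgs: Record<string, unknown>;
  finalArgs: Record<string, unknown>;
  changedKeys: string[];
  approvedBy: string;
}

function recordModification(
  checkpointId: string,
  proposedArgs: Record<string, unknown>,
  modifiedArgs: Record<string, unknown> | undefined,
  approvedBy: string
): ModificationRecord {
  // No modification means the approver accepted the proposal unchanged.
  const finalArgs = modifiedArgs ?? proposedArgs;
  const keys = Array.from(
    new Set(Object.keys(proposedArgs).concat(Object.keys(finalArgs)))
  );
  // Values are compared by their JSON form; for nested objects, key order matters.
  const changedKeys = keys.filter(
    (k) => JSON.stringify(proposedArgs[k]) !== JSON.stringify(finalArgs[k])
  );
  return { checkpointId, proposedArgs, finalArgs, changedKeys, approvedBy };
}
```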

How to Build Agent Interrupt and Approval Checkpoints

Your agent has done everything right. It verified the customer's identity, confirmed the return window, checked the order history, and calculated the refund. Now it's one function call away from sending $4,800 to a debit card.

Should it go?

If your answer is "it depends," you need the interrupt pattern.

The interrupt pattern is how production CX agents handle the gap between "the agent got it right" and "I'm comfortable letting it decide alone." It's not about distrusting your agent. It's about knowing exactly which actions are safe to delegate completely, and which ones need one more set of eyes. Not forever. Just for a beat.

Here's how to build it properly.

What Actions Are Worth Pausing On?

The point of an interrupt isn't to second-guess every decision. It's to identify the narrow category of actions where autonomous execution creates risk you're not ready to accept, then pause only those.

Four categories earn an interrupt:

Irreversible actions are the clearest case. A refund sent, a subscription cancelled, an account deleted. None of these can be undone with a follow-up API call. The cost of pausing once is lower than the cost of a reversal, if one even exists.

High-value actions earn a threshold. You decide the number: $200, $500, $2,000. Below it, the agent acts. Above it, a supervisor sees the proposed action before it fires.

Ambiguous intent catches the edge cases your prompts and tests didn't anticipate. When a customer's request contradicts their account state, or when sentiment signals conflict with the words, a pause gives a human a chance to interpret before the agent commits.

Regulated actions depend on your jurisdiction. Article 14 of the EU AI Act requires high-risk AI systems to support effective human oversight, and the obligations for high-risk systems listed in Annex III, which include creditworthiness assessment, employment decisions, and access to essential services, apply from August 2026. If your agents serve EU customers in one of those areas, this category isn't optional.

The decision about when to pause is the strategy question. We covered it in depth in When and Where Should Humans Intervene in AI Workflows. This article is about how to implement the pause once you know where it belongs.

The Four Components of a Working Interrupt

A working interrupt requires four things in sequence: an interrupt gate that intercepts the flagged call, a checkpoint that preserves agent state, a notification that reaches the right person, and a resume path that reconstructs the workflow correctly.

Drop any one of these and you get a broken pattern. A gate without a checkpoint loses the agent's reasoning. A checkpoint without a notification means the pause never gets human attention. A notification without a resume path leaves the agent stuck indefinitely.
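One way to make that sequence enforceable is to treat each checkpoint as a small state machine. A minimal sketch, assuming the `pending`, `approved`, and `rejected` statuses used by the checkpoint store in this article plus `expired` and `resumed` (borrowed from the audit event names); `transition` is a hypothetical helper. Rejecting illegal transitions in one place turns a missing component into a loud error instead of a silently stuck agent.

```typescript
type CheckpointStatus = "pending" | "approved" | "rejected" | "expired" | "resumed";

// Allowed transitions; anything not listed indicates a bug in the interrupt flow.
const TRANSITIONS: Record<CheckpointStatus, CheckpointStatus[]> = {
  pending: ["approved", "rejected", "expired"],
  approved: ["resumed"],
  rejected: ["resumed"], // the agent resumes to tell the customer about the rejection
  expired: ["resumed"], // assumed: an expiry is fed back to the agent like a rejection
  resumed: [],
};

function transition(from: CheckpointStatus, to: CheckpointStatus): CheckpointStatus {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`Illegal checkpoint transition: ${from} -> ${to}`);
  }
  return to;
}
```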

Here's each component in detail.

The Interrupt Gate

The gate is the code that sits in front of specific tool calls. When the agent prepares to execute a flagged action, the gate intercepts it and routes it to a pending state instead of firing immediately.

interrupt-gate.ts·typescript

interface InterruptRule {
  toolName: string;
  condition: (args: Record<string, unknown>) => boolean;
  reason: string;
  approverRole: string;
  timeoutMinutes: number;
}
 
interface InterruptDecision {
  shouldInterrupt: boolean;
  reason?: string;
  approverRole?: string;
  timeoutMinutes?: number;
}
 
class InterruptGate {
  constructor(private rules: InterruptRule[]) {}
 
  evaluate(
    toolName: string,
    args: Record<string, unknown>
  ): InterruptDecision {
    const rule = this.rules.find((r) => r.toolName === toolName);
    if (!rule) return { shouldInterrupt: false };
 
    const triggered = rule.condition(args);
    if (!triggered) return { shouldInterrupt: false };
 
    return {
      shouldInterrupt: true,
      reason: rule.reason,
      approverRole: rule.approverRole,
      timeoutMinutes: rule.timeoutMinutes,
    };
  }
}
 
const gate = new InterruptGate([
  {
    toolName: "issue_refund",
    condition: (args) => (args.amount as number) > 500,
    reason: "Refund exceeds $500 threshold. Supervisor review required.",
    approverRole: "supervisor",
    timeoutMinutes: 30,
  },
  {
    toolName: "cancel_subscription",
    condition: () => true,
    reason: "Subscription cancellation is irreversible",
    approverRole: "retention_lead",
    timeoutMinutes: 15,
  },
  {
    toolName: "send_contract",
    condition: (args) => (args.value as number) > 10000,
    reason: "Contract value exceeds $10k. Legal review required.",
    approverRole: "legal",
    timeoutMinutes: 1440,
  },
]);

The gate is deliberately stateless. It doesn't know about conversation history, customer context, or the agent's reasoning. It only answers one question: given this tool name and these arguments, should we interrupt?
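Statelessness also means the gate can't tell a malformed argument from a small one. With a condition like `(args.amount as number) > 500`, a missing `amount` or a value that doesn't coerce to a number (say, `"$850"`) evaluates to `false`, so the refund runs without review. A minimal sketch of a fail-closed alternative; `numericAbove` is a hypothetical helper, not part of any library.

```typescript
type Condition = (args: Record<string, unknown>) => boolean;

// Interrupt when a numeric field exceeds a limit, and also when the field is
// missing or not a finite number (fail closed rather than fail open).
function numericAbove(field: string, limit: number): Condition {
  return (args) => {
    const value = args[field];
    if (typeof value !== "number" || !Number.isFinite(value)) return true;
    return value > limit;
  };
}
```

In the gate configuration it would read `condition: numericAbove("amount", 500)`. It also makes the boundary explicit: with `>`, a refund of exactly $500 does not interrupt, which is worth deciding on purpose.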

The Checkpoint

This is where most interrupt implementations fail. When you pause the agent, you need to serialize the complete agent state so the resume looks identical to a natural continuation, not a restart.

What goes into the checkpoint:

checkpoint.ts·typescript

interface AgentCheckpoint {
  checkpointId: string;
  conversationId: string;
  createdAt: string;
  expiresAt: string;
 
  // The proposed action
  pendingToolCall: {
    toolName: string;
    args: Record<string, unknown>;
    callId: string;
  };
 
  // Full agent context at the moment of interrupt
  messages: Message[];
  priorToolResults: ToolResult[];
  systemPrompt: string;
  agentVersion: string;
 
  // Interrupt metadata
  interruptReason: string;
  approverRole: string;
  timeoutMinutes: number;
  notificationSentAt?: string;
}

The messages array is the part teams most often shortcut. They store just the pending tool call and plan to reconstruct context from a database on resume. This creates two problems: the reconstructed context may not match the original (customer data can change during the approval window), and the agent restarts its reasoning instead of continuing from the pause point.

Serialize the full conversation state. Yes, it costs more storage. It costs far less than an agent that resumes with a different mental model than the one it had when it paused.
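Serializing the full state also makes corruption detectable on resume. A minimal sketch using Node's built-in `node:crypto` module; `sealCheckpoint` and `openCheckpoint` are hypothetical helpers. It hashes the serialized payload when the checkpoint is saved and refuses to resume if the stored text no longer matches.

```typescript
import { createHash } from "node:crypto";

interface SealedCheckpoint {
  payload: string; // JSON text exactly as saved
  sha256: string; // hex digest of payload
}

function sealCheckpoint(checkpoint: object): SealedCheckpoint {
  const payload = JSON.stringify(checkpoint);
  const sha256 = createHash("sha256").update(payload).digest("hex");
  return { payload, sha256 };
}

// Verify before resuming; a mismatch means the stored state changed after the pause.
function openCheckpoint<T>(sealed: SealedCheckpoint): T {
  const actual = createHash("sha256").update(sealed.payload).digest("hex");
  if (actual !== sealed.sha256) {
    throw new Error("Checkpoint payload does not match its digest; refusing to resume");
  }
  return JSON.parse(sealed.payload) as T;
}
```

If you adopt this, keep the sealed text in a `text` column: PostgreSQL's `jsonb` type does not preserve key order or whitespace, so a digest computed before insertion would not match text read back from a `jsonb` column.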

For persistence, use a durable backend:

checkpoint-store.ts·typescript

import { Pool } from "pg";

// Assumes agent_checkpoints.payload is JSONB (pg returns it already parsed)
// and created_at defaults to NOW(); the health query below relies on both.
class CheckpointStore {
  constructor(private db: Pool) {}

  async save(checkpoint: AgentCheckpoint): Promise<void> {
    await this.db.query(
      `INSERT INTO agent_checkpoints
         (checkpoint_id, conversation_id, payload, status, expires_at)
       VALUES ($1, $2, $3, 'pending', $4)`,
      [
        checkpoint.checkpointId,
        checkpoint.conversationId,
        JSON.stringify(checkpoint),
        checkpoint.expiresAt,
      ]
    );
  }

  async get(checkpointId: string): Promise<AgentCheckpoint | null> {
    const result = await this.db.query(
      `SELECT payload FROM agent_checkpoints
       WHERE checkpoint_id = $1
         AND status = 'pending'
         AND expires_at > NOW()`,
      [checkpointId]
    );
    return result.rows[0]?.payload ?? null;
  }

  // The status and expiry checks make resolution one-shot: if two approvers
  // act at once, only the first UPDATE matches a row and the second throws.
  async resolve(
    checkpointId: string,
    decision: "approved" | "rejected",
    modifiedArgs?: Record<string, unknown>
  ): Promise<void> {
    const result = await this.db.query(
      `UPDATE agent_checkpoints
       SET status = $2, resolved_at = NOW(), modified_args = $3
       WHERE checkpoint_id = $1
         AND status = 'pending'
         AND expires_at > NOW()`,
      [
        checkpointId,
        decision,
        modifiedArgs ? JSON.stringify(modifiedArgs) : null,
      ]
    );
    if (result.rowCount === 0) {
      throw new Error(`Checkpoint ${checkpointId} is no longer pending`);
    }
  }
}
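For tests, an in-memory stand-in keeps scenarios fast and deterministic. A minimal sketch, assuming a trimmed-down checkpoint type (`MiniCheckpoint`, hypothetical) and a clock passed in as a parameter so expiry can be tested without waiting; it mirrors the read semantics of the Postgres store and refuses to resolve a checkpoint twice.

```typescript
// Hypothetical, trimmed-down checkpoint shape for tests.
interface MiniCheckpoint {
  checkpointId: string;
  expiresAt: number; // epoch milliseconds
  pendingToolCall: { toolName: string; args: Record<string, unknown> };
}

type CheckpointStatus = "pending" | "approved" | "rejected";

interface Row {
  checkpoint: MiniCheckpoint;
  status: CheckpointStatus;
  modifiedArgs?: Record<string, unknown>;
}

class InMemoryCheckpointStore {
  private rows = new Map<string, Row>();

  save(checkpoint: MiniCheckpoint): void {
    this.rows.set(checkpoint.checkpointId, { checkpoint, status: "pending" });
  }

  // Mirrors the SQL: only pending, unexpired checkpoints come back.
  get(checkpointId: string, now: number): MiniCheckpoint | null {
    const row = this.rows.get(checkpointId);
    if (!row || row.status !== "pending" || row.checkpoint.expiresAt <= now) {
      return null;
    }
    return row.checkpoint;
  }

  // One-shot: a second decision on the same checkpoint throws.
  resolve(
    checkpointId: string,
    decision: "approved" | "rejected",
    modifiedArgs?: Record<string, unknown>
  ): void {
    const row = this.rows.get(checkpointId);
    if (!row || row.status !== "pending") {
      throw new Error(`Checkpoint ${checkpointId} is not pending`);
    }
    row.status = decision;
    row.modifiedArgs = modifiedArgs;
  }

  statusOf(checkpointId: string): CheckpointStatus | undefined {
    return this.rows.get(checkpointId)?.status;
  }
}
```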

The Approval Queue

The queue is the interface between the interrupt and the human who needs to act on it. At minimum it needs to show the action the agent wants to take, the arguments it proposes, the conversation context that led here, and three options: approve, modify, reject.

The notification side matters as much as the UI. A pending approval that doesn't reach the right person is dead work:

approval-notifier.ts·typescript

interface ApprovalRequest {
  checkpointId: string;
  approverRole: string;
  toolName: string;
  args: Record<string, unknown>;
  reason: string;
  conversationId: string;
  customerId: string;
  timeoutAt: string;
  reviewUrl: string;
}
 
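abstract class ApprovalNotifier {
  async notify(request: ApprovalRequest): Promise<void> {
    // Audit first, so the request is recorded even if a channel call fails
    await this.logToAuditTrail(request);

    switch (request.approverRole) {
      case "supervisor":
        await this.sendSlackAlert(request, "#cx-supervisors");
        break;
      case "legal":
        await this.sendEmailAlert(request, "legal@yourcompany.com");
        break;
      case "retention_lead":
        await this.sendSlackAlert(request, "#retention-team");
        break;
      default:
        // An unrouted role would leave the approval unseen until it expires
        throw new Error(`No notification route for role "${request.approverRole}"`);
    }
  }

  // Channel integrations are left to your implementation
  protected abstract sendSlackAlert(request: ApprovalRequest, channel: string): Promise<void>;
  protected abstract sendEmailAlert(request: ApprovalRequest, address: string): Promise<void>;
  protected abstract logToAuditTrail(request: ApprovalRequest): Promise<void>;
}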
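A `switch` with no default case silently drops any role it doesn't list. A table-driven routing map makes the fallback explicit. A minimal sketch; the channel names mirror the notifier example, while `routeFor` and the `#cx-escalations` fallback channel are hypothetical placeholders.

```typescript
type Channel =
  | { kind: "slack"; target: string }
  | { kind: "email"; target: string };

// Routing table keyed by approver role; mirrors the roles used by the gate rules.
const ROUTES: Record<string, Channel> = {
  supervisor: { kind: "slack", target: "#cx-supervisors" },
  legal: { kind: "email", target: "legal@yourcompany.com" },
  retention_lead: { kind: "slack", target: "#retention-team" },
};

// Hypothetical fallback so an unmapped role still reaches a human.
const FALLBACK: Channel = { kind: "slack", target: "#cx-escalations" };

function routeFor(approverRole: string): { channel: Channel; usedFallback: boolean } {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (Object.prototype.hasOwnProperty.call(ROUTES, approverRole)) {
    return { channel: ROUTES[approverRole], usedFallback: false };
  }
  return { channel: FALLBACK, usedFallback: true };
}
```

Alerting whenever `usedFallback` is true surfaces gate rules whose `approverRole` has no route before an approval times out unseen.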

Resume, Modify, Reject

The resume path reconstructs the agent's execution context from the checkpoint and re-injects it at the exact moment of the pause. The agent shouldn't need to re-derive anything it already computed.

agent-runner.ts·typescript

interface ApprovalDecision {
  approved: boolean;
  approverName: string;
  rejectionReason?: string;
  modifiedArgs?: Record<string, unknown>;
}
 
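// this.store is the CheckpointStore; this.tools and this.run are your tool
// registry and agent loop (not shown).
class AgentRunner {
  async resumeFromCheckpoint(
    checkpointId: string,
    decision: ApprovalDecision
  ): Promise<void> {
    const checkpoint = await this.store.get(checkpointId);
    if (!checkpoint) throw new Error("Checkpoint not found or expired");

    // Resolve first so the checkpoint is claimed before any side effect runs.
    await this.store.resolve(
      checkpointId,
      decision.approved ? "approved" : "rejected",
      decision.modifiedArgs
    );

    if (!decision.approved) {
      // Feed the rejection back as the tool result so the agent can respond to it.
      const reason = decision.rejectionReason ?? "no reason given";
      await this.continueWithResult(checkpoint, {
        toolCallId: checkpoint.pendingToolCall.callId,
        content: `Action rejected by ${decision.approverName}: ${reason}. Please inform the customer and offer alternatives.`,
      });
      return;
    }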
 
    const finalArgs =
      decision.modifiedArgs ?? checkpoint.pendingToolCall.args;
 
    const result = await this.tools.call(
      checkpoint.pendingToolCall.toolName,
      finalArgs
    );
 
    await this.continueWithResult(checkpoint, {
      toolCallId: checkpoint.pendingToolCall.callId,
      content: JSON.stringify(result),
    });
  }
 
  private async continueWithResult(
    checkpoint: AgentCheckpoint,
    toolResult: { toolCallId: string; content: string }
  ): Promise<void> {
    const messages = [
      ...checkpoint.messages,
      {
        role: "tool" as const,
        content: toolResult.content,
        tool_call_id: toolResult.toolCallId,
      },
    ];
 
    await this.run({
      messages,
      systemPrompt: checkpoint.systemPrompt,
      priorToolResults: [
        ...checkpoint.priorToolResults,
        toolResult,
      ],
    });
  }
}
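Because `modifiedArgs` replaces the agent's arguments wholesale, it is worth validating before the tool fires. A minimal sketch, assuming the policy that an approver may change values but not add, drop, or retype arguments; `validateModification` is hypothetical, and the rule that a refund may be lowered but not raised is an illustrative domain check.

```typescript
type Args = Record<string, unknown>;

const has = (obj: Args, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns the problems found; an empty array means the modification is acceptable.
function validateModification(original: Args, modified: Args): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(modified)) {
    if (!has(original, key)) {
      problems.push(`new argument not in the original call: ${key}`);
    } else if (typeof modified[key] !== typeof original[key]) {
      problems.push(`type changed for argument: ${key}`);
    }
  }
  // modifiedArgs replaces the original wholesale, so a dropped key is a dropped argument.
  for (const key of Object.keys(original)) {
    if (!has(modified, key)) problems.push(`argument removed: ${key}`);
  }
  // Hypothetical domain rule: an approver may lower a refund but not raise it.
  const before = original.amount;
  const after = modified.amount;
  if (typeof before === "number" && typeof after === "number" && after > before) {
    problems.push("amount raised above the agent's proposal");
  }
  return problems;
}
```

In `resumeFromCheckpoint`, a non-empty result would be a reason to send the decision back to the approver rather than call the tool.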

Here's the complete flow from tool call to resume:

Figure: Agent interrupt flow from tool call to human approval and resume.

Wiring It Into Your Tool Executor

Wire the gate, checkpoint, and notifier into your tool executor so every outbound tool call passes through the interrupt check automatically. The executor evaluates the gate, saves state if triggered, fires the notification, then throws an exception that your outer loop catches to pause the agent cleanly.

Here's how those pieces connect in practice:

agent-executor.ts·typescript

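class InterruptException extends Error {
  constructor(public checkpointId: string) {
    super(`Interrupt: awaiting approval for checkpoint ${checkpointId}`);
    this.name = "InterruptException";
    // Keeps `instanceof InterruptException` working when compiling to ES5
    Object.setPrototypeOf(this, InterruptException.prototype);
  }
}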
 
class AgentExecutor {
  async executeToolCall(
    toolName: string,
    args: Record<string, unknown>,
    context: AgentContext
  ): Promise<ToolResult> {
    const decision = this.gate.evaluate(toolName, args);
 
    if (!decision.shouldInterrupt) {
      return await this.tools.call(toolName, args);
    }
 
    const checkpoint: AgentCheckpoint = {
      checkpointId: crypto.randomUUID(),
      conversationId: context.conversationId,
      createdAt: new Date().toISOString(),
      expiresAt: new Date(
        Date.now() + decision.timeoutMinutes! * 60 * 1000
      ).toISOString(),
      pendingToolCall: {
        toolName,
        args,
        callId: context.currentToolCallId,
      },
      messages: context.messages,
      priorToolResults: context.priorToolResults,
      systemPrompt: context.systemPrompt,
      agentVersion: context.agentVersion,
      interruptReason: decision.reason!,
      approverRole: decision.approverRole!,
      timeoutMinutes: decision.timeoutMinutes!,
    };
 
    await this.store.save(checkpoint);
 
    await this.notifier.notify({
      checkpointId: checkpoint.checkpointId,
      approverRole: checkpoint.approverRole,
      toolName,
      args,
      reason: checkpoint.interruptReason,
      conversationId: context.conversationId,
      customerId: context.customerId,
      timeoutAt: checkpoint.expiresAt,
      reviewUrl: `${this.baseUrl}/approvals/${checkpoint.checkpointId}`,
    });
 
    throw new InterruptException(checkpoint.checkpointId);
  }
}

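The InterruptException is caught by your outer execution loop, which pauses cleanly and tells the customer what happens next, for example: "I've escalated this for a quick review. You'll have confirmation within 30 minutes." Base the promised wait on the rule's timeoutMinutes (30 for the refund rule above) so the message matches the actual SLA.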
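A minimal sketch of that outer loop, assuming a hypothetical `runTurn` function that executes one agent turn and may throw. This variant of `InterruptException` adds a `timeoutMinutes` field that the executor above does not set, so the customer-facing wait comes from the rule instead of being hard-coded. It is shown synchronously for brevity; a real loop would be async.

```typescript
// Variant of InterruptException that also carries the rule's timeout.
class InterruptException extends Error {
  constructor(public checkpointId: string, public timeoutMinutes: number) {
    super(`Interrupt: awaiting approval for checkpoint ${checkpointId}`);
    this.name = "InterruptException";
    // Keeps instanceof working when TypeScript compiles to ES5.
    Object.setPrototypeOf(this, InterruptException.prototype);
  }
}

type TurnOutcome =
  | { status: "completed"; reply: string }
  | { status: "paused"; checkpointId: string; reply: string };

// Outer loop: interrupts become a paused turn; any other error propagates.
function handleTurn(runTurn: () => string): TurnOutcome {
  try {
    return { status: "completed", reply: runTurn() };
  } catch (err) {
    if (err instanceof InterruptException) {
      return {
        status: "paused",
        checkpointId: err.checkpointId,
        reply: `I've escalated this for a quick review. You'll have confirmation within ${err.timeoutMinutes} minutes.`,
      };
    }
    throw err;
  }
}
```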

Testing Your Interrupt Flows Before Launch

The biggest risk with interrupt patterns is the resume path breaking in production. You've likely tested the happy path: agent pauses, human approves, action fires. You also need to test three other paths: rejection, modification, and timeout expiry.

Chanl Scenarios lets you create test cases that deliberately trigger each interrupt condition, then verify the checkpoint is complete and the resume reconstructs correctly:

interrupt-test.ts·typescript

const largeRefundScenario = {
  name: "Refund over $500 triggers supervisor interrupt",
  setup: {
    customer: { id: "test-001", orderHistory: [{ id: "ORD-999", amount: 850 }] },
    agentConfig: { tools: ["issue_refund", "check_order"] },
  },
  script: [{ role: "user", content: "I need a full refund for order ORD-999" }],
  assertions: [
    { type: "tool_call_intercepted", toolName: "issue_refund" },
    { type: "checkpoint_created", checkpointContains: ["messages", "pendingToolCall"] },
    { type: "notification_sent", toRole: "supervisor" },
    { type: "customer_informed", messageContains: "escalated" },
  ],
};
 
const rejectionScenario = {
  // Spread first so the fields below override the base scenario's values
  ...largeRefundScenario,
  name: "Agent handles supervisor rejection gracefully",
  approvalDecision: {
    approved: false,
    approverName: "Sarah Chen",
    rejectionReason: "Order outside return window",
  },
  assertions: [
    { type: "agent_resumed" },
    { type: "customer_offered_alternative" },
    { type: "no_refund_issued" },
  ],
};

Test the rejection path explicitly. When a supervisor rejects an $850 refund, does the agent explain the decision and offer an alternative, or does it silently end the conversation? This is the path teams most often forget to verify until a customer calls back angry.
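Scenario tooling aside, the core property of each decision path can also be checked in a plain unit test with no model in the loop. A minimal, framework-free sketch; `applyDecision` is a hypothetical function that mirrors the branch in `resumeFromCheckpoint`, and `RecordingTools` is a fake registry that records calls instead of moving money.

```typescript
type Args = Record<string, unknown>;

interface Decision {
  approved: boolean;
  modifiedArgs?: Args;
}

// Fake tool registry: records every call so the test can assert on side effects.
class RecordingTools {
  calls: { toolName: string; args: Args }[] = [];

  call(toolName: string, args: Args): string {
    this.calls.push({ toolName, args });
    return "ok";
  }
}

// Mirrors resumeFromCheckpoint: a rejection never reaches the tool, and an
// approval uses the approver's arguments when present.
function applyDecision(
  tools: RecordingTools,
  toolName: string,
  proposed: Args,
  decision: Decision
): string {
  if (!decision.approved) return "rejected";
  return tools.call(toolName, decision.modifiedArgs ?? proposed);
}

// Rejection path: no refund is issued.
const rejectTools = new RecordingTools();
applyDecision(rejectTools, "issue_refund", { amount: 850 }, { approved: false });
console.log(rejectTools.calls.length); // 0

// Modification path: the tool receives the approver's arguments, not the agent's.
const modifyTools = new RecordingTools();
applyDecision(modifyTools, "issue_refund", { amount: 850 }, {
  approved: true,
  modifiedArgs: { amount: 400 },
});
console.log(modifyTools.calls[0].args.amount); // 400
```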

Monitoring Interrupt Health in Production

Once interrupt flows are live, three metrics tell you whether they're working:

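Interrupt rate by tool shows what percentage of calls to each flagged tool actually trigger an interrupt. The checkpoint table only holds the calls that were interrupted, so the denominator has to come from your tool executor's call counts. A sudden spike means something upstream changed. A sudden drop might mean your gate condition broke silently.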

Time to resolution tracks how long between interrupt creation and human decision. If this creeps up, approvers are becoming a bottleneck. Consider automating lower-risk approvals or redistributing the queue.

Resume success rate shows what percentage of approved interrupts result in a successful tool execution. Failures here point to state corruption in the checkpoint or a tool that changed between interrupt and resume.

Whatever you instrument with, alert when interrupt queues grow faster than they resolve. That's the leading indicator of a bottleneck before customers start feeling the delay.
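One way to turn "queues grow faster than they resolve" into an alert is to bucket interrupt creations and resolutions into fixed windows and fire when arrivals exceed resolutions for several consecutive windows. A minimal sketch; `WindowCounts`, `backlogGrowing`, and `backlog` are hypothetical names, and the window size and consecutive-window count are parameters you would tune.

```typescript
interface WindowCounts {
  created: number; // interrupts opened in this window
  resolved: number; // approvals, rejections, and expiries in this window
}

// True when the backlog grew in each of the last `consecutive` windows.
function backlogGrowing(windows: WindowCounts[], consecutive: number): boolean {
  if (windows.length < consecutive) return false;
  return windows.slice(-consecutive).every((w) => w.created > w.resolved);
}

// Net backlog across all windows, assuming the queue started empty.
function backlog(windows: WindowCounts[]): number {
  return windows.reduce((acc, w) => acc + w.created - w.resolved, 0);
}
```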

A simple roll-up over your checkpoint table gets you most of the way there:

interrupt-health.sql·sql

-- Assumes payload is JSONB and created_at defaults to NOW() on insert
SELECT
  payload->'pendingToolCall'->>'toolName' AS tool_name,
  COUNT(*) AS total,
  -- percentile_cont gives a true median; AVG would return the mean
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (resolved_at - created_at)) / 60
  ) FILTER (WHERE status IN ('approved', 'rejected')) AS median_minutes,
  COUNT(*) FILTER (WHERE status = 'approved')::float
    / NULLIF(COUNT(*) FILTER (WHERE status IN ('approved','rejected')), 0)
    AS approval_rate
FROM agent_checkpoints
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY tool_name;

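Target a resume success rate above 0.95 and a median resolution time within your customer-facing SLA, and watch for drift in either direction. The query reports approval rate rather than resume success: the checkpoint table records the human decision but not whether the tool call after approval succeeded, so resume success has to come from the audit events or your tool executor's logs.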
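Both targets can be computed from the audit events with a few lines of code. A minimal sketch, assuming the event type names from the audit log below and assuming `resumed` is logged for an approved checkpoint only after its tool call succeeds; `resumeSuccessRate` and `median` are hypothetical helpers.

```typescript
type EventType =
  | "interrupted" | "notified" | "approved" | "rejected"
  | "modified" | "resumed" | "expired";

interface AuditEvent {
  eventType: EventType;
  checkpointId: string;
}

// Share of approved (or modified-and-approved) checkpoints that later resumed.
function resumeSuccessRate(events: AuditEvent[]): number | null {
  const approved = new Set<string>();
  const resumed = new Set<string>();
  for (const e of events) {
    if (e.eventType === "approved" || e.eventType === "modified") approved.add(e.checkpointId);
    if (e.eventType === "resumed") resumed.add(e.checkpointId);
  }
  if (approved.size === 0) return null;
  let succeeded = 0;
  approved.forEach((id) => {
    if (resumed.has(id)) succeeded++;
  });
  return succeeded / approved.size;
}

// True median (not mean) of resolution times in minutes.
function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}
```

Rejected checkpoints also resume (so the agent can inform the customer), which is why only approved checkpoints count toward the denominator.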

The Audit Trail

Every interrupt, decision, and resume needs a tamper-evident audit record. This is what your post-incident reviews reach for, and what regulators ask for:

audit-log.ts·typescript

type InterruptEventType =
  | "interrupted"
  | "notified"
  | "approved"
  | "rejected"
  | "modified"
  | "resumed"
  | "expired";
 
interface InterruptAuditEvent {
  eventType: InterruptEventType;
  checkpointId: string;
  conversationId: string;
  agentId: string;
  toolName: string;
  proposedArgs: Record<string, unknown>;
  finalArgs?: Record<string, unknown>;
  decidedBy?: string;
  decidedAt?: string;
  reason?: string;
  timestamp: string;
}

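Store these in append-only tables (in PostgreSQL, for example, by granting the application role INSERT and SELECT but not UPDATE or DELETE). Log every state transition. If an auditor asks "what did your agent propose, who approved it, and what actually executed?" you should be able to answer in under five minutes.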
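An append-only table stops casual edits; tamper evidence goes a step further by making each record commit to the one before it. A minimal sketch of hash chaining with Node's built-in `node:crypto`; the `ChainedEntry` shape and the `append` and `verify` helpers are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface ChainedEntry {
  event: Record<string, unknown>; // e.g. an InterruptAuditEvent
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

function entryHash(prevHash: string, event: Record<string, unknown>): string {
  return createHash("sha256").update(prevHash).update(JSON.stringify(event)).digest("hex");
}

function append(log: ChainedEntry[], event: Record<string, unknown>): ChainedEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  return [...log, { event, prevHash, hash: entryHash(prevHash, event) }];
}

// Recomputes every hash. Editing, reordering, or removing any entry before the
// tail breaks the chain; truncating the tail is only caught by comparing the
// last hash against a copy stored elsewhere.
function verify(log: ChainedEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of log) {
    if (entry.prevHash !== prev || entry.hash !== entryHash(prev, entry.event)) {
      return false;
    }
    prev = entry.hash;
  }
  return true;
}
```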

Start Narrow, Expand With Confidence

Start with one interrupt rule. Pick the highest-stakes irreversible action in your agent's toolkit. Instrument it. Watch the metrics for two weeks. Then add the next rule.

Teams that interrupt everything create approval queues that grow faster than approvers can clear them. That trains approvers to rubber-stamp requests, which defeats the purpose entirely. Start with the actions that genuinely need a second look, prove that the review actually catches problems, then widen from there.

The interrupt pattern doesn't limit what your agent can do. It's what makes it safe to give your agent more authority over time. Giving an agent the ability to ask for help, precisely and sparingly, is what earns it the right to act alone.

If you want to go deeper on how to structure the human-agent handoff when an interrupt escalates to a full transfer, see The Context Package: What Your Agent Should Hand Off to a Human.

Monitor what your agents pause on in production

Chanl tracks interrupt rates, approval times, and resume success across all your agents. See which checkpoints are working and which are becoming bottlenecks.




Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.
