What Is a Context Package in an AI Agent Handoff?

A context package is the structured data your AI agent assembles before transferring a customer to a human agent. It includes the verified customer identity, what the customer wanted, what the agent tried, why it escalated, the conversation summary, the sentiment arc, and any data already collected. A well-designed context package means the human never has to ask the customer to repeat themselves.

What Is the Difference Between a Warm Transfer and a Cold Transfer?

A cold transfer drops the customer into a queue with no context. The human agent starts from zero. A warm transfer delivers a structured context package to the human agent before the conversation connects, so the human already knows the situation. In voice, warm transfers typically include a brief agent whisper (audible only to the human agent) summarizing the situation before they take the call.

What Data Should a Handoff Context Package Always Include?

Every handoff package should include verified customer identity, the primary intent (what they originally wanted), a concise issue summary (3-5 sentences), the conversation history or a compressed version, the sentiment trend leading up to handoff, which tools were called and what they returned, why the agent escalated, and a suggested first action for the human agent. Sentiment and tool results are the fields most teams forget to include.

How Does Conversation Summary Compression Work for Handoffs?

For long conversations, sending the full transcript to the human agent creates information overload. A better approach is to have the LLM generate a 3-5 sentence summary at escalation time, structured as what the customer wanted, what was tried, what failed or was outside scope, and what the customer's current emotional state is. The full transcript stays available but the summary is what gets surfaced in the screen-pop.

How Do You Measure Handoff Quality in Production?

Four signals indicate handoff quality: re-contact rate (does the customer call back within 24 hours?), repeat-yourself rate (does the human agent ask for information the AI already collected?), handle time after escalation (longer often means bad context), and CSAT on escalated conversations specifically. Tracking these separately from non-escalated CSAT surfaces handoff problems that overall scores hide.

What Should the AI Agent Say to the Customer When It Escalates?

Be honest, be specific, and set expectations. Don't say 'Let me transfer you.' Say 'I want to connect you with a specialist who can process this refund directly. It's outside what I can do today. I'll send them your full conversation so you don't have to repeat anything. It should take about 2 minutes.' Customers tolerate escalation much better when they know why it's happening and that their time won't be wasted.

How Do You Handle Handoffs When the Human Agent Queue Is Long?

Offer alternatives before dropping customers into a 20-minute queue. If your agent can't complete the task, it can schedule a callback with context pre-loaded, create a ticket with the context package attached so the next available agent starts informed, or offer async resolution (email response within 2 hours). The handoff experience should match what your support operation can actually deliver, not an ideal that creates queue abandonment.

What Causes Repeat-Yourself Failures in AI-to-Human Handoffs?

Three root causes: the context package wasn't delivered to the human agent's interface (technical integration failure), the human agent didn't read it before taking the call (process failure), or the context package was incomplete or wrong (AI agent failure). You need instrumentation at all three points. Track whether the package was delivered, whether it was opened before the call, and whether the human had to ask for information already in the package.
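
The three checkpoints can be combined into one root-cause label per escalation. The sketch below makes a few assumptions. Your platform must be able to emit four flags per escalation: package delivered, package opened before connect, agent asked for information the customer had already given, and whether that information was actually in the package. The field names are hypothetical and do not come from any vendor's event schema. The checks run in pipeline order, so the first broken checkpoint determines the label.

```typescript
// Hypothetical per-escalation instrumentation record. Field names are
// illustrative; map them to whatever events your contact center emits.
interface HandoffCheckpoints {
  packageDelivered: boolean;    // package reached the human agent's interface
  openedBeforeConnect: boolean; // agent opened it before the call connected
  agentAskedRepeat: boolean;    // agent asked for info the customer already gave the AI
  infoWasInPackage: boolean;    // that info was actually present in the package
}

type RepeatCause = "none" | "integration" | "process" | "ai_agent";

// Checks run in pipeline order: delivery, then reading, then content.
function classifyRepeatFailure(c: HandoffCheckpoints): RepeatCause {
  if (!c.agentAskedRepeat) return "none";        // no repeat-yourself failure
  if (!c.packageDelivered) return "integration"; // technical integration failure
  if (!c.openedBeforeConnect) return "process";  // delivered but not read
  if (!c.infoWasInPackage) return "ai_agent";    // read, but incomplete or wrong
  return "process"; // complete and opened, yet the agent still asked
}
```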

How to Build the Context Package for AI-to-Human Handoffs

The customer spent seven minutes with your AI agent. They explained their billing issue twice. The agent verified their identity, looked up their account, identified the discrepancy, and then escalated.

The human agent who picked up the call started with: "Can I get your account number?"

That's not a technology failure. The AI worked exactly as designed. It collected the information, identified the limit of its authority, and escalated appropriately. The failure is in how the handoff was built.

Most escalation designs stop at the decision: should the agent escalate, and when? The harder part is what happens in the 10 seconds between the AI ending and the human beginning. Most teams treat it as someone else's problem. It isn't. It's the moment that determines whether your customer tells their coworkers your AI actually helped them, or whether it wasted their time before eventually connecting them to a human anyway.

The Cost of a Bad Handoff

A bad handoff produces two measurable costs: re-contact (the customer calls back within 24 hours because nothing was resolved) and CSAT contamination (poor handoff experience gets attributed to your entire AI investment). Both are quiet failures, the kind that show up in dashboards as "AI underperforming" when the AI was actually fine.

The most visible cost is re-contact. When customers hang up after an escalation feeling like nothing was resolved, or feeling frustrated by having to repeat information, a significant portion will call back within 24 hours. Each re-contact is a full cost of another support interaction, on top of the first.

The less visible cost is CSAT contamination. A customer who had a good AI interaction but a bad handoff will rate the overall experience based on the handoff. That CSAT score gets attributed to "AI customer service" in your reporting, even though the AI's work was fine. You end up with a misleadingly bad view of your agent's performance on cases it handled well, because the data doesn't separate the escalation experience from the pre-escalation experience.

The easiest way to see this: track CSAT for escalated conversations separately from non-escalated ones. Then further split escalated conversations by handoff quality. Did the human have context before the call? That breakdown usually reveals the CSAT gap between AI and human isn't about AI quality. It's about handoff design.
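
Here is a minimal sketch of that two-level split. It assumes each conversation record carries an escalation flag, an optional CSAT score, and a hypothetical `contextBeforeConnect` flag from your delivery instrumentation.

```typescript
interface ConversationOutcome {
  escalated: boolean;
  contextBeforeConnect?: boolean; // hypothetical: did the human have the package before connecting?
  csat?: number;                  // survey score, if the customer answered
}

// Mean CSAT over rows that have a score; null when nobody answered.
function averageCsat(rows: ConversationOutcome[]): number | null {
  const scores = rows.map((r) => r.csat).filter((s): s is number => s !== undefined);
  if (scores.length === 0) return null;
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}

function csatBreakdown(rows: ConversationOutcome[]) {
  const escalated = rows.filter((r) => r.escalated);
  return {
    nonEscalated: averageCsat(rows.filter((r) => !r.escalated)),
    escalatedWithContext: averageCsat(escalated.filter((r) => r.contextBeforeConnect === true)),
    // Escalations with unknown delivery status land here too.
    escalatedWithoutContext: averageCsat(escalated.filter((r) => r.contextBeforeConnect !== true)),
  };
}
```

A large gap between the last two numbers, with the first one healthy, is the handoff-design signal described above.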

What Goes in the Context Package?

A handoff context package is a structured object your agent builds at the moment of escalation and delivers to the human agent's interface before the conversation connects. At minimum it carries identity, intent, conversation summary, tool call history, sentiment trend, escalation reason, and a suggested first action. Think of it as a pre-call brief that the human reads before saying hello.

Here's the schema that works across most CX use cases:

handoff-context.ts·typescript

interface HandoffContext {
  // Who is this
  customer: {
    id: string;
    name: string;
    verificationStatus: "verified" | "partial" | "unverified";
    channel: "voice" | "chat" | "email" | "messaging";
    contactInfo: string;
  };
 
  // What they wanted
  intent: {
    primary: string;          // "refund for order #8821"
    secondary?: string[];     // ["also asked about account upgrade"]
    resolvedItems: string[];  // items the AI already completed
    unresolvedItems: string[]; // items that require human action
  };
 
  // What happened in the conversation
  summary: string;            // 3-5 sentence plain-language summary
  transcript: ConversationTurn[]; // full turns for reference
  sentimentArc: SentimentPoint[]; // sentiment over time
 
  // What the AI did
  toolsUsed: ToolCallRecord[];    // every tool call with results
  attemptedResolutions: string[]; // what the AI tried before escalating
 
  // Why escalating
  escalationReason: {
    trigger: "authority_limit" | "customer_request" | "complexity" | "policy" | "repeated_failure";
    description: string;    // human-readable explanation
    urgency: "standard" | "elevated" | "critical";
  };
 
  // What the human should do first
  suggestedNextAction: string;
 
  // Metadata
  conversationId: string;
  startedAt: Date;
  escalatedAt: Date;
  agentVersion: string;
}
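
The schema references `ConversationTurn`, `SentimentPoint`, and `ToolCallRecord` without defining them. The sketch below gives shapes consistent with how the builder later in this article fills them in. The -1 to 1 sentiment scale is an assumption, not something the schema specifies.

```typescript
// Assumed shapes for the three types HandoffContext references.
interface ConversationTurn {
  role: "user" | "agent";
  content: string;
  timestamp: Date;
  sentiment?: number; // assumed scale: -1 (very negative) to 1 (very positive)
}

interface SentimentPoint {
  at: Date;
  score: number; // same assumed -1..1 scale
}

interface ToolCallRecord {
  tool: string;                  // e.g. "get_account_balance"
  args: Record<string, unknown>; // arguments the agent passed
  result: unknown;               // raw result the agent saw
  calledAt: Date;
}

// The sentiment arc is the scored turns, in conversation order.
function toSentimentArc(turns: ConversationTurn[]): SentimentPoint[] {
  return turns
    .filter((t) => t.sentiment !== undefined)
    .map((t) => ({ at: t.timestamp, score: t.sentiment as number }));
}
```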

Most teams implement a subset of this and call it a day. The fields that get skipped most often, and that matter most, are sentimentArc, toolsUsed, and attemptedResolutions.

Why sentimentArc matters: A customer who started frustrated and got more frustrated needs a very different opening than a customer who was neutral until the AI hit a wall. The human agent should know "this person has been trying to resolve this since Tuesday and is at the end of their patience" before they say hello. Sentiment trend over the conversation tells that story in a way that a single at-handoff score doesn't.
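
One way to turn the arc into a trend is to take a least-squares slope over turn order. The screen-pop code later in this article calls a helper named `isDecliningSentiment` without defining it. The version below is one possible implementation, not a library function. It assumes scores on a -1 to 1 scale, and the `-0.05` per-turn threshold is an arbitrary starting point to tune against your own data.

```typescript
interface SentimentPoint {
  at: Date;
  score: number; // assumed -1..1 scale
}

// Least-squares slope of score against turn index (change in score per turn).
function sentimentSlope(arc: SentimentPoint[]): number {
  const n = arc.length;
  if (n < 2) return 0; // no trend from a single point
  const meanX = (n - 1) / 2;
  const meanY = arc.reduce((sum, p) => sum + p.score, 0) / n;
  let num = 0;
  let den = 0;
  arc.forEach((p, i) => {
    num += (i - meanX) * (p.score - meanY);
    den += (i - meanX) ** 2;
  });
  return num / den;
}

function isDecliningSentiment(arc: SentimentPoint[], threshold = -0.05): boolean {
  return sentimentSlope(arc) < threshold;
}
```

Two customers who both end at -0.5 get different answers here. The one who slid down from 0.4 is flagged, and the one who was at -0.5 all along is not. That is exactly the distinction a single at-handoff score loses.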

Why toolsUsed matters: If the AI called get_account_balance and got a stale result, and that stale result is why the customer thinks there's a discrepancy, the human agent needs to know that. If the AI already verified the customer via SMS OTP, the human doesn't need to re-verify. Tool call history is the evidence trail of what the agent actually saw and did.
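
The sketch below reads the evidence trail for both of those cases: whether a verification tool already succeeded, and how old each tool result is. The tool names in `VERIFICATION_TOOLS` and the `{ verified: boolean }` result shape are hypothetical, so substitute whatever your own tools return.

```typescript
interface ToolCallRecord {
  tool: string;
  args: Record<string, unknown>;
  result: unknown;
  calledAt: Date;
}

// Hypothetical names of identity-verification tools in this deployment.
const VERIFICATION_TOOLS = new Set(["verify_sms_otp", "verify_security_question"]);

// Returns the successful verification call, if the AI already made one.
function verifiedByTool(toolsUsed: ToolCallRecord[]): ToolCallRecord | undefined {
  return toolsUsed.find(
    (t) =>
      VERIFICATION_TOOLS.has(t.tool) &&
      typeof t.result === "object" &&
      t.result !== null &&
      (t.result as { verified?: unknown }).verified === true
  );
}

// Age of each tool result in whole minutes, so the human can spot stale data.
function toolAges(toolsUsed: ToolCallRecord[], now: Date): string[] {
  return toolsUsed.map(
    (t) => `${t.tool}: ${Math.round((now.getTime() - t.calledAt.getTime()) / 60000)} min ago`
  );
}
```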

Why attemptedResolutions matters: If the AI tried to issue a $50 account credit and the customer said that wasn't enough, the human agent needs to know the customer already rejected that offer. Walking in with the same offer will fail in the first 30 seconds and damage trust.

Building the Context Package

Build the package progressively throughout the conversation, not in a panic at the moment of escalation. Identity gets added when the customer is verified. Intent gets added when it's detected. Tool results land in the package the moment they're returned. The only thing you wait on is the LLM-generated summary, which only makes sense once the full conversation is in.

context-builder.ts·typescript

class HandoffContextBuilder {
  private context: Partial<HandoffContext> = {};
  private turns: ConversationTurn[] = [];

  // Take the conversation's real ID and channel up front. Generating a new ID
  // at escalation time would break the link back to this conversation.
  constructor(
    private readonly conversationId: string,
    private readonly channel: HandoffContext["customer"]["channel"]
  ) {}

  onCustomerVerified(customer: Customer) {
    this.context.customer = {
      id: customer.id,
      name: customer.displayName,
      verificationStatus: "verified",
      channel: this.channel,
      contactInfo: customer.email,
    };
  }

  onIntentDetected(intent: string) {
    const current = this.context.intent;
    if (!current) {
      this.context.intent = {
        primary: intent,
        resolvedItems: [],
        unresolvedItems: [intent],
      };
      return;
    }
    // Keep later intents as secondary instead of silently dropping them
    if (intent !== current.primary && !current.secondary?.includes(intent)) {
      (current.secondary ??= []).push(intent);
      current.unresolvedItems.push(intent);
    }
  }

  onToolCall(tool: string, args: Record<string, unknown>, result: unknown) {
    (this.context.toolsUsed ??= []).push({
      tool,
      args,
      result,
      calledAt: new Date(),
    });
  }

  // Record every resolution the AI offered, including ones the customer
  // rejected, so the human doesn't walk in with the same offer
  onResolutionAttempted(description: string) {
    (this.context.attemptedResolutions ??= []).push(description);
  }

  onTurn(role: "user" | "agent", content: string, sentiment?: number) {
    const turn = { role, content, timestamp: new Date(), sentiment };
    this.turns.push(turn);

    // Update sentiment arc
    if (sentiment !== undefined) {
      (this.context.sentimentArc ??= []).push({ at: turn.timestamp, score: sentiment });
    }
  }

  async buildForEscalation(
    reason: HandoffContext["escalationReason"],
    llm: LLMClient
  ): Promise<HandoffContext> {
    // Generate the summary at escalation time, once the full conversation is in
    const summary = await llm.complete({
      messages: [
        {
          role: "system",
          content: "Summarize this customer conversation in 3-5 sentences for a human support agent. Include: what the customer wanted, what was tried, why escalating, and the customer's current state. Be factual and specific.",
        },
        {
          role: "user",
          content: this.turns
            .map((t) => `${t.role}: ${t.content}`)
            .join("\n"),
        },
      ],
    });

    // The cast does not check that customer and intent were ever set (for
    // example, escalation before verification); downstream code must handle that.
    return {
      ...this.context,
      transcript: [...this.turns],
      sentimentArc: this.context.sentimentArc ?? [],
      toolsUsed: this.context.toolsUsed ?? [],
      attemptedResolutions: this.context.attemptedResolutions ?? [],
      summary: summary.content,
      escalationReason: reason,
      suggestedNextAction: this.deriveSuggestedAction(reason),
      conversationId: this.conversationId,
      startedAt: this.turns[0]?.timestamp ?? new Date(),
      escalatedAt: new Date(),
      agentVersion: process.env.AGENT_VERSION ?? "unknown",
    } as HandoffContext;
  }

  private deriveSuggestedAction(reason: HandoffContext["escalationReason"]): string {
    const actions: Record<string, string> = {
      authority_limit: "Check approval authority for the requested action and proceed if within your limit.",
      customer_request: "Acknowledge that the customer requested human assistance and ask how you can help.",
      complexity: "Review the tool call history for any partial work before attempting resolution.",
      policy: "Review the exception request and escalate to supervisor if policy limits apply.",
      repeated_failure: "Do not attempt the same resolution the AI tried. Start with the customer's underlying goal.",
    };
    return actions[reason.trigger] ?? "Review conversation history before responding.";
  }
}

The builder accumulates data during the conversation, then generates the LLM summary only at escalation time, when you have the full picture. The summary call adds latency (often around 500-800ms, depending on model and transcript length). Start it as soon as the escalation decision is made and let it overlap with the transfer announcement, so it doesn't add dead air to the customer-facing conversation. Done that way, the package is sitting in the human's screen-pop by the time their phone rings.
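
Overlapping the summary call with the customer-facing announcement is one way to keep that latency off the conversation. The sketch below assumes three hypothetical hooks, whose names do not come from any real SDK. The first builds the package (for example, the builder's `buildForEscalation`). The second speaks or sends the disclosure. The third delivers the package (for example, `escalateWithContext` in the next section). Delivery still waits for the complete package; what the overlap removes is the silent pause before the customer hears anything.

```typescript
// Hypothetical hooks; the names are illustrative, not a real SDK.
interface EscalationHooks<C> {
  buildContext: () => Promise<C>;               // e.g. builder.buildForEscalation(reason, llm)
  announce: (message: string) => Promise<void>; // speak or send the handoff disclosure
  deliver: (context: C) => Promise<void>;       // e.g. escalateWithContext(context, ccaas)
}

// Start the announcement and the package build together, then deliver once
// both are done. Error handling is omitted for brevity.
async function escalateOverlapped<C>(hooks: EscalationHooks<C>, disclosure: string): Promise<C> {
  const announced = hooks.announce(disclosure); // starts speaking immediately
  return hooks.buildContext().then(async (context) => {
    await announced;              // don't route until the customer has heard the disclosure
    await hooks.deliver(context); // delivery waits for the complete package
    return context;
  });
}
```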

How Do You Deliver Context to the Human Agent?

Context that isn't delivered to the human agent before the call connects is context that doesn't exist. Assembling the package is the easy part. The integration with your contact center platform is where most implementations fall apart, usually because the screen-pop and the queue join run on different events and one of them gets dropped.

The delivery mechanism depends on your contact center infrastructure:

For CCaaS platforms (Genesys, Five9, NICE, Talkdesk): Most offer screen-pop APIs that accept a JSON payload when routing a contact. Send the context package when the customer joins the queue, not when the agent picks up, so the agent sees it populate their screen before they click accept.

screen-pop-delivery.ts·typescript

async function escalateWithContext(
  context: HandoffContext,
  ccaas: ContactCenterClient
) {
  // Serialize the display-ready version
  const screenPop = {
    title: `Escalation: ${context.intent.primary}`,
    customerName: context.customer.name,
    verificationStatus: context.customer.verificationStatus,
    summary: context.summary,
    urgency: context.escalationReason.urgency,
    suggestedAction: context.suggestedNextAction,
    previousTools: context.toolsUsed?.map((t) => `${t.tool}: ${summarizeResult(t.result)}`),
    alertBanner:
      context.sentimentArc && isDecliningSentiment(context.sentimentArc)
        ? "Customer sentiment trending negative. Acknowledge frustration early."
        : undefined,
    conversationLink: `https://app.chanl.ai/conversations/${context.conversationId}`,
  };
 
  await ccaas.createContactWithScreenPop({
    customerId: context.customer.id,
    channel: context.customer.channel,
    queue: selectQueue(context),
    screenPopData: screenPop,
    priority: context.escalationReason.urgency === "critical" ? 1 : 5,
  });
}
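
The function above calls `summarizeResult` and `selectQueue` without defining them. The sketches below are minimal versions of both. The 80-character limit and the queue names (`approvals`, `tier2`, `general`, `priority`) are hypothetical placeholders for whatever your platform uses.

```typescript
type EscalationTrigger =
  | "authority_limit" | "customer_request" | "complexity" | "policy" | "repeated_failure";
type Urgency = "standard" | "elevated" | "critical";

// Render a raw tool result as one short line for the screen-pop.
function summarizeResult(result: unknown, maxLength = 80): string {
  let text: string;
  try {
    text = typeof result === "string" ? result : String(JSON.stringify(result));
  } catch {
    text = String(result); // JSON.stringify throws on circular structures and BigInt
  }
  return text.length > maxLength ? `${text.slice(0, maxLength - 3)}...` : text;
}

// Hypothetical queue names; replace with the queues your CCaaS platform defines.
const QUEUE_BY_TRIGGER: Record<EscalationTrigger, string> = {
  authority_limit: "approvals",
  policy: "approvals",
  complexity: "tier2",
  repeated_failure: "tier2",
  customer_request: "general",
};

// Accepts a HandoffContext or anything with the same escalationReason shape.
function selectQueue(context: { escalationReason: { trigger: EscalationTrigger; urgency: Urgency } }): string {
  if (context.escalationReason.urgency === "critical") return "priority";
  return QUEUE_BY_TRIGGER[context.escalationReason.trigger];
}
```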

For voice channels: Add an agent whisper, a brief audio message audible only to the human agent before they take the call. Keep it to 10-15 seconds: "Incoming from Sarah Chen, account holder, billing dispute on order 8821, already verified, AI wasn't able to issue the refund due to authority limits, sentiment is frustrated."
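
The sketch below assembles that whisper from package fields and keeps it within the time budget. The rate of 2.5 words per second (about 150 words per minute) is an assumption about your text-to-speech voice, so measure your own. Phrases are considered in order, and any that would overrun the budget are skipped rather than cut mid-phrase. The caller's name is always kept.

```typescript
interface WhisperInput {
  customerName: string;
  verificationStatus: "verified" | "partial" | "unverified";
  primaryIntent: string;         // e.g. context.intent.primary
  escalationDescription: string; // e.g. context.escalationReason.description
  sentimentLabel?: string;       // e.g. "frustrated", derived from the sentiment arc
}

// Assumption: roughly 150 words per minute of synthesized speech.
const WORDS_PER_SECOND = 2.5;

function buildWhisper(w: WhisperInput, maxSeconds = 15): string {
  const budget = Math.floor(maxSeconds * WORDS_PER_SECOND);
  const phrases = [
    `Incoming from ${w.customerName}`,
    w.primaryIntent,
    w.verificationStatus === "verified" ? "already verified" : `verification ${w.verificationStatus}`,
    w.escalationDescription,
    w.sentimentLabel ? `sentiment is ${w.sentimentLabel}` : undefined,
  ].filter((p): p is string => p !== undefined);

  const kept: string[] = [];
  let words = 0;
  for (const phrase of phrases) {
    const n = phrase.trim().split(/\s+/).length;
    // Always keep the first phrase (the caller); skip later ones that overrun.
    if (kept.length > 0 && words + n > budget) continue;
    kept.push(phrase);
    words += n;
  }
  return `${kept.join(", ")}.`;
}
```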

For chat and messaging: Most platforms support agent-readable system messages. Drop the summary and suggested action into the thread before connecting the human. Agents read faster than they listen, so the full summary format works better here than in voice.

What Should the AI Say to the Customer at Escalation?

Be honest, be specific, and set a time expectation in the same breath. The customer knows they're being transferred, and how you frame it sets their expectation for the next few minutes. A good handoff disclosure explains why the transfer is happening, promises the human will have full context, and gives a realistic wait estimate.

Don't say: "Let me transfer you to an agent."

That tells them nothing and prepares them for nothing. They'll assume they have to repeat everything.

Say something like: "I want to connect you with someone who can process this directly. Approving refunds above $50 requires a specialist on our team. I'll send them our full conversation so you don't have to repeat anything. The wait is typically 2-3 minutes. Is that okay?"

Three things this accomplishes: it explains why (authority limit, not AI limitation), it promises context portability (they won't repeat themselves), and it sets a time expectation. Customers tolerate wait times much better when they know roughly how long.

If the queue wait is long, offer alternatives before forcing the customer in:

escalation-options.ts·typescript

async function offerEscalationOptions(
  context: HandoffContext,
  queueWait: number // estimated queue wait, in seconds
): Promise<string> {
  if (queueWait < 120) {
    // Under 2 minutes: just escalate
    return "I'm connecting you now.";
  }
 
  if (queueWait < 600) {
    // 2-10 minutes: offer choice
    return `The current wait is about ${Math.ceil(queueWait / 60)} minutes. Would you prefer to wait, or would a callback when an agent is available work better for you? I'll send them everything we discussed either way.`;
  }
 
  // Over 10 minutes: proactively suggest alternatives
  return `Our wait time is currently ${Math.ceil(queueWait / 60)} minutes. I can schedule a callback for you. Our team will call you back within the hour with your case already loaded. Would that work?`;
}

Measuring Handoff Quality

Four signals tell you whether your handoff design is working before customers start complaining about it: re-contact rate within 24 hours, repeat-yourself rate, escalated CSAT versus non-escalated CSAT, and time-to-first-meaningful-response after transfer. Each one points at a different failure mode, and tracking all four separately from your overall AI metrics is what surfaces handoff problems your headline numbers hide.

Once you've instrumented these, the data usually reveals that handoff quality, not AI capability, is driving most of your escalation CSAT gap:

Re-contact rate within 24 hours. Tag escalated conversations and check if the same customer contacts support again within 24 hours. A high re-contact rate (above 20-25%) is a strong signal that escalations aren't actually resolving the issue, often because the human agent started blind and the customer gave up.
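
Here is a minimal sketch of that calculation. It assumes you can export contacts with a customer ID, a start time, and an escalation flag. An escalated contact counts as re-contacted if the same customer starts another contact within the window after it.

```typescript
interface SupportContact {
  customerId: string;
  startedAt: Date;
  escalated: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Share of escalated contacts followed by another contact from the same
// customer within the window (24 hours by default).
function recontactRate(contacts: SupportContact[], windowMs = DAY_MS): number {
  const timesByCustomer = new Map<string, number[]>();
  for (const c of contacts) {
    const times = timesByCustomer.get(c.customerId) ?? [];
    times.push(c.startedAt.getTime());
    timesByCustomer.set(c.customerId, times);
  }

  const escalated = contacts.filter((c) => c.escalated);
  if (escalated.length === 0) return 0;

  const recontacted = escalated.filter((c) => {
    const t = c.startedAt.getTime();
    const times = timesByCustomer.get(c.customerId) ?? [];
    return times.some((u) => u > t && u - t <= windowMs);
  });
  return recontacted.length / escalated.length;
}
```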

Repeat-yourself rate. If your contact center platform captures notes from human agents, you can check whether the agent recorded information the AI already collected (account number, issue description, etc.). A simpler proxy: average handle time after escalation, since agents who have to re-establish context spend more time doing it.
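
Here is a sketch of the direct version of that check. It assumes your platform can map the human agent's notes or form entries to field keys such as `account_number`. That capability is hypothetical and varies by platform.

```typescript
interface EscalationNotes {
  packageFields: string[];       // field keys the AI put in the context package
  agentCapturedFields: string[]; // field keys the human recorded after connecting
}

// Share of escalations where the human re-collected at least one field the
// AI had already captured in the package.
function repeatYourselfRate(escalations: EscalationNotes[]): number {
  if (escalations.length === 0) return 0;
  const repeats = escalations.filter((e) => {
    const known = new Set(e.packageFields);
    return e.agentCapturedFields.some((f) => known.has(f));
  });
  return repeats.length / escalations.length;
}
```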

Escalated CSAT vs. non-escalated CSAT. Track these separately. Your non-escalated CSAT is your AI agent's performance. Your escalated CSAT is the handoff experience. If escalated CSAT is dramatically lower, that's a handoff design problem, not an AI capability problem.

Time-to-first-meaningful-response after transfer. In voice, this is how long after the human picks up before they say something beyond "hold please" or "one moment." In chat, it's how long before the first substantive response. Agents who start informed typically reach their first meaningful response in under 30 seconds. Agents starting cold often spend 90 seconds reading back through history before they can respond.
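
Here is a sketch of measuring this from a transcript. It assumes chronologically ordered turns labeled by speaker and a connect timestamp. The filler phrase list is a hypothetical starting point, so extend it with what your agents actually say.

```typescript
interface TranscriptTurn {
  speaker: "customer" | "human_agent";
  text: string;
  at: Date;
}

// Hypothetical filler phrases that don't count as a meaningful response.
const FILLER_PHRASES = new Set(["hold please", "please hold", "one moment", "one moment please"]);

function isFiller(text: string): boolean {
  const normalized = text.toLowerCase().replace(/[^a-z0-9\s]/g, "").trim().replace(/\s+/g, " ");
  return FILLER_PHRASES.has(normalized);
}

// Seconds from connect to the human's first non-filler turn, or null if none.
function timeToFirstMeaningfulResponse(connectedAt: Date, turns: TranscriptTurn[]): number | null {
  const start = connectedAt.getTime();
  const first = turns.find(
    (t) => t.speaker === "human_agent" && t.at.getTime() >= start && !isFiller(t.text)
  );
  return first ? (first.at.getTime() - start) / 1000 : null;
}
```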

If you're running your CX agents on Chanl, you can track these metrics by tagging escalated conversations in analytics and building a separate view filtered to escalation events. Pair this with scorecard evaluation on escalated calls specifically, scoring handoff quality as its own dimension, separate from the overall conversation score. The conversation traces show every tool call the agent ran, so you can verify whether the human received the right context and whether they followed the suggested first action. For the decision side of the same problem (when to escalate at all), see Smarter Escalation: When Should Voice AI Refuse to Answer? and Handoff Is the New Prompt.

The Invisible Handoff

The goal is a handoff the customer doesn't notice.

Not invisible because the agent pretends no transfer happened. The customer knows. Invisible because the human picks up and demonstrates they already know the situation. "Hi Sarah, I can see you've been dealing with the billing discrepancy on order 8821. I have your account pulled up and I can approve this refund directly."

Remember the customer at the top of this article, the one who spent seven minutes explaining their issue twice and then heard "can I get your account number?" The opening above takes five seconds and signals the opposite: your time was not wasted, the seven minutes contributed to this resolution, you don't have to start over.

That perception, that the AI and human are working together rather than operating in silos, is what turns a mediocre escalation experience into one customers describe as good.

The AI doesn't need to solve every problem. It needs to hand off well enough that the human can solve it fast.




Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.