The customer spent seven minutes with your AI agent. They explained their billing issue twice. The agent verified their identity, looked up their account, identified the discrepancy, and then escalated.
The human agent who picked up the call started with: "Can I get your account number?"
That's not a technology failure. The AI worked exactly as designed. It collected the information, identified the limit of its authority, and escalated appropriately. The failure is in how the handoff was built.
Most escalation designs stop at the decision: should the agent escalate, and when? The harder part is what happens in the 10 seconds between the AI ending and the human beginning. Most teams treat it as someone else's problem. It isn't. It's the moment that determines whether your customer tells their coworkers your AI actually helped them, or whether it wasted their time before eventually connecting them to a human anyway.
The Cost of a Bad Handoff
A bad handoff produces two measurable costs: re-contact (the customer calls back within 24 hours because nothing was resolved) and CSAT contamination (poor handoff experience gets attributed to your entire AI investment). Both are quiet failures, the kind that show up in dashboards as "AI underperforming" when the AI was actually fine.
The most visible cost is re-contact. When customers hang up after an escalation feeling like nothing was resolved, or feeling frustrated by having to repeat information, a significant portion will call back within 24 hours. Each re-contact is a full cost of another support interaction, on top of the first.
The less visible cost is CSAT contamination. A customer who had a good AI interaction but a bad handoff will rate the overall experience based on the handoff. That CSAT score gets attributed to "AI customer service" in your reporting, even though the AI's work was fine. You end up with a misleadingly bad view of your agent's performance on cases it handled well, because the data doesn't separate the escalation experience from the pre-escalation experience.
The easiest way to see this: track CSAT for escalated conversations separately from non-escalated ones. Then further split escalated conversations by handoff quality. Did the human have context before the call? That breakdown usually reveals the CSAT gap between AI and human isn't about AI quality. It's about handoff design.
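The split described above can be sketched as a small aggregation. The record shape (`RatedConversation` and its `humanHadContext` flag) and the 1-5 CSAT scale are assumptions for illustration; substitute whatever your analytics export actually provides.

```typescript
// Hypothetical record shape: one row per rated conversation.
interface RatedConversation {
  csat: number;              // assumed 1-5 survey score
  escalated: boolean;
  humanHadContext?: boolean; // only meaningful when escalated
}

type Segment =
  | "non_escalated"
  | "escalated_with_context"
  | "escalated_without_context";

function segmentOf(c: RatedConversation): Segment {
  if (!c.escalated) return "non_escalated";
  return c.humanHadContext ? "escalated_with_context" : "escalated_without_context";
}

// Average CSAT per segment; segments with no ratings are omitted.
function csatBySegment(rows: RatedConversation[]): Partial<Record<Segment, number>> {
  const sums: Partial<Record<Segment, { sum: number; count: number }>> = {};
  for (const row of rows) {
    const key = segmentOf(row);
    const t = sums[key] ?? { sum: 0, count: 0 };
    sums[key] = { sum: t.sum + row.csat, count: t.count + 1 };
  }
  const result: Partial<Record<Segment, number>> = {};
  for (const key of Object.keys(sums) as Segment[]) {
    const t = sums[key];
    if (t) result[key] = t.sum / t.count;
  }
  return result;
}
```

If the gap between the two escalated segments is much larger than the gap between escalated and non-escalated overall, that points at handoff design rather than AI quality.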
What Goes in the Context Package?
A handoff context package is a structured object your agent builds at the moment of escalation and delivers to the human agent's interface before the conversation connects. At minimum it carries identity, intent, conversation summary, tool call history, sentiment trend, escalation reason, and a suggested first action. Think of it as a pre-call brief that the human reads before saying hello.
Here's the schema that works across most CX use cases:
interface HandoffContext {
  // Who is this
  customer: {
    id: string;
    name: string;
    verificationStatus: "verified" | "partial" | "unverified";
    channel: "voice" | "chat" | "email" | "messaging";
    contactInfo: string;
  };

  // What they wanted
  intent: {
    primary: string; // "refund for order #8821"
    secondary?: string[]; // ["also asked about account upgrade"]
    resolvedItems: string[]; // items the AI already completed
    unresolvedItems: string[]; // items that require human action
  };

  // What happened in the conversation
  summary: string; // 3-5 sentence plain-language summary
  transcript: ConversationTurn[]; // full turns for reference
  sentimentArc: SentimentPoint[]; // sentiment over time

  // What the AI did
  toolsUsed: ToolCallRecord[]; // every tool call with results
  attemptedResolutions: string[]; // what the AI tried before escalating

  // Why escalating
  escalationReason: {
    trigger: "authority_limit" | "customer_request" | "complexity" | "policy" | "repeated_failure";
    description: string; // human-readable explanation
    urgency: "standard" | "elevated" | "critical";
  };

  // What the human should do first
  suggestedNextAction: string;

  // Metadata
  conversationId: string;
  startedAt: Date;
  escalatedAt: Date;
  agentVersion: string;
}
Most teams implement a subset of this and call it a day. The fields that get skipped most often, and that matter most, are sentimentArc, toolsUsed, and attemptedResolutions.
Why sentimentArc matters: A customer who started frustrated and got more frustrated needs a very different opening than a customer who was neutral until the AI hit a wall. The human agent should know "this person has been trying to resolve this since Tuesday and is at the end of their patience" before they say hello. Sentiment trend over the conversation tells that story in a way that a single at-handoff score doesn't.
Why toolsUsed matters: If the AI called get_account_balance and got a stale result, and that stale result is why the customer thinks there's a discrepancy, the human agent needs to know that. If the AI already verified the customer via SMS OTP, the human doesn't need to re-verify. Tool call history is the evidence trail of what the agent actually saw and did.
Why attemptedResolutions matters: If the AI tried to issue a $50 account credit and the customer said that wasn't enough, the human agent needs to know the customer already rejected that offer. Walking in with the same offer will fail in the first 30 seconds and damage trust.
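The schema references ConversationTurn, SentimentPoint, and ToolCallRecord without defining them. A minimal sketch of what they might look like follows, along with one hypothetical way to reduce a sentiment arc to a declining/not-declining signal. The field names mirror how the rest of this article's code uses these types; the -1 to 1 score range, the half-versus-half comparison, and the 0.2 threshold are all assumptions.

```typescript
// Assumed shapes for the types referenced by HandoffContext.
interface ConversationTurn {
  role: "user" | "agent";
  content: string;
  timestamp: Date;
  sentiment?: number; // assumed range: -1 (negative) to 1 (positive)
}

interface SentimentPoint {
  at: Date;
  score: number;
}

interface ToolCallRecord {
  tool: string;
  args: Record<string, unknown>;
  result: unknown;
  calledAt: Date;
}

function mean(values: number[]): number {
  let sum = 0;
  for (const v of values) sum += v;
  return values.length === 0 ? 0 : sum / values.length;
}

// Illustrative heuristic: compare the average of the later half of the
// arc against the earlier half. The default threshold is arbitrary.
function isDecliningSentiment(arc: SentimentPoint[], threshold = 0.2): boolean {
  if (arc.length < 2) return false;
  const mid = Math.floor(arc.length / 2);
  const early = mean(arc.slice(0, mid).map((p) => p.score));
  const late = mean(arc.slice(mid).map((p) => p.score));
  return early - late >= threshold;
}
```

A trend check like this is what lets the human's screen distinguish "frustrated from the start" from "fine until the AI hit a wall," which a single at-handoff score cannot do.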
Building the Context Package
Build the package progressively throughout the conversation, not in a panic at the moment of escalation. Identity gets added when the customer is verified. Intent gets added when it's detected. Tool results land in the package the moment they're returned. The only thing you wait on is the LLM-generated summary, which only makes sense once the full conversation is in.
class HandoffContextBuilder {
  private context: Partial<HandoffContext> = {};
  private turns: ConversationTurn[] = [];

  onCustomerVerified(customer: Customer) {
    this.context.customer = {
      id: customer.id,
      name: customer.displayName,
      verificationStatus: "verified",
      channel: this.context.customer?.channel ?? "chat",
      contactInfo: customer.email,
    };
  }

  onIntentDetected(intent: string) {
    if (!this.context.intent) {
      this.context.intent = {
        primary: intent,
        resolvedItems: [],
        unresolvedItems: [intent],
      };
    }
  }

  onToolCall(tool: string, args: Record<string, unknown>, result: unknown) {
    if (!this.context.toolsUsed) this.context.toolsUsed = [];
    this.context.toolsUsed.push({
      tool,
      args,
      result,
      calledAt: new Date(),
    });
  }

  onTurn(role: "user" | "agent", content: string, sentiment?: number) {
    const turn = { role, content, timestamp: new Date(), sentiment };
    this.turns.push(turn);
    if (!this.context.transcript) this.context.transcript = [];
    this.context.transcript.push(turn);

    // Update sentiment arc
    if (sentiment !== undefined) {
      if (!this.context.sentimentArc) this.context.sentimentArc = [];
      this.context.sentimentArc.push({ at: turn.timestamp, score: sentiment });
    }
  }

  async buildForEscalation(
    reason: HandoffContext["escalationReason"],
    llm: LLMClient
  ): Promise<HandoffContext> {
    // Generate summary using LLM at escalation time
    const summary = await llm.complete({
      messages: [
        {
          role: "system",
          content: "Summarize this customer conversation in 3-5 sentences for a human support agent. Include: what the customer wanted, what was tried, why escalating, and the customer's current state. Be factual and specific.",
        },
        {
          role: "user",
          content: this.turns
            .map((t) => `${t.role}: ${t.content}`)
            .join("\n"),
        },
      ],
    });

    return {
      ...this.context,
      summary: summary.content,
      escalationReason: reason,
      suggestedNextAction: this.deriveSuggestedAction(reason),
      conversationId: generateId(),
      startedAt: this.turns[0]?.timestamp ?? new Date(),
      escalatedAt: new Date(),
      agentVersion: process.env.AGENT_VERSION ?? "unknown",
    } as HandoffContext;
  }

  private deriveSuggestedAction(reason: HandoffContext["escalationReason"]): string {
    const actions: Record<string, string> = {
      authority_limit: "Check approval authority for the requested action and proceed if within your limit.",
      customer_request: "Acknowledge that the customer requested human assistance and ask how you can help.",
      complexity: "Review the tool call history for any partial work before attempting resolution.",
      policy: "Review the exception request and escalate to supervisor if policy limits apply.",
      repeated_failure: "Do not attempt the same resolution the AI tried. Start with the customer's underlying goal.",
    };
    return actions[reason.trigger] ?? "Review conversation history before responding.";
  }
}
The builder accumulates data during the conversation, then generates the LLM summary only at escalation time when you have the full picture. The summary call adds latency (typically 500-800ms), but it runs in parallel with the queue join, not during the customer-facing conversation. By the time the human's phone rings, the package is sitting in their screen-pop.
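One way to get that overlap is to start the summary call and the queue join together, then attach the package once both finish. The sketch below assumes three hypothetical async operations (`generateSummary`, `joinQueue`, and `attachScreenPop`) injected as dependencies; your LLM client and contact center SDK will name and shape these differently.

```typescript
// All three dependencies are hypothetical stand-ins, injected so the
// orchestration can be exercised without a real LLM or CCaaS platform.
interface EscalationDeps {
  generateSummary: () => Promise<string>;
  joinQueue: () => Promise<{ contactId: string }>;
  attachScreenPop: (contactId: string, summary: string) => Promise<void>;
}

async function escalateInParallel(deps: EscalationDeps): Promise<string> {
  // Start both operations before awaiting either, so the summary
  // latency overlaps with queue placement instead of adding to it.
  const summaryPromise = deps.generateSummary();
  const queuePromise = deps.joinQueue();

  const [summary, contact] = await Promise.all([summaryPromise, queuePromise]);

  // Attach the package once both the summary and the contact exist.
  await deps.attachScreenPop(contact.contactId, summary);
  return contact.contactId;
}
```

As written, a failed summary call rejects the whole escalation. In practice you would probably want to catch that failure and deliver the rest of the package without a summary; that fallback is left out here for brevity.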
How Do You Deliver Context to the Human Agent?
Context that isn't delivered to the human agent before the call connects is context that doesn't exist. Assembling the package is the easy part. The integration with your contact center platform is where most implementations fall apart, usually because the screen-pop and the queue join run on different events and one of them gets dropped.
The delivery mechanism depends on your contact center infrastructure:
For CCaaS platforms (Genesys, Five9, NICE, Talkdesk): Most support screen-pop APIs that accept a JSON payload when routing a contact. You send the context package when the customer joins the queue, not when the agent picks up. The agent sees it populate their screen before they click accept.
async function escalateWithContext(
  context: HandoffContext,
  ccaas: ContactCenterClient
) {
  // Serialize the display-ready version
  const screenPop = {
    title: `Escalation: ${context.intent.primary}`,
    customerName: context.customer.name,
    verificationStatus: context.customer.verificationStatus,
    summary: context.summary,
    urgency: context.escalationReason.urgency,
    suggestedAction: context.suggestedNextAction,
    previousTools: context.toolsUsed?.map((t) => `${t.tool}: ${summarizeResult(t.result)}`),
    alertBanner:
      context.sentimentArc && isDecliningSentiment(context.sentimentArc)
        ? "Customer sentiment trending negative. Acknowledge frustration early."
        : undefined,
    conversationLink: `https://app.chanl.ai/conversations/${context.conversationId}`,
  };

  await ccaas.createContactWithScreenPop({
    customerId: context.customer.id,
    channel: context.customer.channel,
    queue: selectQueue(context),
    screenPopData: screenPop,
    priority: context.escalationReason.urgency === "critical" ? 1 : 5,
  });
}
For voice channels: Add an agent whisper, a brief audio message audible only to the human agent before they take the call. Keep it to 10-15 seconds: "Incoming from Sarah Chen, account holder, billing dispute on order 8821, already verified, AI wasn't able to issue the refund due to authority limits, sentiment is frustrated."
For chat and messaging: Most platforms support agent-readable system messages. Drop the summary and suggested action into the thread before connecting the human. Agents read faster than they listen, so the full summary format works better here than in voice.
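A whisper like the voice example above can be assembled from fields already in the package. This is a sketch only: `WhisperInput` is a hypothetical trimmed-down view of the context, and the 40-word cap is an assumed stand-in for the 10-15 second budget that you should tune against your own text-to-speech voice.

```typescript
// Hypothetical subset of the handoff context needed for a whisper.
interface WhisperInput {
  customerName: string;
  verified: boolean;
  issue: string;            // e.g. "billing dispute on order 8821"
  escalationReason: string; // e.g. "refund exceeds AI authority limit"
  sentimentLabel?: string;  // e.g. "frustrated"
}

// Assumed word budget approximating 10-15 seconds of speech.
const MAX_WHISPER_WORDS = 40;

function buildWhisper(input: WhisperInput): string {
  const parts = [
    `Incoming from ${input.customerName}`,
    input.issue,
    input.verified ? "already verified" : "not yet verified",
    input.escalationReason,
  ];
  if (input.sentimentLabel) parts.push(`sentiment is ${input.sentimentLabel}`);

  // Truncate to the word budget rather than letting the audio run long.
  const words = parts.join(", ").split(/\s+/).slice(0, MAX_WHISPER_WORDS);
  // Drop a dangling comma if truncation cut mid-list.
  return words.join(" ").replace(/,$/, "") + ".";
}
```

For chat and messaging, the same fields can be posted as a system message without the word cap, since the agent is reading rather than listening.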
What Should the AI Say to the Customer at Escalation?
Be honest, be specific, and set a time expectation in the same breath. The customer knows they're being transferred, and how you frame it sets their expectation for the next few minutes. A good handoff disclosure explains why the transfer is happening, promises the human will have full context, and gives a realistic wait estimate.
Don't say: "Let me transfer you to an agent."
That tells them nothing and prepares them for nothing. They'll assume they have to repeat everything.
Say something like: "I want to connect you with someone who can process this directly. Approving refunds above $50 requires a specialist on our team. I'll send them our full conversation so you don't have to repeat anything. The wait is typically 2-3 minutes. Is that okay?"
Three things this accomplishes: it explains why (authority limit, not AI limitation), it promises context portability (they won't repeat themselves), and it sets a time expectation. Customers tolerate wait times much better when they know roughly how long the wait will be.
If the queue wait is long, offer alternatives before forcing the customer in:
async function offerEscalationOptions(
  context: HandoffContext,
  queueWait: number // estimated queue wait, in seconds
): Promise<string> {
  if (queueWait < 120) {
    // Under 2 minutes: just escalate
    return "I'm connecting you now.";
  }
  if (queueWait < 600) {
    // 2-10 minutes: offer choice
    return `The current wait is about ${Math.ceil(queueWait / 60)} minutes. Would you prefer to wait, or would a callback when an agent is available work better for you? I'll send them everything we discussed either way.`;
  }
  // Over 10 minutes: proactively suggest alternatives
  return `Our wait time is currently ${Math.ceil(queueWait / 60)} minutes. I can schedule a callback for you. Our team will call you back within the hour with your case already loaded. Would that work?`;
}
Measuring Handoff Quality
Four signals tell you whether your handoff design is working before customers start complaining about it: re-contact rate within 24 hours, repeat-yourself rate, escalated CSAT versus non-escalated CSAT, and time-to-first-meaningful-response after transfer. Each one points at a different failure mode, and tracking all four separately from your overall AI metrics is what surfaces handoff problems your headline numbers hide.
Once you've instrumented these, the data usually reveals that handoff quality, not AI capability, is driving most of your escalation CSAT gap:
Re-contact rate within 24 hours. Tag escalated conversations and check if the same customer contacts support again within 24 hours. A high re-contact rate (above 20-25%) is a strong signal that escalations aren't actually resolving the issue, often because the human agent started blind and the customer gave up.
Repeat-yourself rate. If your contact center platform captures notes from human agents, you can check whether the agent recorded information the AI already collected (account number, issue description, etc.). A simpler proxy: average handle time after escalation, since agents who have to re-establish context spend more time doing it.
Escalated CSAT vs. non-escalated CSAT. Track these separately. Your non-escalated CSAT is your AI agent's performance. Your escalated CSAT is the handoff experience. If escalated CSAT is dramatically lower, that's a handoff design problem, not an AI capability problem.
Time-to-first-meaningful-response after transfer. In voice, this is how long after the human picks up before they say something beyond "hold please" or "one moment." In chat, it's how long before the first substantive response. Agents who start informed typically reach their first meaningful response in under 30 seconds. Agents starting cold often spend 90 seconds reading back through history before they can respond.
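As a concrete example of the first signal, re-contact rate can be computed from a contact log. The `Contact` shape below is hypothetical, and the sketch assumes any later contact from the same customer within 24 hours counts as a re-contact regardless of topic; a stricter version would match on the issue as well.

```typescript
// Hypothetical log entry: one row per support contact.
interface Contact {
  customerId: string;
  startedAt: Date;
  escalated: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Fraction of escalated contacts followed by another contact from the
// same customer within 24 hours. Returns 0 when nothing was escalated.
function recontactRate(contacts: Contact[]): number {
  const escalated = contacts.filter((c) => c.escalated);
  if (escalated.length === 0) return 0;

  let recontacted = 0;
  for (const e of escalated) {
    const followedUp = contacts.some((c) => {
      const gap = c.startedAt.getTime() - e.startedAt.getTime();
      return c.customerId === e.customerId && gap > 0 && gap <= DAY_MS;
    });
    if (followedUp) recontacted += 1;
  }
  return recontacted / escalated.length;
}
```

The nested scan is quadratic in the number of contacts, which is fine for a daily report but would want an index by customer for large logs.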
If you're running your CX agents on Chanl, you can track these metrics by tagging escalated conversations in analytics and building a separate view filtered to escalation events. Pair this with scorecard evaluation on escalated calls specifically, scoring handoff quality as its own dimension, separate from the overall conversation score. The conversation traces show every tool call the agent ran, so you can verify the human received the right context and whether they actually followed the recommended first action. For the decision side of the same problem (when to escalate at all), see Smarter Escalation: When Should Voice AI Refuse to Answer? and Handoff Is the New Prompt.
The Invisible Handoff
The goal is a handoff the customer doesn't notice.
Not invisible because the agent pretends no transfer happened. The customer knows. Invisible because the human picks up and demonstrates they already know the situation. "Hi Sarah, I can see you've been dealing with the billing discrepancy on order 8821. I have your account pulled up and I can approve this refund directly."
Remember the customer at the top of this article, the one who spent seven minutes explaining their issue twice and then heard "can I get your account number?" The opening above takes five seconds and signals the opposite: your time was not wasted, the seven minutes contributed to this resolution, you don't have to start over.
That perception, that the AI and human are working together rather than operating in silos, is what turns a mediocre escalation experience into one customers describe as good.
The AI doesn't need to solve every problem. It needs to hand off well enough that the human can solve it fast.