Why do AI agents fail after a successful conversation?

Most AI agents are built synchronously -- they respond within a single turn and have no mechanism for work that needs to happen after the conversation ends. When a booking is confirmed but no follow-up email arrives or a CRM entry never gets created, the agent completed its conversational job but had no task queue to handle downstream work.

What is an async task queue for AI agents?

An async task queue is a persistent store of units of work that need to happen after (or outside of) the main conversation loop. The agent writes task records during the conversation, and a separate worker process picks them up and executes them -- retrying on failure, tracking status, and expiring stale tasks automatically.

How do retry semantics work for agent tasks?

Retry semantics define what happens when a task fails transiently. A well-designed retry policy uses exponential backoff (2s, 4s, 8s...) with a max retry count and a dead-letter destination for tasks that exhaust retries. The key distinction is between retryable failures (network timeout, rate limit) and terminal failures (invalid user ID, payment declined) -- only the first type should retry.

When should an async agent task expire?

Task expiry should reflect business reality, not technical convenience. A booking confirmation email should probably expire after 30 minutes: if it hasn't been sent by then, something is broken and the moment when the customer expected it has likely passed. A follow-up survey might have a 48-hour window. Setting expiry based on customer impact turns would-be stale successes into visible failures instead of silent ones.

What metrics should I track for async agent tasks?

At minimum: task creation rate by type, task success rate by type, P95 execution latency, retry rate, and dead-letter queue depth. Retry rate spikes often surface tool degradation before error logs catch it. Watching task health alongside conversation quality lets you correlate upstream failures with downstream customer experience.

What's the difference between async task queues and multi-agent orchestration?

Multi-agent orchestration coordinates multiple agents working on the same goal simultaneously. Async task queues handle work that flows from a single agent after it finishes talking to a user. They solve different problems: orchestration is about parallel coordination across agents, while task queues are about temporal decoupling between conversation and downstream execution.

How do I make agent tasks idempotent?

Idempotency means running a task twice produces the same result as running it once. Use a deterministic idempotency key (derived from values that stay the same across retries, such as conversationId + taskType + task ID, and never the attempt number) when calling external APIs. Check for existing records before inserting. For email sends, track sent status against a unique key so duplicate deliveries don't happen if the worker retries after a partial success.

Why CX Agents Fail Between Conversations

The call ended cleanly. The agent confirmed the appointment, read back the address, and said goodbye in a warm, natural tone. The customer gave it five stars.

Three hours later, they're calling back, frustrated. The confirmation email never arrived. The appointment isn't in the scheduling system. The CRM shows nothing happened.

Your agent didn't fail during the conversation. It failed after it.

This is the async gap -- the space between "conversation complete" and "everything downstream actually ran." It's where a surprising number of production CX agents quietly break, and where even teams with solid conversation quality scores discover that their customers still feel let down.

If you've been debugging failures that only surface hours after a call, or chasing missing CRM entries that the agent swears it created, you've met this problem. Here's what's actually happening, and how to build out of it.

What "Between Conversations" Actually Means

The work that happens between conversations is usually more consequential than the conversation itself. Think about what a completed booking interaction should trigger:

Send a confirmation email with a calendar link, address, and contact info
Write a CRM record with the outcome, key fields, and sentiment
Schedule a reminder 24 hours before the appointment
Route a callback queue item if something needs follow-up
Update an internal ticketing system with the resolution

Each of these is a distinct piece of work. Some can fail independently. All of them need to complete for the customer to feel actually helped, not just acknowledged.

The problem is that most AI agent frameworks are built around the request-response model. A conversation comes in, the agent generates a response, the turn ends. Post-conversation work is typically bolted on at the end of the handler -- a few await calls, hoping nothing times out and the server stays up.

That hope isn't an architecture. And the cost of getting it wrong is invisible: you won't see the failure in your conversation quality metrics, but your customers will feel it in callbacks, frustration, and churn.

Why Synchronous Post-Call Code Breaks

The naive approach runs all downstream work synchronously at hang-up:

the-naive-approach.ts·typescript

async function handleConversationEnd(session: ConversationSession) {
  // Everything runs in sequence, blocking each other
  await sendConfirmationEmail(session.userId, session.bookingDetails);
  await updateCRM(session.userId, session.outcome);
  await scheduleReminder(session.bookingDetails.datetime);
  await updateTicketingSystem(session.ticketId, session.resolution);
}

When sendConfirmationEmail throws -- Resend rate limit, template rendering error, recipient inbox full -- the three calls below it never run. The CRM update dies silently. The reminder never schedules. And you won't know until a customer calls back.

The common patch is Promise.allSettled:

slightly-better-but-still-wrong.ts·typescript

async function handleConversationEnd(session: ConversationSession) {
  const results = await Promise.allSettled([
    sendConfirmationEmail(session.userId, session.bookingDetails),
    updateCRM(session.userId, session.outcome),
    scheduleReminder(session.bookingDetails.datetime),
    updateTicketingSystem(session.ticketId, session.resolution),
  ]);
 
  const failures = results.filter(r => r.status === 'rejected');
  if (failures.length > 0) {
    console.error('Some post-conversation tasks failed', failures);
  }
}

Better -- at least the CRM update doesn't die because the email failed. But you still have no retry logic. No state persistence. If your server restarts mid-execution (deploys happen, pods get evicted), all these promises die with it. And the console.error will scroll off your logs before anyone sees it.

What you actually need is to separate the act of creating work from the act of doing work.

Building a Task Queue for CX Agents

A task queue holds this separation in place. The agent writes task records during or immediately after the conversation. A separate worker process picks them up and executes them. The agent never waits for execution to complete.

This gives you three things synchronous code can't provide:

Durability -- tasks survive server restarts, because they live in a database
Retry logic -- failed tasks can be retried without re-running the whole conversation
Observability -- you can see exactly which tasks are pending, running, failed, or expired

Here's a minimal task schema:

task-schema.ts·typescript

interface AgentTask {
  id: string;
  conversationId: string;
  type: 'send_email' | 'update_crm' | 'schedule_reminder' | 'update_ticket';
  payload: Record<string, unknown>;
  status: 'pending' | 'running' | 'completed' | 'failed' | 'expired';
  createdAt: Date;
  expiresAt: Date;
  attempts: number;
  maxAttempts: number;
  nextRetryAt?: Date;
  lastError?: string;
  completedAt?: Date;
}
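The status field implies a small state machine. A minimal sketch of the allowed transitions, inferred from the flow this article describes; the transition table and the helper names canTransition and assertTransition are assumptions for illustration, not part of the original schema:

```typescript
type TaskStatus = 'pending' | 'running' | 'completed' | 'failed' | 'expired';

// Allowed moves between statuses. Assumed from the article's flow:
// a running task goes back to pending when it is scheduled for a retry.
const ALLOWED_TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  pending: ['running', 'expired'],
  running: ['completed', 'pending', 'failed'],
  completed: [],
  failed: [],
  expired: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Throws instead of silently writing a status change that should never happen.
function assertTransition(from: TaskStatus, to: TaskStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal task transition: ${from} -> ${to}`);
  }
}
```

Calling assertTransition before every status update is one way to catch bugs such as a completed task being reset to pending.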

When the conversation ends, the agent writes records rather than executing directly:

conversation-end-handler.ts·typescript

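// Date helpers are assumed to come from date-fns; swap in your own if you use another library
import { addHours, addMinutes, subHours } from 'date-fns';

async function handleConversationEnd(session: ConversationSession) {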
  const now = new Date();
 
  const tasks: Omit<AgentTask, 'id'>[] = [
    {
      conversationId: session.id,
      type: 'send_email',
      payload: { userId: session.userId, bookingDetails: session.bookingDetails },
      status: 'pending',
      createdAt: now,
      expiresAt: addMinutes(now, 30), // 30-minute window
      attempts: 0,
      maxAttempts: 3,
    },
    {
      conversationId: session.id,
      type: 'update_crm',
      payload: { userId: session.userId, outcome: session.outcome },
      status: 'pending',
      createdAt: now,
      expiresAt: addHours(now, 4), // 4-hour window
      attempts: 0,
      maxAttempts: 5,
    },
    {
      conversationId: session.id,
      type: 'schedule_reminder',
      payload: {
        userId: session.userId,
        appointmentTime: session.bookingDetails.datetime,
        reminderOffset: '-24h',
      },
      status: 'pending',
      createdAt: now,
      // Expires 2 hours before the appointment (pointless after)
      expiresAt: subHours(session.bookingDetails.datetime, 2),
      attempts: 0,
      maxAttempts: 3,
    },
  ];
 
  await db.tasks.createMany({ data: tasks });
  // Conversation handler returns immediately -- no waiting
}

The worker runs on a polling loop or gets triggered by your queue service:

task-worker.ts·typescript

async function processPendingTasks() {
  const tasks = await db.tasks.findMany({
    where: {
      status: 'pending',
      expiresAt: { gt: new Date() }, // Skip expired tasks
      OR: [
        { nextRetryAt: null },
        { nextRetryAt: { lte: new Date() } }, // Backoff elapsed
      ],
    },
    orderBy: { createdAt: 'asc' },
    take: 20,
  });
 
  await Promise.allSettled(tasks.map(executeTask));
}
 
async function executeTask(task: AgentTask) {
  await db.tasks.update({
    where: { id: task.id },
    data: { status: 'running', attempts: { increment: 1 } },
  });
 
  try {
    await dispatchByType(task);
    await db.tasks.update({
      where: { id: task.id },
      data: { status: 'completed', completedAt: new Date() },
    });
  } catch (err) {
    const exhausted = task.attempts + 1 >= task.maxAttempts;
    await db.tasks.update({
      where: { id: task.id },
      data: {
        status: exhausted ? 'failed' : 'pending',
        lastError: String(err),
        nextRetryAt: exhausted ? null : calcNextRetry(task.attempts + 1),
      },
    });
 
    if (exhausted) {
      await moveToDeadLetter(task);
    }
  }
}

This is the core pattern. The conversation handler writes. The worker reads. Neither one waits for the other.
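One caveat with the worker as written: it reads pending tasks and then marks each one running in a separate step, so two workers polling at the same moment can both take the same row, and a task whose worker crashes stays in running indefinitely. One possible way to tighten this is a conditional claim plus a lease timeout. The sketch below uses an in-memory store as a stand-in for the tasks table; the claimedAt field, the TaskStore class, and the lease length are hypothetical names and values, not part of the schema above.

```typescript
type Status = 'pending' | 'running' | 'completed' | 'failed' | 'expired';

interface ClaimableTask {
  id: string;
  status: Status;
  attempts: number;
  claimedAt?: Date; // hypothetical field: when a worker last claimed the task
}

// In-memory stand-in for the tasks table.
class TaskStore {
  private rows = new Map<string, ClaimableTask>();

  insert(task: ClaimableTask): void {
    this.rows.set(task.id, { ...task });
  }

  get(id: string): ClaimableTask | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Same idea as a conditional update in SQL:
  //   UPDATE tasks SET status = 'running', attempts = attempts + 1, claimed_at = :now
  //   WHERE id = :id AND status = 'pending'
  // Only the caller whose update matches a row gets true.
  claim(id: string, now: Date): boolean {
    const row = this.rows.get(id);
    if (!row || row.status !== 'pending') return false;
    row.status = 'running';
    row.attempts += 1;
    row.claimedAt = now;
    return true;
  }

  // Returns tasks that have been running longer than leaseMs to pending,
  // on the assumption that their worker died. Returns how many were released.
  releaseStale(now: Date, leaseMs: number): number {
    let released = 0;
    this.rows.forEach((row) => {
      const stale =
        row.status === 'running' &&
        row.claimedAt !== undefined &&
        now.getTime() - row.claimedAt.getTime() > leaseMs;
      if (stale) {
        row.status = 'pending';
        delete row.claimedAt;
        released += 1;
      }
    });
    return released;
  }
}
```

A worker would call claim before executing and skip the task when it returns false. Releasing a stale task means it can run a second time after a crash, so this only works safely when the task handlers are idempotent.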

Idempotency: the Hidden Requirement

Two confirmation emails land in your customer's inbox. Two CRM entries get created for the same call. Two reminder messages fire on Thursday morning. That's what happens when a task worker crashes mid-execution and picks the task up again on restart.

With database polling, a task can be picked up more than once. If two workers poll at the same time, or a worker crashes mid-execution and the task it had marked running is later returned to pending, the same task can execute twice. For external calls, that means two CRM entries, two emails to the customer, two reminder schedules.

The fix is idempotency: running a task twice should produce the same result as running it once.

For external API calls, use an idempotency key:

idempotent-dispatch.ts·typescript

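async function sendEmail(task: AgentTask) {
  // Deterministic key: same task always generates the same key
  const idempotencyKey = `${task.conversationId}:${task.type}:${task.id}`;
 
  await resend.emails.send({
    // Assumes payload.userId holds the customer's email address
    to: task.payload.userId as string,
    subject: 'Your appointment is confirmed',
    react: ConfirmationEmail(task.payload as BookingPayload),
    // Caution: in many email SDKs, `headers` sets headers on the message itself,
    // not on the HTTP request. Check your provider's docs for how to pass a
    // request-level idempotency key.
    headers: {
      'Idempotency-Key': idempotencyKey,
    },
  });
}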
 
async function updateCRM(task: AgentTask) {
  // Upsert instead of insert -- safe to run multiple times
  await crm.contacts.upsert({
    where: { externalId: task.payload.userId as string },
    create: { ...buildCRMRecord(task.payload) },
    update: { ...buildCRMRecord(task.payload) },
  });
}

Many APIs (Resend, Stripe, Twilio) accept an idempotency key natively, though the exact header or SDK option, the endpoints that honor it, and how long keys are remembered vary by provider. When you retry a request with the same key inside that window, the provider returns the cached result rather than executing again. For your own database writes, use upsert semantics rather than plain insert.
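When a provider has no idempotency support, you can track delivery yourself against the same key. The following is a minimal sketch under stated assumptions: DeliveryLedger is an in-memory stand-in for a table with a unique constraint on the key, and deliverOnce is a hypothetical helper, not a library function.

```typescript
interface DeliveryRecord {
  key: string;
  deliveredAt: Date;
}

// In-memory stand-in for a table with a UNIQUE constraint on `key`.
class DeliveryLedger {
  private records = new Map<string, DeliveryRecord>();

  has(key: string): boolean {
    return this.records.has(key);
  }

  record(key: string, deliveredAt: Date): void {
    this.records.set(key, { key, deliveredAt });
  }
}

// Runs `send` only if no delivery has been recorded for this key.
// A failed send leaves no record, so a later retry will try again.
async function deliverOnce(
  ledger: DeliveryLedger,
  key: string,
  send: () => Promise<void>,
): Promise<'sent' | 'skipped'> {
  if (ledger.has(key)) return 'skipped';
  await send();
  ledger.record(key, new Date());
  return 'sent';
}
```

This narrows the duplicate window but does not close it: a crash after send succeeds and before the record is written still produces a second delivery on retry. Provider-side idempotency keys are the stronger guarantee where they exist.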

Retry Semantics That Actually Work

Not all failures deserve a retry. A network timeout is transient -- try again. An invalid user ID is terminal -- retrying forever wastes resources and hides the real problem. Your dispatch function should signal which type it is:

typed-errors.ts·typescript

class RetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'RetryableError';
  }
}
 
class TerminalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TerminalError';
  }
}
 
async function sendEmail(task: AgentTask) {
  const result = await resend.emails.send({ ... });
 
  if (result.statusCode === 429) {
    // Rate limited -- back off and retry
    throw new RetryableError('Resend rate limited');
  }
  if (result.statusCode === 422) {
    // Bad recipient address -- don't retry
    throw new TerminalError(`Invalid email for ${task.payload.userId}`);
  }
}
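The worker shown earlier retries every failure until attempts run out, so the typed errors only help if the worker inspects them. A sketch of that decision as a pure function; decideAfterFailure and the FailureDecision shape are hypothetical, and treating unknown errors as retryable is an assumption you may want to reverse:

```typescript
class RetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'RetryableError';
  }
}

class TerminalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TerminalError';
  }
}

type FailureDecision =
  | { status: 'pending'; retry: true }
  | { status: 'failed'; retry: false; reason: 'terminal' | 'exhausted' };

// attemptsMade counts the attempt that just failed.
function decideAfterFailure(
  err: unknown,
  attemptsMade: number,
  maxAttempts: number,
): FailureDecision {
  // Checking `name` rather than `instanceof` also works when errors cross
  // module or compilation boundaries.
  const isTerminal = err instanceof Error && err.name === 'TerminalError';
  if (isTerminal) {
    return { status: 'failed', retry: false, reason: 'terminal' };
  }
  if (attemptsMade >= maxAttempts) {
    return { status: 'failed', retry: false, reason: 'exhausted' };
  }
  // Assumption: anything not explicitly terminal is worth retrying.
  return { status: 'pending', retry: true };
}
```

In the worker's catch block, the returned status would replace the exhausted check, and the reason could be stored with the dead-letter record so terminal failures and exhausted retries can be told apart.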

Exponential backoff with jitter prevents the thundering-herd problem when many tasks fail simultaneously:

backoff.ts·typescript

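function calcNextRetry(attempts: number): Date {
  const baseMs = 2000;
  const cap = 60_000; // Max 60s between retries (before jitter)
  // attempts is the number of attempts made so far (1 after the first failure),
  // so the first retry waits baseMs, the second 2x baseMs, and so on.
  const exponential = Math.min(baseMs * Math.pow(2, attempts - 1), cap);
  const jitter = Math.random() * 1000; // Up to 1s of random spread
  return new Date(Date.now() + exponential + jitter);
}
// After attempt 1: ~2s, after attempt 2: ~4s, after attempt 3: ~8s, after attempt 4: ~16s...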
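To sanity-check a backoff schedule, it can help to make the random source injectable so the delays are deterministic in tests. This is a sketch with assumed names (backoffDelayMs and its parameters are illustrative); it takes the number of attempts made so far, starting at 1, and returns a delay in milliseconds rather than a Date.

```typescript
// Delay before the next retry, given how many attempts have been made (1-based).
// `random` defaults to Math.random and can be replaced in tests.
function backoffDelayMs(
  attemptsMade: number,
  random: () => number = Math.random,
  baseMs: number = 2000,
  capMs: number = 60000,
  jitterMs: number = 1000,
): number {
  const exponential = Math.min(baseMs * Math.pow(2, attemptsMade - 1), capMs);
  return exponential + random() * jitterMs;
}

// With jitter switched off, the schedule for the first six attempts is:
const scheduleWithoutJitter = [1, 2, 3, 4, 5, 6].map((n) => backoffDelayMs(n, () => 0));
// [2000, 4000, 8000, 16000, 32000, 60000]
```

The cap applies to the exponential part only, so with the defaults the longest possible wait is just under 61 seconds.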

Dead-letter queue depth is your most important operational signal. Tasks that exhaust retries shouldn't just sit in status: 'failed' -- they should move to a separate dead-letter table with dedicated alerting. If that queue fills, something is systematically broken (your email provider is down, the CRM is rejecting writes), and you need to know now rather than when a customer calls back.

When Tasks Should Expire

Expiry is the feature teams skip most often, and the one they regret most.

Say a customer books an appointment for Thursday at 2pm. The confirmation email task fails and sits in the retry queue for six hours. When it finally executes, you send them a "Your appointment is confirmed!" email at 11pm -- after they've already received a confused callback from your human team. Stale success is worse than visible failure.

Each task type should have a business-realistic expiry window:

Task Type	Suggested Window	Rationale
Confirmation email	30 min	Customer expects it while the call is fresh
CRM update	4 hours	Business SLA for data freshness
Appointment reminder	Until T-2h before appointment	Pointless if the appointment already passed
Follow-up survey	48 hours	Response rates drop sharply after that
Internal ticket update	24 hours	Ops team needs timely records

When a task expires, don't discard it silently. Move it to dead-letter with reason: 'expired'. A surge in expired confirmation emails often means your email provider had an outage during peak call hours -- that's a real operational signal you want to see.
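Note that the worker query shown earlier only skips expired tasks; nothing marks them as expired. A periodic sweep could do that. The sketch below works on an in-memory array and returns dead-letter entries for the caller to persist; sweepExpired, countByType, and the DeadLetterEntry shape are assumptions for illustration.

```typescript
type Status = 'pending' | 'running' | 'completed' | 'failed' | 'expired';

interface SweepableTask {
  id: string;
  type: string;
  status: Status;
  expiresAt: Date;
}

interface DeadLetterEntry {
  taskId: string;
  type: string;
  reason: 'expired';
  recordedAt: Date;
}

// Marks overdue pending tasks as expired (mutating them in place) and
// returns one dead-letter entry per task it expired.
function sweepExpired(tasks: SweepableTask[], now: Date): DeadLetterEntry[] {
  const entries: DeadLetterEntry[] = [];
  for (const task of tasks) {
    const overdue = task.expiresAt.getTime() <= now.getTime();
    if (task.status === 'pending' && overdue) {
      task.status = 'expired';
      entries.push({ taskId: task.id, type: task.type, reason: 'expired', recordedAt: now });
    }
  }
  return entries;
}

// Counts dead-letter entries per task type, for example to alert on a surge.
function countByType(entries: DeadLetterEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    counts[entry.type] = (counts[entry.type] || 0) + 1;
  }
  return counts;
}
```

Running the sweep on the same schedule as the worker keeps the expired status and the dead-letter table in step, and the per-type counts give you the surge signal described above.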

Figure: Task lifecycle from conversation end to completion or dead-letter

Monitoring Task Health Alongside Conversations

A task queue you can't observe is just a different kind of black box. The metrics that matter are straightforward:

Task creation rate by type -- if it drops, your conversation handler may have stopped writing tasks
Success rate by type -- different task types have different failure baselines; CRM updates will fail differently from email sends
P95 execution latency -- how long from creation to completion across all task types
Retry rate -- climbing retry rate is an early signal of tool degradation, often visible before error logs catch up
Dead-letter queue depth -- this should hover near zero; if it climbs, something is systematically broken
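The metrics above can be computed directly from task records. The sketch below is one way to do it, under stated assumptions: success rate is measured over finished tasks only, dead-letter depth counts failed plus expired tasks, and P95 uses the nearest-rank method. The names summarize, summarizeByType, and TaskHealth are illustrative.

```typescript
type Status = 'pending' | 'running' | 'completed' | 'failed' | 'expired';

interface TaskRecord {
  type: string;
  status: Status;
  createdAt: Date;
  completedAt?: Date;
  attempts: number;
}

interface TaskHealth {
  created: number;
  successRate: number;     // completed / (completed + failed + expired)
  retryRate: number;       // share of started tasks that needed more than one attempt
  p95LatencyMs: number;    // creation to completion, completed tasks only
  deadLetterDepth: number; // failed + expired
}

// Nearest-rank percentile; returns NaN for an empty list.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] as number;
}

function summarize(tasks: TaskRecord[]): TaskHealth {
  const completed = tasks.filter((t) => t.status === 'completed');
  const dead = tasks.filter((t) => t.status === 'failed' || t.status === 'expired');
  const started = tasks.filter((t) => t.attempts >= 1);
  const retried = started.filter((t) => t.attempts > 1);
  const finished = completed.length + dead.length;
  const latencies = completed
    .filter((t) => t.completedAt !== undefined)
    .map((t) => (t.completedAt as Date).getTime() - t.createdAt.getTime());

  return {
    created: tasks.length,
    successRate: finished === 0 ? NaN : completed.length / finished,
    retryRate: started.length === 0 ? NaN : retried.length / started.length,
    p95LatencyMs: percentile(latencies, 95),
    deadLetterDepth: dead.length,
  };
}

// Groups records by task type and summarizes each group separately.
function summarizeByType(tasks: TaskRecord[]): Record<string, TaskHealth> {
  const groups = new Map<string, TaskRecord[]>();
  tasks.forEach((t) => {
    const group = groups.get(t.type) || [];
    group.push(t);
    groups.set(t.type, group);
  });
  const out: Record<string, TaskHealth> = {};
  groups.forEach((group, type) => {
    out[type] = summarize(group);
  });
  return out;
}
```

Rates come back as NaN when there is nothing to measure yet, which keeps an empty window from reading as either 0% or 100% on a dashboard.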

The important insight is that task health and conversation quality are connected metrics, not separate ones. If your confirmation email failure rate climbs, your callback rate tends to follow about 30 minutes later. Watching both together lets you diagnose the upstream cause before it becomes a customer complaint.

Chanl's monitoring dashboard surfaces this correlation directly: task execution metrics alongside conversation scorecards in the same view. You can also run scenario tests that validate the full post-conversation workflow -- not just "did the agent say the right thing" but "did every downstream task actually execute and complete." This matters especially for teams using memory-backed agents, where a failed CRM update means the agent's memory layer serves stale context on the customer's next call.

For teams hitting reliability failures more broadly, the circuit breaker pattern in Circuit Breakers for AI Agents: Stop the 3 AM Meltdown complements the task queue pattern well: circuit breakers prevent your task worker from hammering a degraded external service during an outage, and the task queue gives you the retry budget to recover cleanly once the service is healthy again.

The Idempotency Key + Task Queue Pattern Together

Here's how a complete booking flow looks when you combine these patterns:

booking-flow-complete.ts·typescript

// Called when agent confirms booking during conversation
async function onBookingConfirmed(
  session: ConversationSession,
  booking: BookingDetails,
) {
  const now = new Date();
 
  await db.tasks.createMany({
    data: [
      {
        conversationId: session.id,
        type: 'send_email',
        payload: { template: 'booking_confirmation', to: session.userId, booking },
        status: 'pending',
        createdAt: now,
        expiresAt: addMinutes(now, 30),
        attempts: 0,
        maxAttempts: 3,
      },
      {
        conversationId: session.id,
        type: 'update_crm',
        payload: { userId: session.userId, outcome: 'booked', booking },
        status: 'pending',
        createdAt: now,
        expiresAt: addHours(now, 4),
        attempts: 0,
        maxAttempts: 5,
      },
      {
        conversationId: session.id,
        type: 'schedule_reminder',
        payload: { userId: session.userId, appointmentTime: booking.datetime },
        status: 'pending',
        createdAt: now,
        expiresAt: subHours(booking.datetime, 2),
        attempts: 0,
        maxAttempts: 3,
      },
    ],
  });
 
  // Agent gets "tasks scheduled" back immediately -- no waiting
  return { tasksScheduled: 3 };
}

If you'd rather use a dedicated queue library or service instead of database polling, BullMQ (Redis-backed) and Inngest (durable functions, first-class retry semantics) map cleanly to this schema. The pattern is identical; only the infrastructure layer changes.

For teams already on MCP, the Tasks primitive in recent revisions of the MCP specification brings task lifecycle management, such as status tracking and expiry, to the protocol level. The database task queue above covers similar ground, and the concepts translate if you migrate to MCP Tasks later; check the current specification for exactly what it handles, retries in particular. We covered how the A2A and MCP protocol stack fits together in A2A and MCP: Building the Agent Protocol Stack from Scratch.

What Good Looks Like

A production CX agent that handles async tasks well has a few visible properties that distinguish it from one that doesn't:

Conversations end in under 200ms even when complex downstream processing is queued. The agent writes task records -- never task results.

The task health dashboard is boring. Success rates hover near 100%. Dead-letter depth is zero. When something spikes, the on-call engineer knows within minutes and has task type, error message, affected conversation IDs, and retry history to diagnose from.

Failed tasks show up in incident reports, not customer callbacks. When a CRM integration goes down, you see it in dead-letter depth before the first customer calls wondering why the agent didn't follow up.

Customers don't know tasks exist. From their perspective, the confirmation email just arrived. The CRM entry was there when the account manager pulled it up. The reminder fired at exactly the right time. The work happened; they just didn't see the mechanics.

That's the goal: invisible reliability. Your agent handles the conversation. The task queue handles everything else. Neither one waits for the other.




Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.
