Security & Compliance

Build a KYC Voice Agent: 4-Minute Account Open, 5-Year Audit Log

Voice collects name, DOB, SSN. SMS hands the camera the rest. OFAC screens before the next word. Architecture of a KYC voice agent that survives a BSA exam.

Dean Grover, Co-founder
April 30, 2026
12 min read
An ID Card Propped Against a Notebook on a Quiet Desk at Evening, Two Hands Tilting a Phone Down to Photograph It

The web form kills your conversion. Twelve fields, four screens, an ID upload that fails on the first try, and a "verifying your identity" spinner that has no end. Onboarding drop-off climbed from around 40 percent in 2016 to roughly 68 percent in 2022, and 40 to 50 percent of that drop-off happens specifically inside the KYC step. Once a flow runs longer than three to five minutes, abandonment crosses 50 percent. A better progress bar will not save you.

Voice changes the rhythm. A patient-sounding voice can collect name, date of birth, and an SSN in under 90 seconds. The catch is that voice cannot see. You cannot read a driver's license through a phone call. The reflex is to bounce the customer back to a web form for the visual step, which puts you right back in the abandonment trap.

The pattern that works is a handoff. Voice collects what voice is good at. The phone camera handles the visual step. The voice line stays open through both, and the same session record threads the voice turn, the SMS turn, and the screening turn together. That is the architecture.

What Does CIP Actually Require?

Section 326 of the USA PATRIOT Act sets the floor. Every covered institution needs a written Customer Identification Program with risk-based procedures that verify identity to the extent reasonable and practicable, maintain records of the verification, and check the applicant against government lists of suspected terrorists. The minimum identifying data is name, date of birth, address, and an identification number, which for a US person is typically the SSN. None of that says anything about channel. A voice agent satisfies CIP if and only if it captures the four data points, runs the OFAC check, and writes durable records.

FinCEN's recent exceptive relief order kept the 25 percent beneficial ownership threshold but eased the verification cadence: financial institutions can verify beneficial owners at first account opening rather than every new account for the same customer. That is a real simplification for fintechs that open multiple products per customer, but it does not change anything about the first-account flow you are building here.

GLBA's Safeguards Rule sits on top of CIP. Customer information has to be encrypted at rest and in transit, MFA enforced, access role-based, and audit logging maintained across every system that touches NPI. BSA requires five-year retention of CIP records. Your voice agent is not exempt from any of this. If the agent collects an SSN, the SSN is NPI from the moment it leaves the customer's mouth, and your stack has to treat it that way. The same encryption-and-vault discipline shows up in HIPAA security for voice AI, with different acronyms but the same shape.

How Do You Hand Off Voice to a Phone Camera Without Losing the Customer?

You keep the voice line open and use SMS to ferry a single-use link to a mobile web flow that owns the camera. One session ID threads the voice turn, the SMS turn, and the verification turn together. Build that handoff first, because everything else hangs off it. The voice agent collects name, date of birth, and SSN in a normal back-and-forth. When it is time for the ID and selfie, the agent says something like "I'm going to text you a link to take a quick photo of your license. Let me know when you have it open." It then calls a tool that sends a single-use SMS link.

tools/send-id-capture-link.ts·typescript
import { randomBytes } from "node:crypto";
import twilio from "twilio";
import { redis } from "@/lib/redis";
 
const sms = twilio(process.env.TWILIO_SID!, process.env.TWILIO_TOKEN!);
 
// Tool: voice agent calls this when it's time for the camera step.
// sessionId is the ID of the live voice call. The web flow will use the
// same sessionId so the two channels share state.
export async function sendIdCaptureLink({
  sessionId,
  phoneE164,
}: {
  sessionId: string;
  phoneE164: string;
}) {
  const nonce = randomBytes(16).toString("hex");
  // Single-use, 10 minute TTL. The web app rejects expired or replayed links.
  await redis.set(`id-link:${nonce}`, sessionId, "EX", 600);
 
  const url = `https://onboard.example.com/id?n=${nonce}`;
  await sms.messages.create({
    from: process.env.TWILIO_FROM!,
    to: phoneE164,
    body: `Tap to take a photo of your ID and a quick selfie. Link expires in 10 minutes: ${url}`,
  });
 
  return { status: "sent", expiresInSec: 600 };
}

The customer taps the link. The mobile web app validates the nonce, looks up the sessionId, and starts a Persona inquiry (or Alloy, or Stripe Identity, the contract is similar). Persona walks the customer through capturing the front and back of the ID and a selfie that doubles as a liveness check. When Persona finishes, it fires a webhook back to your backend.
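The inquiry creation itself is one authenticated POST. A minimal sketch, assuming Persona's JSON:API request shape (`inquiry-template-id`, `reference-id`); verify the field names against the current Persona docs. `buildInquiryPayload` and the `PERSONA_API_KEY` env var are our own names. The important line is `reference-id`, which carries the voice sessionId so the webhook can find its way back:

```typescript
// Builds the inquiry payload. Field names follow Persona's JSON:API shape;
// treat them as assumptions to confirm against the current API reference.
export function buildInquiryPayload(sessionId: string, templateId: string) {
  return {
    data: {
      attributes: {
        "inquiry-template-id": templateId,
        // reference-id is how the webhook handler maps the finished inquiry
        // back to the live voice session.
        "reference-id": sessionId,
      },
    },
  };
}

export async function createPersonaInquiry(sessionId: string, templateId: string) {
  const res = await fetch("https://withpersona.com/api/v1/inquiries", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PERSONA_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildInquiryPayload(sessionId, templateId)),
  });
  return res.json();
}
```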

The voice line is still open. The agent can say "Looks like you've started the camera step. I'll wait." It cannot poll. It needs to block on a real signal that the verification finished, then resume. That is a webhook-await primitive.

tools/wait-for-id-result.ts·typescript
// Shape of the result the webhook hands back to the voice tool.
type PersonaResult = {
  status: "completed" | "failed";
  inquiryId: string;
  extracted: Record<string, unknown>;
};

type Pending = { resolve: (v: PersonaResult) => void; timeoutHandle: NodeJS.Timeout };
const pending = new Map<string, Pending>();
 
// Voice agent tool: blocks until the Persona webhook arrives or timeout.
export function waitForIdResult(sessionId: string, timeoutMs = 180_000) {
  return new Promise<PersonaResult>((resolve, reject) => {
    const timeoutHandle = setTimeout(() => {
      pending.delete(sessionId);
      reject(new Error("id_capture_timeout"));
    }, timeoutMs);
    pending.set(sessionId, { resolve, timeoutHandle });
  });
}
 
// Webhook handler: Persona POSTs here when the inquiry transitions to a
// terminal state. We resolve the pending promise so the voice tool returns.
export async function handlePersonaWebhook(req: Request) {
  const evt = await req.json();
  const sessionId = evt.data.attributes["reference-id"]; // we set this when we
                                                         // created the inquiry
  const p = pending.get(sessionId);
  if (!p) return new Response("no pending", { status: 200 });
 
  clearTimeout(p.timeoutHandle);
  pending.delete(sessionId);
  p.resolve({
    status: evt.data.attributes.status,        // "completed" | "failed"
    inquiryId: evt.data.id,
    extracted: evt.data.attributes.fields,     // name, dob, address, doc number
  });
  return new Response("ok", { status: 200 });
}

That is the whole hard part of the handoff. The voice tool awaits a promise. The webhook resolves it. State lives in one map keyed by the voice session ID. In a multi-process deployment the map is Redis pub/sub, but the shape is the same.
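One way the multi-process variant can look: swap the in-memory map for a message bus and have the webhook handler publish instead of resolving a local promise. The sketch below injects the bus behind a generic interface (the `subscribe`/`publish` signatures here are our own abstraction, not a specific client's API); ioredis pub/sub or Postgres LISTEN/NOTIFY both fit it.

```typescript
// Any pub/sub transport works; subscribe returns an unsubscribe function.
interface Bus {
  subscribe(channel: string, onMessage: (msg: string) => void): () => void;
  publish(channel: string, msg: string): void;
}

// Same contract as waitForIdResult, but safe when the webhook lands on a
// different process than the one holding the voice call.
export function waitForIdResultDistributed<T>(
  bus: Bus,
  sessionId: string,
  timeoutMs = 180_000,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let unsubscribe = () => {};
    const timer = setTimeout(() => {
      unsubscribe();
      reject(new Error("id_capture_timeout"));
    }, timeoutMs);
    unsubscribe = bus.subscribe(`id-result:${sessionId}`, (msg) => {
      clearTimeout(timer);
      unsubscribe();
      resolve(JSON.parse(msg) as T);
    });
  });
}

// The webhook handler, wherever it runs, publishes the result.
export function publishIdResult(bus: Bus, sessionId: string, result: unknown) {
  bus.publish(`id-result:${sessionId}`, JSON.stringify(result));
}
```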

[Sequence diagram — participants: Customer, Voice Agent, Backend, Phone (Web), KYC Vendor, OFAC API. Customer → Voice Agent: "I want to open an account" · agent collects name, DOB, SSN · Voice Agent → Backend: send_id_capture_link(sessionId) · Backend → Phone: SMS with single-use link · customer taps link, opens camera flow · Phone → KYC Vendor: starts inquiry (sessionId as reference-id) · liveness + ID extraction · KYC Vendor → Backend: webhook, inquiry completed, resolves waitForIdResult promise · Backend → OFAC API: ofac_screen(applicantId), SDN + PEP query · clean | possible_match | hard_match → decision · Voice Agent → Customer: "You're all set, account number ends in 4471."]
Voice + SMS + camera handoff: the voice line stays open while the customer's phone camera handles the visual step

Screen, Then Decide

Once the ID is verified, OFAC screening runs before the agent says anything else. SDN matches block the next utterance. The agent's tool gets one of three statuses back, and the prompt branches on that status. Do not let the model choose how to handle a hit. The branch belongs in code.

tools/ofac-screen.ts·typescript
import { vault } from "@/lib/vault"; // your tokenizing KYC vault client

type Match = { score: number; match_types: string[]; [k: string]: unknown };
type Hit = Match;

type ScreenResult =
  | { status: "clean" }
  | { status: "possible_match"; score: number; matches: Match[] }
  | { status: "hard_match"; matches: Match[] };
 
export async function ofacScreen({
  applicantId,
}: {
  applicantId: string;
}): Promise<ScreenResult> {
  // Fetch PII from your KYC vault server-side. The applicantId is opaque to
  // the LLM. The vault token never enters the prompt.
  const { fullName, dob, address } = await vault.read(applicantId);
 
  const r = await fetch("https://api.complyadvantage.com/searches", {
    method: "POST",
    headers: {
      "Authorization": `Token ${process.env.COMPLYADVANTAGE_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      search_term: fullName,
      filters: { birth_year: new Date(dob).getFullYear() },
      fuzziness: 0.6,
      types: ["sanction", "pep", "adverse-media"],
    }),
  }).then((r) => r.json());
 
  // Decision tree, not a model judgment.
  if (r.data.total_hits === 0) return { status: "clean" };
 
  const sanctionsHit = r.data.hits.find(
    (h: Hit) => h.match_types.includes("sanction") && h.score >= 0.95,
  );
  if (sanctionsHit) {
    return { status: "hard_match", matches: r.data.hits };
  }
 
  return {
    status: "possible_match",
    score: r.data.hits[0].score,
    matches: r.data.hits,
  };
}

The agent prompt has three short branches. On clean, continue to tier decisioning. On possible_match, freeze the application, write a review-queue entry, tell the customer "your application is in review and we'll follow up by email within 24 hours," end the call. On hard_match, do not tell the customer they hit a sanctions list. Do say you cannot complete the application. Trigger the SAR review workflow internally. ComplyAdvantage and Refinitiv World-Check refresh their underlying lists multiple times an hour, which matters because OFAC adds entries on no schedule and the SDN list as published by Treasury is the legal source of truth.
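Keeping the branch in code can look like a plain mapping from screening status to a scripted next step: the agent receives only the utterance and the action, never the raw hit data. The names below are illustrative, not a fixed contract:

```typescript
type NextStep =
  | { action: "continue"; say: string }
  | { action: "freeze_and_review"; say: string }
  | { action: "reject_and_sar_review"; say: string };

// Deterministic branch on the screening status. The model never sees match
// details and never chooses how to handle a hit.
export function routeScreenResult(
  status: "clean" | "possible_match" | "hard_match",
): NextStep {
  switch (status) {
    case "clean":
      return { action: "continue", say: "Great, let's finish setting up your account." };
    case "possible_match":
      return {
        action: "freeze_and_review",
        say: "Your application is in review and we'll follow up by email within 24 hours.",
      };
    case "hard_match":
      // Never disclose a sanctions hit to the caller.
      return {
        action: "reject_and_sar_review",
        say: "I'm sorry, we aren't able to complete this application today.",
      };
  }
}
```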

Tier the Customer Before You Open the Account

CIP is a floor, not a ceiling. The CDD rule layers risk-based procedures on top, and enhanced due diligence applies to higher-risk customers including politically exposed persons, foreign correspondent accounts, and applicants whose occupation or funding source raises a flag. The agent prompt has a small decision table that captures the institution's risk policy.

tools/decide-tier.ts·typescript
type Tier = "basic" | "edd";

// Occupations your risk policy treats as politically exposed. Extend to
// match the institution's written CDD policy.
const PEP_OCCUPATIONS = ["politician", "judge", "ambassador", "senior military officer"];
 
interface TierInputs {
  ofacResult: { status: string; score?: number };
  occupation: string;             // collected during voice
  fundingSource: string;          // collected during voice
  initialDepositCents: number;    // collected during voice
  fundingCountry: string;         // ISO-3166
}
 
export function decideTier(i: TierInputs): { tier: Tier; reasons: string[] } {
  const reasons: string[] = [];
 
  if (i.ofacResult.status === "possible_match") {
    reasons.push("ofac_possible_match");
  }
  if (PEP_OCCUPATIONS.includes(i.occupation.toLowerCase())) {
    reasons.push("pep_occupation");
  }
  if (i.fundingCountry !== "US") {
    reasons.push("foreign_funding");
  }
  if (i.initialDepositCents > 1_000_000_00) {
    reasons.push("high_initial_deposit");
  }
 
  return { tier: reasons.length > 0 ? "edd" : "basic", reasons };
}

Basic tier flows directly into your core banking system's account-open API. EDD does not. EDD applications stop the voice flow with a polite "we need a quick second-level review before we can open this account" and route to a human reviewer with senior management sign-off. The voice agent never auto-approves an EDD case. That is a policy line, not a UX choice.
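That policy line can live as a guard in front of the account-open call rather than a prompt instruction; `openAccount` in the usage comment is a stand-in for your core banking client, not a real API:

```typescript
// Guard in front of the core banking account-open API. EDD never auto-opens,
// regardless of what the model decides to say.
export function assertAutoOpenAllowed(
  tier: "basic" | "edd",
  reasons: string[],
): void {
  if (tier !== "basic") {
    throw new Error(`manual_review_required: ${reasons.join(",")}`);
  }
}

// Usage (openAccount is hypothetical):
//   assertAutoOpenAllowed(decision.tier, decision.reasons);
//   await openAccount(applicantToken);
```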

Audit Logs That Survive an Exam

Every event needs to land in a tamper-evident log. The BSA five-year retention requirement is a floor, and your examiners will ask. The minimum entry has a precise timestamp, the actor (which agent, which tool), the action, the input hash, the output identifier from any vendor call, and the decision the agent made. Hash chaining is enough for most fintechs. AWS QLDB gives you a managed equivalent.

audit/append.ts·typescript
import { createHash } from "node:crypto";
import { db } from "@/lib/db"; // your database client
 
interface AuditEntry {
  ts: string;                     // ISO-8601 with milliseconds
  sessionId: string;
  actor: string;                  // "voice-agent" | "ofac-screen" | "tier-decide"
  action: string;                 // "id_capture_completed" | "ofac_screened" | ...
  inputHash: string;              // sha256 of the action input
  vendorRef?: string;             // Persona inquiry id, ComplyAdvantage search id
  decision?: string;              // "approved" | "edd" | "review" | "rejected"
}
 
export async function appendAudit(
  entry: Omit<AuditEntry, "ts">,
  prevHash: string,
): Promise<{ recorded: AuditEntry; hash: string }> {
  const recorded: AuditEntry = { ...entry, ts: new Date().toISOString() };
  const payload = JSON.stringify(recorded) + prevHash;
  const hash = createHash("sha256").update(payload).digest("hex");
 
  await db.audit.insert({ ...recorded, prevHash, hash });
  return { recorded, hash };
}

The transcript of the voice call is part of the record because it shows how identity data was collected. Store it as an encrypted blob in object storage, reference the blob URI from the audit log, and never embed the audio inline. SSN, ID images, and selfie video go through the same vault-backed pattern. The audit log holds the references, not the data.

That covers the 70 percent that everyone building this has to solve. The remaining 30 percent is operational wiring: where the secrets live, how scorecards grade the agent, what scenarios run before a prompt change ships.

How Chanl Threads This Together

The voice tools above need three things to be production-grade. The vendor tokens cannot live in the LLM context. The agent has to be graded the same way every shift, every model upgrade, every Tuesday. New prompt versions have to be tested against adversarial personas before they touch a real customer. Chanl is the layer that gives you those three things without rebuilding them.

Tools register once, reference workspace-scoped secrets, and ship to the agent with the right contract. The Persona and ComplyAdvantage adapters are HTTP tools whose API tokens live in the secrets vault.

register-tools.ts·typescript
import { Chanl } from "@chanl/sdk";
 
const sdk = new Chanl({ apiKey: process.env.CHANL_API_KEY! });
 
await sdk.tools.create({
  name: "persona_create_inquiry",
  description: "Create a Persona identity inquiry for the live voice session.",
  type: "http",
  inputSchema: {
    type: "object",
    properties: {
      sessionId: { type: "string" },
      templateId: { type: "string" },
    },
    required: ["sessionId", "templateId"],
  },
  configuration: {
    http: {
      method: "POST",
      url: "https://withpersona.com/api/v1/inquiries",
      // Header is templated against the workspace secret. The token never
      // appears in the agent's context window.
      headers: { Authorization: "Bearer {{secret.PERSONA_API_KEY}}" },
    },
  },
});
 
await sdk.tools.create({
  name: "ofac_screen",
  description: "Screen the applicant against OFAC SDN, PEP, and adverse media.",
  type: "http",
  inputSchema: {
    type: "object",
    properties: { applicantId: { type: "string" } },
    required: ["applicantId"],
  },
  configuration: {
    http: {
      method: "POST",
      url: "https://api.complyadvantage.com/searches",
      headers: { Authorization: "Token {{secret.COMPLYADVANTAGE_KEY}}" },
    },
  },
});

Memory is the architecture decision that trips up most teams. Persistent agent memory is great for "remember the last call this customer had with us" and useless for SSNs. The rule we enforce: memory holds outcomes, not regulated identifiers. The KYC vault holds the raw data. The agent's memory entry for an onboarded customer reads completed_onboarding_2026_04_29 tier=basic vault_ref=vault://kyc/abc123. That is enough to greet a returning customer by name on call two and never enough to leak NPI into a transcript. We argue this split harder in privacy-first AI agent memory.
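The outcome-only rule is easy to enforce with a constructor that refuses anything resembling a raw identifier. Everything below (the entry format, the vault URI scheme, the `memoryEntry` name) is illustrative:

```typescript
// Rejects anything that looks like a raw SSN so regulated identifiers
// can't sneak into agent memory. The format matches the example entry above.
const SSN_PATTERN = /\b\d{3}-?\d{2}-?\d{4}\b/;

export function memoryEntry(opts: {
  event: string; // e.g. "completed_onboarding_2026_04_29"
  tier: "basic" | "edd";
  vaultRef: string; // e.g. "vault://kyc/abc123"
}): string {
  const entry = `${opts.event} tier=${opts.tier} vault_ref=${opts.vaultRef}`;
  if (SSN_PATTERN.test(entry)) {
    throw new Error("memory_entry_contains_possible_ssn");
  }
  return entry;
}
```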

Scorecards turn "did the call go well" into a five-axis rubric the same shape every time.

score-call.ts·typescript
// sdk is the Chanl client from register-tools.ts; callId is the completed call.
// axes the rubric grades: kyc_collected_in_order, liveness_completed,
// ofac_screened, tier_correct, audit_log_complete
const { data } = await sdk.scorecard.evaluate(callId, {
  scorecardId: "kyc-onboarding-v3",
});
 
if (data.scores.tier_correct < 0.9) {
  // Prompt or policy regression. Page the on-call.
}

The scorecard runs on every call in production. The same scorecard runs on every scenario in pre-production. Scenarios replay adversarial personas against the current prompt, which is how you catch a regression before it ships.

adversarial-tests.ts·typescript
// PEP-flagged applicant: should always trigger EDD escalation, never auto-open.
const run = await sdk.scenarios.run("kyc-pep-flagged-applicant", {
  agentId: "onboarding-voice-v3",
});
 
// Hard OFAC match: should always reject and trigger SAR review.
await sdk.scenarios.run("kyc-hard-ofac-match", {
  agentId: "onboarding-voice-v3",
});
 
// Confused user can't complete camera step: should retry, then hand to human.
await sdk.scenarios.run("kyc-camera-step-failure", {
  agentId: "onboarding-voice-v3",
});

You run the scenario suite on every prompt change, every model bump, and every tool change. The output is a scorecard delta. If tier_correct drops on kyc-pep-flagged-applicant between v3 and v4, v4 does not ship. That is the regression gate. We dig into the same gate pattern in AI agent test coverage: how much is enough? and agent readiness testing.
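The gate itself is a comparison, not a judgment call. A sketch of the delta check, assuming each scenario run yields a map of axis scores (the shapes and epsilon here are our own, not a Chanl SDK contract):

```typescript
type AxisScores = Record<string, number>;

// Fail the gate if any axis regresses by more than epsilon between the
// shipped prompt version and the candidate.
export function regressionGate(
  baseline: AxisScores,
  candidate: AxisScores,
  epsilon = 0.02,
): { pass: boolean; regressions: string[] } {
  const regressions = Object.keys(baseline).filter(
    (axis) => (candidate[axis] ?? 0) < baseline[axis] - epsilon,
  );
  return { pass: regressions.length === 0, regressions };
}
```

Wire this to the scenario suite in CI: if `pass` is false for any adversarial persona, the candidate prompt version never reaches production.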

What Still Has to Live in Your Stack?

Three pieces are not Chanl's job, by design.

The KYC vault is yours. Tokenize SSN, ID images, and selfie video at the moment of capture. The vault returns a token. The token is the only thing that crosses into the agent's tool calls.

The audit log is yours. Hash chain or QLDB. Five-year retention. Encrypted blob storage for the call recording, with a reference in the log entry rather than the bytes inline.

The core banking integration is yours. The account-open API, the funds-availability rules, the customer-of-record write into your CRM. Those are bank-specific and the vendor map is too varied to abstract well.

What you should not be rebuilding is the agent runtime around them. The tool registry with secret templating, the scorecard evaluator that runs the same rubric on every call, and the scenario runner that replays adversarial personas against the new prompt. Build, connect, and monitor maps onto the three things you would otherwise ship from scratch.

What Does This Actually Buy You?

A 12-step web form that drops 40 to 50 percent during KYC turns into a four-minute voice call where the customer talks through the easy data, points their phone camera at their ID, and walks away with an account number on the same call. The OFAC screen runs before the agent says anything that implies approval. The tier decision is a code path, not a model judgment. Every event lands in the audit log with a hash so an examiner can verify nothing was changed afterwards. None of it is theoretical. Every block above is code your team can adapt this week.

The harder piece to internalize is the line about memory. Agents that remember each customer are the right pattern for retention, cross-sell, and the second-call experience. They are the wrong pattern for storing an SSN. Once that line is drawn and enforced in tooling, the rest of the architecture falls out cleanly.

Build, connect, and monitor your KYC voice agent

Chanl ships the tool registry, scorecard rubrics, and adversarial scenarios so you can put a compliant voice agent in front of customers without rebuilding the runtime.
