Security & Compliance

Build a KYC Voice Agent: 4-Minute Account Open, 5-Year Audit Log

Voice collects name, DOB, SSN. SMS hands the camera the rest. OFAC screens before the next word. Architecture of a KYC voice agent that survives a BSA exam.

Dean Grover, Co-founder
April 30, 2026
12 min read
An ID Card Propped Against a Notebook on a Quiet Desk at Evening, Two Hands Tilting a Phone Down to Photograph It

The web form kills your conversion. Twelve fields, four screens, an ID upload that fails on the first try, and a "verifying your identity" spinner that has no end. Onboarding drop-off climbed from around 40 percent in 2016 to roughly 68 percent in 2022, and 40 to 50 percent of that drop-off happens specifically inside the KYC step. Once a flow runs longer than three to five minutes, abandonment crosses 50 percent. A better progress bar will not save you.

Voice changes the rhythm. A patient-sounding voice can collect name, date of birth, and an SSN in under 90 seconds. The catch is that voice cannot see. You cannot read a driver's license through a phone call. The reflex is to bounce the customer back to a web form for the visual step, which puts you right back in the abandonment trap.

The pattern that works is a handoff. Voice collects what voice is good at. The phone camera handles the visual step. The voice line stays open through both, and the same session record threads the voice turn, the SMS turn, and the screening turn together. That is the architecture.

What Does CIP Actually Require?

Section 326 of the USA PATRIOT Act sets the floor. Every covered institution needs a written Customer Identification Program with risk-based procedures that verify identity to the extent reasonable and practicable, maintain records of the verification, and check the applicant against government lists of suspected terrorists. The minimum identifying data is name, date of birth, address, and an identification number, which for a US person is typically the SSN. None of that says anything about channel. A voice agent satisfies CIP if and only if it captures the four data points, runs the OFAC check, and writes durable records.

FinCEN's recent exceptive relief order kept the 25 percent beneficial ownership threshold but eased the verification cadence: financial institutions can verify beneficial owners at first account opening rather than every new account for the same customer. That is a real simplification for fintechs that open multiple products per customer, but it does not change anything about the first-account flow you are building here.

GLBA's Safeguards Rule sits on top of CIP. Customer information has to be encrypted at rest and in transit, MFA enforced, access role-based, and audit logging maintained across every system that touches NPI. BSA requires five-year retention of CIP records. Your voice agent is not exempt from any of this. If the agent collects an SSN, the SSN is NPI from the moment it leaves the customer's mouth, and your stack has to treat it that way. The same encryption-and-vault discipline shows up in HIPAA security for voice AI, with different acronyms but the same shape.

How Do You Hand Off Voice to a Phone Camera Without Losing the Customer?

You keep the voice line open and use SMS to ferry a single-use link to a mobile web flow that owns the camera. One session ID threads the voice turn, the SMS turn, and the verification turn together. Build that handoff first, because everything else hangs off it. The voice agent collects name, date of birth, and SSN in a normal back-and-forth. When it is time for the ID and selfie, the agent says something like "I'm going to text you a link to take a quick photo of your license. Let me know when you have it open." It then calls a tool that sends a single-use SMS link.

tools/send-id-capture-link.ts·typescript
import { randomBytes } from "node:crypto";
import twilio from "twilio";
import { redis } from "@/lib/redis";
 
const sms = twilio(process.env.TWILIO_SID!, process.env.TWILIO_TOKEN!);
 
// Tool: voice agent calls this when it's time for the camera step.
// sessionId is the ID of the live voice call. The web flow will use the
// same sessionId so the two channels share state.
export async function sendIdCaptureLink({
  sessionId,
  phoneE164,
}: {
  sessionId: string;
  phoneE164: string;
}) {
  const nonce = randomBytes(16).toString("hex");
  // Single-use, 10 minute TTL. The web app rejects expired or replayed links.
  await redis.set(`id-link:${nonce}`, sessionId, "EX", 600);
 
  const url = `https://onboard.example.com/id?n=${nonce}`;
  await sms.messages.create({
    from: process.env.TWILIO_FROM!,
    to: phoneE164,
    body: `Tap to take a photo of your ID and a quick selfie. Link expires in 10 minutes: ${url}`,
  });
 
  return { status: "sent", expiresInSec: 600 };
}

The customer taps the link. The mobile web app validates the nonce, looks up the sessionId, and starts a Persona inquiry (or Alloy, or Stripe Identity, the contract is similar). Persona walks the customer through capturing the front and back of the ID and a selfie that doubles as a liveness check. When Persona finishes, it fires a webhook back to your backend.
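The inquiry creation itself is one authenticated POST. A minimal sketch, assuming Persona's JSON:API request shape (`inquiry-template-id`, `reference-id`); verify the field names against the current Persona docs. `buildInquiryPayload` and the `PERSONA_API_KEY` env var are our own names. The important line is `reference-id`, which carries the voice sessionId so the webhook can find its way back:

```typescript
// Builds the inquiry payload. Field names follow Persona's JSON:API shape;
// treat them as assumptions to confirm against the current API reference.
export function buildInquiryPayload(sessionId: string, templateId: string) {
  return {
    data: {
      attributes: {
        "inquiry-template-id": templateId,
        // reference-id is how the webhook handler maps the finished inquiry
        // back to the live voice session.
        "reference-id": sessionId,
      },
    },
  };
}

export async function createPersonaInquiry(sessionId: string, templateId: string) {
  const res = await fetch("https://withpersona.com/api/v1/inquiries", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PERSONA_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildInquiryPayload(sessionId, templateId)),
  });
  return res.json();
}
```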

The voice line is still open. The agent can say "Looks like you've started the camera step. I'll wait." It cannot poll. It needs to block on a real signal that the verification finished, then resume. That is a webhook-await primitive.

tools/wait-for-id-result.ts·typescript
// Shape of the result the webhook hands back to the voice tool.
type PersonaResult = {
  status: "completed" | "failed";
  inquiryId: string;
  extracted: Record<string, unknown>;
};

type Pending = { resolve: (v: PersonaResult) => void; timeoutHandle: NodeJS.Timeout };
const pending = new Map<string, Pending>();
 
// Voice agent tool: blocks until the Persona webhook arrives or timeout.
export function waitForIdResult(sessionId: string, timeoutMs = 180_000) {
  return new Promise<PersonaResult>((resolve, reject) => {
    const timeoutHandle = setTimeout(() => {
      pending.delete(sessionId);
      reject(new Error("id_capture_timeout"));
    }, timeoutMs);
    pending.set(sessionId, { resolve, timeoutHandle });
  });
}
 
// Webhook handler: Persona POSTs here when the inquiry transitions to a
// terminal state. We resolve the pending promise so the voice tool returns.
export async function handlePersonaWebhook(req: Request) {
  const evt = await req.json();
  const sessionId = evt.data.attributes["reference-id"]; // we set this when we
                                                         // created the inquiry
  const p = pending.get(sessionId);
  if (!p) return new Response("no pending", { status: 200 });
 
  clearTimeout(p.timeoutHandle);
  pending.delete(sessionId);
  p.resolve({
    status: evt.data.attributes.status,        // "completed" | "failed"
    inquiryId: evt.data.id,
    extracted: evt.data.attributes.fields,     // name, dob, address, doc number
  });
  return new Response("ok", { status: 200 });
}

That is the whole hard part of the handoff. The voice tool awaits a promise. The webhook resolves it. State lives in one map keyed by the voice session ID. In a multi-process deployment the map is Redis pub/sub, but the shape is the same.
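One way the multi-process variant can look: swap the in-memory map for a message bus and have the webhook handler publish instead of resolving a local promise. The sketch below injects the bus behind a generic interface (the `subscribe`/`publish` signatures here are our own abstraction, not a specific client's API); ioredis pub/sub or Postgres LISTEN/NOTIFY both fit it.

```typescript
// Any pub/sub transport works; subscribe returns an unsubscribe function.
interface Bus {
  subscribe(channel: string, onMessage: (msg: string) => void): () => void;
  publish(channel: string, msg: string): void;
}

// Same contract as waitForIdResult, but safe when the webhook lands on a
// different process than the one holding the voice call.
export function waitForIdResultDistributed<T>(
  bus: Bus,
  sessionId: string,
  timeoutMs = 180_000,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let unsubscribe = () => {};
    const timer = setTimeout(() => {
      unsubscribe();
      reject(new Error("id_capture_timeout"));
    }, timeoutMs);
    unsubscribe = bus.subscribe(`id-result:${sessionId}`, (msg) => {
      clearTimeout(timer);
      unsubscribe();
      resolve(JSON.parse(msg) as T);
    });
  });
}

// The webhook handler, wherever it runs, publishes the result.
export function publishIdResult(bus: Bus, sessionId: string, result: unknown) {
  bus.publish(`id-result:${sessionId}`, JSON.stringify(result));
}
```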

[Sequence diagram — participants: Customer, Voice Agent, Backend, Phone (Web), KYC Vendor, OFAC API. Customer → Voice Agent: "I want to open an account" · agent collects name, DOB, SSN · Voice Agent → Backend: send_id_capture_link(sessionId) · Backend → Phone: SMS with single-use link · customer taps link, opens camera flow · Phone → KYC Vendor: starts inquiry (sessionId as reference-id) · liveness + ID extraction · KYC Vendor → Backend: webhook, inquiry completed, resolves waitForIdResult promise · Backend → OFAC API: ofac_screen(applicantId), SDN + PEP query · clean | possible_match | hard_match → decision · Voice Agent → Customer: "You're all set, account number ends in 4471."]
Voice + SMS + camera handoff: the voice line stays open while the customer's phone camera handles the visual step

Screen, Then Decide

Once the ID is verified, OFAC screening runs before the agent says anything else. SDN matches block the next utterance. The agent's tool gets one of three statuses back, and the prompt branches on that status. Do not let the model choose how to handle a hit. The branch belongs in code.

tools/ofac-screen.ts·typescript
import { vault } from "@/lib/vault"; // your tokenizing KYC vault client

type Match = { score: number; match_types: string[]; [k: string]: unknown };
type Hit = Match;

type ScreenResult =
  | { status: "clean" }
  | { status: "possible_match"; score: number; matches: Match[] }
  | { status: "hard_match"; matches: Match[] };
 
export async function ofacScreen({
  applicantId,
}: {
  applicantId: string;
}): Promise<ScreenResult> {
  // Fetch PII from your KYC vault server-side. The applicantId is opaque to
  // the LLM. The vault token never enters the prompt.
  const { fullName, dob, address } = await vault.read(applicantId);
 
  const r = await fetch("https://api.complyadvantage.com/searches", {
    method: "POST",
    headers: {
      "Authorization": `Token ${process.env.COMPLYADVANTAGE_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      search_term: fullName,
      filters: { birth_year: new Date(dob).getFullYear() },
      fuzziness: 0.6,
      types: ["sanction", "pep", "adverse-media"],
    }),
  }).then((r) => r.json());
 
  // Decision tree, not a model judgment.
  if (r.data.total_hits === 0) return { status: "clean" };
 
  const sanctionsHit = r.data.hits.find(
    (h: Hit) => h.match_types.includes("sanction") && h.score >= 0.95,
  );
  if (sanctionsHit) {
    return { status: "hard_match", matches: r.data.hits };
  }
 
  return {
    status: "possible_match",
    score: r.data.hits[0].score,
    matches: r.data.hits,
  };
}

The agent prompt has three short branches. On clean, continue to tier decisioning. On possible_match, freeze the application, write a review-queue entry, tell the customer "your application is in review and we'll follow up by email within 24 hours," end the call. On hard_match, do not tell the customer they hit a sanctions list. Do say you cannot complete the application. Trigger the SAR review workflow internally. ComplyAdvantage and Refinitiv World-Check refresh their underlying lists multiple times an hour, which matters because OFAC adds entries on no schedule and the SDN list as published by Treasury is the legal source of truth.
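Keeping the branch in code can look like a plain mapping from screening status to a scripted next step: the agent receives only the utterance and the action, never the raw hit data. The names below are illustrative, not a fixed contract:

```typescript
type NextStep =
  | { action: "continue"; say: string }
  | { action: "freeze_and_review"; say: string }
  | { action: "reject_and_sar_review"; say: string };

// Deterministic branch on the screening status. The model never sees match
// details and never chooses how to handle a hit.
export function routeScreenResult(
  status: "clean" | "possible_match" | "hard_match",
): NextStep {
  switch (status) {
    case "clean":
      return { action: "continue", say: "Great, let's finish setting up your account." };
    case "possible_match":
      return {
        action: "freeze_and_review",
        say: "Your application is in review and we'll follow up by email within 24 hours.",
      };
    case "hard_match":
      // Never disclose a sanctions hit to the caller.
      return {
        action: "reject_and_sar_review",
        say: "I'm sorry, we aren't able to complete this application today.",
      };
  }
}
```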

Tier the Customer Before You Open the Account

CIP is a floor, not a ceiling. The CDD rule layers risk-based procedures on top, and enhanced due diligence applies to higher-risk customers including politically exposed persons, foreign correspondent accounts, and applicants whose occupation or funding source raises a flag. The agent prompt has a small decision table that captures the institution's risk policy.

tools/decide-tier.ts·typescript
type Tier = "basic" | "edd";

// Occupations your risk policy treats as politically exposed. Extend to
// match the institution's written CDD policy.
const PEP_OCCUPATIONS = ["politician", "judge", "ambassador", "senior military officer"];
 
interface TierInputs {
  ofacResult: { status: string; score?: number };
  occupation: string;             // collected during voice
  fundingSource: string;          // collected during voice
  initialDepositCents: number;    // collected during voice
  fundingCountry: string;         // ISO-3166
}
 
export function decideTier(i: TierInputs): { tier: Tier; reasons: string[] } {
  const reasons: string[] = [];
 
  if (i.ofacResult.status === "possible_match") {
    reasons.push("ofac_possible_match");
  }
  if (PEP_OCCUPATIONS.includes(i.occupation.toLowerCase())) {
    reasons.push("pep_occupation");
  }
  if (i.fundingCountry !== "US") {
    reasons.push("foreign_funding");
  }
  if (i.initialDepositCents > 1_000_000_00) {
    reasons.push("high_initial_deposit");
  }
 
  return { tier: reasons.length > 0 ? "edd" : "basic", reasons };
}

Basic tier flows directly into your core banking system's account-open API. EDD does not. EDD applications stop the voice flow with a polite "we need a quick second-level review before we can open this account" and route to a human reviewer with senior management sign-off. The voice agent never auto-approves an EDD case. That is a policy line, not a UX choice.
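That policy line can live as a guard in front of the account-open call rather than a prompt instruction; `openAccount` in the usage comment is a stand-in for your core banking client, not a real API:

```typescript
// Guard in front of the core banking account-open API. EDD never auto-opens,
// regardless of what the model decides to say.
export function assertAutoOpenAllowed(
  tier: "basic" | "edd",
  reasons: string[],
): void {
  if (tier !== "basic") {
    throw new Error(`manual_review_required: ${reasons.join(",")}`);
  }
}

// Usage (openAccount is hypothetical):
//   assertAutoOpenAllowed(decision.tier, decision.reasons);
//   await openAccount(applicantToken);
```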

Audit Logs That Survive an Exam

Every event needs to land in a tamper-evident log. The BSA five-year retention requirement is a floor, and your examiners will ask. The minimum entry has a precise timestamp, the actor (which agent, which tool), the action, the input hash, the output identifier from any vendor call, and the decision the agent made. Hash chaining is enough for most fintechs. AWS QLDB gives you a managed equivalent.

audit/append.ts·typescript
import { createHash } from "node:crypto";
import { db } from "@/lib/db"; // your database client
 
interface AuditEntry {
  ts: string;                     // ISO-8601 with milliseconds
  sessionId: string;
  actor: string;                  // "voice-agent" | "ofac-screen" | "tier-decide"
  action: string;                 // "id_capture_completed" | "ofac_screened" | ...
  inputHash: string;              // sha256 of the action input
  vendorRef?: string;             // Persona inquiry id, ComplyAdvantage search id
  decision?: string;              // "approved" | "edd" | "review" | "rejected"
}
 
export async function appendAudit(
  entry: Omit<AuditEntry, "ts">,
  prevHash: string,
): Promise<{ recorded: AuditEntry; hash: string }> {
  const recorded: AuditEntry = { ...entry, ts: new Date().toISOString() };
  const payload = JSON.stringify(recorded) + prevHash;
  const hash = createHash("sha256").update(payload).digest("hex");
 
  await db.audit.insert({ ...recorded, prevHash, hash });
  return { recorded, hash };
}

The transcript of the voice call is part of the record because it shows how identity data was collected. Store it as an encrypted blob in object storage, reference the blob URI from the audit log, and never embed the audio inline. SSN, ID images, and selfie video go through the same vault-backed pattern. The audit log holds the references, not the data.

That covers the 70 percent that everyone building this has to solve. The remaining 30 percent is operational wiring: where the secrets live, how scorecards grade the agent, what scenarios run before a prompt change ships.

How Chanl Threads This Together

The voice tools above need three things to be production-grade. The vendor tokens cannot live in the LLM context. The agent has to be graded the same way every shift, every model upgrade, every Tuesday. New prompt versions have to be tested against adversarial personas before they touch a real customer. Chanl is the layer that gives you those three things without rebuilding them.

Tools register once, reference workspace-scoped secrets, and ship to the agent with the right contract. The Persona and ComplyAdvantage adapters are HTTP tools whose API tokens live in the secrets vault.

register-tools.ts·typescript
import { Chanl } from "@chanl/sdk";
 
const sdk = new Chanl({ apiKey: process.env.CHANL_API_KEY! });
 
await sdk.tools.create({
  name: "persona_create_inquiry",
  description: "Create a Persona identity inquiry for the live voice session.",
  type: "http",
  inputSchema: {
    type: "object",
    properties: {
      sessionId: { type: "string" },
      templateId: { type: "string" },
    },
    required: ["sessionId", "templateId"],
  },
  configuration: {
    http: {
      method: "POST",
      url: "https://withpersona.com/api/v1/inquiries",
      // Header is templated against the workspace secret. The token never
      // appears in the agent's context window.
      headers: { Authorization: "Bearer {{secret.PERSONA_API_KEY}}" },
    },
  },
});
 
await sdk.tools.create({
  name: "ofac_screen",
  description: "Screen the applicant against OFAC SDN, PEP, and adverse media.",
  type: "http",
  inputSchema: {
    type: "object",
    properties: { applicantId: { type: "string" } },
    required: ["applicantId"],
  },
  configuration: {
    http: {
      method: "POST",
      url: "https://api.complyadvantage.com/searches",
      headers: { Authorization: "Token {{secret.COMPLYADVANTAGE_KEY}}" },
    },
  },
});

Memory is the architecture decision that trips up most teams. Persistent agent memory is great for "remember the last call this customer had with us" and useless for SSNs. The rule we enforce: memory holds outcomes, not regulated identifiers. The KYC vault holds the raw data. The agent's memory entry for an onboarded customer reads completed_onboarding_2026_04_29 tier=basic vault_ref=vault://kyc/abc123. That is enough to greet a returning customer by name on call two and never enough to leak NPI into a transcript. We argue this split harder in privacy-first AI agent memory.
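The outcome-only rule is easy to enforce with a constructor that refuses anything resembling a raw identifier. Everything below (the entry format, the vault URI scheme, the `memoryEntry` name) is illustrative:

```typescript
// Rejects anything that looks like a raw SSN so regulated identifiers
// can't sneak into agent memory. The format matches the example entry above.
const SSN_PATTERN = /\b\d{3}-?\d{2}-?\d{4}\b/;

export function memoryEntry(opts: {
  event: string; // e.g. "completed_onboarding_2026_04_29"
  tier: "basic" | "edd";
  vaultRef: string; // e.g. "vault://kyc/abc123"
}): string {
  const entry = `${opts.event} tier=${opts.tier} vault_ref=${opts.vaultRef}`;
  if (SSN_PATTERN.test(entry)) {
    throw new Error("memory_entry_contains_possible_ssn");
  }
  return entry;
}
```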

Scorecards turn "did the call go well" into a five-axis rubric the same shape every time.

score-call.ts·typescript
// sdk is the Chanl client from register-tools.ts; callId is the completed call.
// axes the rubric grades: kyc_collected_in_order, liveness_completed,
// ofac_screened, tier_correct, audit_log_complete
const { data } = await sdk.scorecard.evaluate(callId, {
  scorecardId: "kyc-onboarding-v3",
});
 
if (data.scores.tier_correct < 0.9) {
  // Prompt or policy regression. Page the on-call.
}

The scorecard runs on every call in production. The same scorecard runs on every scenario in pre-production. Scenarios replay adversarial personas against the current prompt, which is how you catch a regression before it ships.

adversarial-tests.ts·typescript
// PEP-flagged applicant: should always trigger EDD escalation, never auto-open.
const run = await sdk.scenarios.run("kyc-pep-flagged-applicant", {
  agentId: "onboarding-voice-v3",
});
 
// Hard OFAC match: should always reject and trigger SAR review.
await sdk.scenarios.run("kyc-hard-ofac-match", {
  agentId: "onboarding-voice-v3",
});
 
// Confused user can't complete camera step: should retry, then hand to human.
await sdk.scenarios.run("kyc-camera-step-failure", {
  agentId: "onboarding-voice-v3",
});

You run the scenario suite on every prompt change, every model bump, and every tool change. The output is a scorecard delta. If tier_correct drops on kyc-pep-flagged-applicant between v3 and v4, v4 does not ship. That is the regression gate. We dig into the same gate pattern in AI agent test coverage: how much is enough? and agent readiness testing.
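The gate itself is a comparison, not a judgment call. A sketch of the delta check, assuming each scenario run yields a map of axis scores (the shapes and epsilon here are our own, not a Chanl SDK contract):

```typescript
type AxisScores = Record<string, number>;

// Fail the gate if any axis regresses by more than epsilon between the
// shipped prompt version and the candidate.
export function regressionGate(
  baseline: AxisScores,
  candidate: AxisScores,
  epsilon = 0.02,
): { pass: boolean; regressions: string[] } {
  const regressions = Object.keys(baseline).filter(
    (axis) => (candidate[axis] ?? 0) < baseline[axis] - epsilon,
  );
  return { pass: regressions.length === 0, regressions };
}
```

Wire this to the scenario suite in CI: if `pass` is false for any adversarial persona, the candidate prompt version never reaches production.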

What Still Has to Live in Your Stack?

Three pieces are not Chanl's job, by design.

The KYC vault is yours. Tokenize SSN, ID images, and selfie video at the moment of capture. The vault returns a token. The token is the only thing that crosses into the agent's tool calls.

The audit log is yours. Hash chain or QLDB. Five-year retention. Encrypted blob storage for the call recording, with a reference in the log entry rather than the bytes inline.

The core banking integration is yours. The account-open API, the funds-availability rules, the customer-of-record write into your CRM. Those are bank-specific and the vendor map is too varied to abstract well.

What you should not be rebuilding is the agent runtime around them. The tool registry with secret templating, the scorecard evaluator that runs the same rubric on every call, and the scenario runner that replays adversarial personas against the new prompt. Build, connect, and monitor maps onto the three things you would otherwise ship from scratch.

What Does This Actually Buy You?

A 12-step web form that drops 40 to 50 percent during KYC turns into a four-minute voice call where the customer talks through the easy data, points their phone camera at their ID, and walks away with an account number on the same call. The OFAC screen runs before the agent says anything that implies approval. The tier decision is a code path, not a model judgment. Every event lands in the audit log with a hash so an examiner can verify nothing was changed afterwards. None of it is theoretical. Every block above is code your team can adapt this week.

The harder piece to internalize is the line about memory. Agents that remember each customer are the right pattern for retention, cross-sell, and the second-call experience. They are the wrong pattern for storing an SSN. Once that line is drawn and enforced in tooling, the rest of the architecture falls out cleanly.

Build, connect, and monitor your KYC voice agent

Chanl ships the tool registry, scorecard rubrics, and adversarial scenarios so you can put a compliant voice agent in front of customers without rebuilding the runtime.
