A patient calls a primary-care line, waits eight minutes on hold, and books a 15-minute appointment. That ratio is the entire pitch for voice AI in healthcare. Every vendor demo opens with it. Every demo also stops shortly after the agent says hello.
The hard part is not the greeting. It is the part where the agent verifies identity over a noisy phone line, finds an open slot in Epic, runs a 270 eligibility check against the payer, books the appointment, sends a confirmation that does not leak PHI, and writes everything back without ever letting Protected Health Information touch a non-BAA surface. Most builders ship the first 30 seconds and hand off the rest to a human. That is a half-system, and a half-system is what your patient access team is already paying $40 an hour to run.
This article walks through the actual build. The 70% you can do with general open tools, then the 30% where Chanl SDK methods replace the parts you would otherwise own forever. Compliance gates are inline, not an afterthought.
The Compliance Baseline Before Any Code
Every vendor in the call path needs a signed Business Associate Agreement before a single PHI byte moves. The fastest way to derail a healthcare voice project is to wire a great prototype against an LLM your legal team has not approved, then discover three months in that you cannot ship. Get the matrix right first.
The stack as of 2026, with the BAA status that matters:
| Layer | Provider | BAA available |
|---|---|---|
| Telephony | Twilio Voice + Programmable Messaging | Yes, on HIPAA-eligible products |
| STT | Deepgram (Nova-3) | Yes, SOC 2 Type II + BAA |
| LLM (text) | Azure OpenAI text endpoints | Yes, under Microsoft BAA |
| LLM (text) | Anthropic Claude on AWS Bedrock | Yes, under AWS BAA |
| TTS | ElevenLabs Enterprise + Zero Retention Mode | Yes, with both flags engaged |
| TTS | Cartesia | Yes, with BAA |
One trap to flag early: the Azure OpenAI Realtime API (audio-in / audio-out) is still not in Microsoft's HIPAA-eligible scope as of 2025. If you build on Realtime and your compliance officer reads the small print, the project stops. The safer pattern today is a streaming chain (Deepgram STT, Azure OpenAI text, ElevenLabs TTS) over Pipecat or LiveKit, with each hop crossing a BAA-covered boundary.
- Signed BAA with telephony provider (Twilio HIPAA-eligible products)
- Signed BAA with STT vendor (Deepgram, AWS Transcribe Medical)
- Signed BAA with LLM provider, text only if using Azure OpenAI
- Signed BAA with TTS vendor, Enterprise tier + Zero Retention if ElevenLabs
- TLS in transit, AES-256 at rest on transcripts and recordings
- Audit log on every tool call (timestamp, agent, tool, hashed patient ID, result code)
- A2P 10DLC registered campaign for any patient SMS (Healthcare campaign type)
- Written identity verification procedure approved by Privacy Officer
- Documented escalation path with named on-call clinician
- Quarterly access review on transcript storage
That checklist is not a marketing list. Every item has been the reason a real voice-agent project failed a procurement review.
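The audit-log item is the one teams most often leave vague, so here is a minimal sketch of what one entry could look like. The field names, the `buildAuditEntry` helper, and the use of a keyed HMAC for the patient ID are assumptions for illustration, not a required format.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical entry shape: the fields mirror the checklist item above.
type AuditEntry = {
  timestamp: string;     // ISO 8601, UTC
  agent: string;
  tool: string;
  patientIdHash: string; // keyed hash, never the raw MRN
  resultCode: string;
};

// A keyed hash (HMAC) means someone holding only the log cannot
// rebuild it by hashing a list of known MRNs.
export function hashPatientId(mrn: string, secret: string): string {
  return createHmac("sha256", secret).update(mrn).digest("hex");
}

export function buildAuditEntry(opts: {
  agent: string;
  tool: string;
  mrn: string;
  resultCode: string;
  secret: string;
  now?: Date; // injectable clock, useful in tests
}): AuditEntry {
  return {
    timestamp: (opts.now ?? new Date()).toISOString(),
    agent: opts.agent,
    tool: opts.tool,
    patientIdHash: hashPatientId(opts.mrn, opts.secret),
    resultCode: opts.resultCode,
  };
}
```

Write one entry per tool call, including the failures. The failed calls are usually the ones a reviewer asks about.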
Identity Verification on a Phone Line
The agent's first job after hello is the hardest one to get right. HHS does not mandate a specific verification method, but the widely accepted standard is two non-public identifiers. Date of birth plus the last four digits of the SSN is the combination most health systems land on, and the agent should never ask for the full SSN. The interesting part is what the agent does when the audio is wrong.
Phone numerals collide. "Fifteen" sounds like "fifty" on a noisy line. "Five" and "nine" trade places when the caller has a hearing aid feeding back. The agent has to handle that confusion explicitly. Read each digit back, ask for a yes, and on three failed attempts hand off to a human nurse with the partial transcript attached.
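The read-back step can be sketched as two small pure functions. The names `readBackPrompt` and `nextAction` are hypothetical, and the attempt counting here (number of failed attempts so far, hand off at three) is an assumption about how your call state is tracked.

```typescript
const DIGIT_WORDS = [
  "zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine",
];

// Speak each digit as its own word so "fifteen" can never be heard as "fifty".
export function readBackPrompt(digits: string): string {
  if (!/^\d+$/.test(digits)) throw new Error("expected digits only");
  const spoken = digits
    .split("")
    .map((d) => DIGIT_WORDS[Number(d)])
    .join(", ");
  return `I heard ${spoken}. Is that right?`;
}

// Decide in application code, not in the prompt, when to stop retrying.
export function nextAction(
  failedAttempts: number,
  maxAttempts = 3,
): "retry" | "handoff" {
  return failedAttempts >= maxAttempts ? "handoff" : "retry";
}
```

Speaking digits as separate words costs a second of airtime and removes the teen-versus-ty ambiguity entirely.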
The verification flow as a Pipecat-style function. It runs early in the conversation and gates every tool call after.
```typescript
import { z } from "zod";

const dobSchema = z.string().regex(/^\d{2}\/\d{2}\/\d{4}$/);
const ssnLast4Schema = z.string().regex(/^\d{4}$/);

export async function verifyIdentity(opts: {
  said: { dob: string; last4: string };
  patientLookup: (q: { dob: string; last4: string }) => Promise<{ mrn: string } | null>;
  attempts: number;
  speak: (msg: string) => Promise<void>;
}): Promise<{ ok: true; mrn: string } | { ok: false; reason: "lockout" | "no-match" }> {
  const dob = dobSchema.safeParse(opts.said.dob);
  const last4 = ssnLast4Schema.safeParse(opts.said.last4);

  // Malformed input: read the digits back, or lock out on the third try.
  if (!dob.success || !last4.success) {
    if (opts.attempts >= 2) return { ok: false, reason: "lockout" };
    await opts.speak(`I heard ${spellDigits(opts.said.last4)}. Is that right?`);
    return { ok: false, reason: "no-match" };
  }

  // Well-formed input: only a record match returns an MRN.
  const match = await opts.patientLookup({ dob: dob.data, last4: last4.data });
  if (!match) return { ok: false, reason: opts.attempts >= 2 ? "lockout" : "no-match" };
  return { ok: true, mrn: match.mrn };
}

function spellDigits(s: string) {
  return s.split("").join(" ");
}
```

Two things matter here. First, the agent confirms digit by digit before claiming a match. Second, the lockout is in the application layer, not the LLM. Patient lockout cannot be hallucinated away by a model that decides to be helpful.
Pulling Slots From FHIR
The agent now has a verified MRN. Next it needs available slots. FHIR R4 exposes two resources for this: Schedule (the link from a slot to a practitioner and location) and Slot (the actual free time blocks). Most major EHRs (Epic, Oracle Health, Athena) implement these. Epic adds two custom operations that make the workflow much shorter: Appointment.$find returns candidate times that account for templates and existing bookings, and Appointment.$book commits one in a single deterministic call.
Authentication for backend FHIR access is SMART Backend Services. Your service registers a public key (JWK) with the EHR, signs a one-time JWT assertion per token request, and gets back an access token scoped to system/Slot.read and system/Appointment.write. No user is in the loop, which is exactly what you want for an agent that runs at 2am.
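A minimal sketch of the signed assertion and the token request body, using only Node's built-in crypto. The claim set (`iss` and `sub` equal to the client ID, `aud` equal to the token URL, a unique `jti`, `exp` at most five minutes out) follows the SMART Backend Services spec. The function names, the RS384 choice, and the option names are assumptions, and your EHR's registration may require a different algorithm or extra headers.

```typescript
import { createSign, randomUUID } from "node:crypto";

const b64url = (buf: Buffer): string => buf.toString("base64url");

// Build the one-time JWT the service presents instead of a client secret.
export function buildClientAssertion(opts: {
  clientId: string;
  tokenUrl: string;
  privateKeyPem: string; // private half of the JWK registered with the EHR
  keyId: string;         // kid of that registered key
  now?: Date;
}): string {
  const nowSec = Math.floor((opts.now ?? new Date()).getTime() / 1000);
  const header = { alg: "RS384", typ: "JWT", kid: opts.keyId };
  const claims = {
    iss: opts.clientId,
    sub: opts.clientId,
    aud: opts.tokenUrl,
    jti: randomUUID(),  // never reuse an assertion
    exp: nowSec + 300,  // five minutes is the ceiling
  };
  const signingInput =
    `${b64url(Buffer.from(JSON.stringify(header)))}.` +
    `${b64url(Buffer.from(JSON.stringify(claims)))}`;
  const signature = createSign("RSA-SHA384")
    .update(signingInput)
    .sign(opts.privateKeyPem);
  return `${signingInput}.${b64url(signature)}`;
}

// Form body for the POST to the token endpoint.
export function buildTokenRequestBody(assertion: string, scope: string): URLSearchParams {
  return new URLSearchParams({
    grant_type: "client_credentials",
    client_assertion_type: "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
    client_assertion: assertion,
    scope,
  });
}
```

The body goes to the token endpoint as `application/x-www-form-urlencoded`. Keep the private key in a secret manager and rotate it on the schedule your EHR vendor expects.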
The slot search and booking, written as two ordinary HTTP calls. The token mint is collapsed for space; in production it is a separate function that caches the access token until it expires.
```typescript
type Slot = {
  resourceType: "Slot";
  id: string;
  start: string;
  end: string;
  status: "free" | "busy" | "busy-tentative";
  schedule: { reference: string };
};

export async function searchSlots(opts: {
  baseUrl: string;
  token: string;
  scheduleId: string;
  fromIso: string;
  count?: number;
}): Promise<Slot[]> {
  const url = new URL(`${opts.baseUrl}/Slot`);
  url.searchParams.set("schedule", `Schedule/${opts.scheduleId}`);
  url.searchParams.set("start", `ge${opts.fromIso}`);
  url.searchParams.set("status", "free");
  url.searchParams.set("_count", String(opts.count ?? 5));

  const res = await fetch(url, {
    headers: {
      Authorization: `Bearer ${opts.token}`,
      Accept: "application/fhir+json",
    },
  });
  if (!res.ok) throw new Error(`Slot search failed: ${res.status}`);

  const bundle = await res.json();
  return (bundle.entry ?? []).map((e: { resource: Slot }) => e.resource);
}

export async function bookAppointment(opts: {
  baseUrl: string;
  token: string;
  patientId: string;
  slotId: string;
  reason: string;
}): Promise<{ id: string }> {
  const body = {
    resourceType: "Appointment",
    status: "booked",
    slot: [{ reference: `Slot/${opts.slotId}` }],
    participant: [
      { actor: { reference: `Patient/${opts.patientId}` }, status: "accepted" },
    ],
    reasonCode: [{ text: opts.reason }],
  };

  const res = await fetch(`${opts.baseUrl}/Appointment`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.token}`,
      "Content-Type": "application/fhir+json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Booking failed: ${res.status}`);

  const created = await res.json();
  return { id: created.id };
}
```

The Slot payload is real FHIR R4 with a start, end, status, and a back-reference to the Schedule. The agent reads the first three free slots, narrates them ("I have 9:15 Tuesday morning, 1:30 Tuesday afternoon, or 8:30 Wednesday morning"), takes a confirmation, and posts the Appointment. On Epic specifically, replacing searchSlots with a single POST /Appointment/$find collapses the schedule template plus slot dance into one round trip, and $book returns a fully resolved Appointment in the response.
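The narration step is worth doing in code rather than leaving to the model, because a misread time is a missed appointment. A minimal sketch, assuming the clinic's IANA time zone is known and passed in; `narrateSlot` and `narrateSlots` are hypothetical helper names.

```typescript
// Turn a FHIR Slot.start instant into a phrase like "9:15 Tuesday morning".
export function narrateSlot(startIso: string, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    weekday: "long",
    hour: "numeric",
    minute: "2-digit",
    hour12: true,
    timeZone, // always the clinic's zone, never the server's
  }).formatToParts(new Date(startIso));

  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";

  const hour = Number(get("hour"));
  const isPm = get("dayPeriod").toUpperCase().startsWith("P");
  // 12pm through 4:59pm counts as afternoon, later as evening.
  const period = !isPm ? "morning" : hour === 12 || hour < 5 ? "afternoon" : "evening";
  return `${get("hour")}:${get("minute")} ${get("weekday")} ${period}`;
}

// Offer at most three options; more than that is hard to follow by ear.
export function narrateSlots(startsIso: string[], timeZone: string): string {
  const spoken = startsIso.slice(0, 3).map((s) => narrateSlot(s, timeZone));
  if (spoken.length === 0) return "I don't see any open times.";
  if (spoken.length === 1) return `I have ${spoken[0]}.`;
  if (spoken.length === 2) return `I have ${spoken[0]} or ${spoken[1]}.`;
  return `I have ${spoken.slice(0, -1).join(", ")}, or ${spoken[spoken.length - 1]}.`;
}
```

Passing the time zone explicitly matters. Slot.start is an instant, and a server running in UTC will happily tell an East Coast patient their 9:15 is at 1:15.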
Even with that working, the booking is not safe yet. The agent has not checked whether the patient's insurance covers the visit.
The Eligibility Check the Agent Has to Run Before It Commits
X12 270 is the standardized request for a patient's insurance coverage. The 271 is the response. Running it before booking is the difference between a 90-second resolution and a phone call from billing four days later. The two clearinghouses most teams use are Availity and Stedi, both of which expose a JSON wrapper over the underlying X12. As of late 2025, CMS also requires Medicare HETS eligibility requests to include the IP address of the request's point of origin.
The pattern is: the agent has the verified patient demographics, the chosen provider's NPI, and the planned service type code. It fires a 270, parses the 271 (covered, copay, requires referral, out of network), and either continues to booking or routes the patient down a different path.
```typescript
type Eligibility =
  | { kind: "covered"; copayCents: number; deductibleRemainingCents: number }
  | { kind: "needs-referral"; pcpName?: string }
  | { kind: "out-of-network" }
  | { kind: "unknown" };

export async function checkEligibility(opts: {
  apiKey: string;
  payerId: string;
  member: { firstName: string; lastName: string; dob: string; memberId: string };
  provider: { npi: string };
  // X12 service type code, e.g. "30" health benefit plan coverage, "98" professional (physician) office visit
  serviceTypeCode: string;
}): Promise<Eligibility> {
  const res = await fetch("https://healthcare.us.stedi.com/2024-04-01/eligibility", {
    method: "POST",
    headers: {
      Authorization: `Key ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      // 9-digit numeric control number
      controlNumber: String(Math.floor(Math.random() * 1e9)).padStart(9, "0"),
      tradingPartnerServiceId: opts.payerId,
      provider: { npi: opts.provider.npi, organizationName: "Clinic" },
      subscriber: opts.member,
      encounter: { serviceTypeCodes: [opts.serviceTypeCode] },
    }),
  });
  if (!res.ok) return { kind: "unknown" };

  // Field names below are simplified; confirm them against your clearinghouse's 271 JSON schema.
  const data = await res.json();
  const benefit = data.benefitsInformation?.find((b: { code: string }) => b.code === "1");
  if (!benefit) return { kind: "out-of-network" };

  const copay = benefit.benefitAmount?.amount ? Math.round(benefit.benefitAmount.amount * 100) : 0;
  if (data.referralRequired) return { kind: "needs-referral" };

  return {
    kind: "covered",
    copayCents: copay,
    deductibleRemainingCents: data.deductibleRemainingCents ?? 0,
  };
}
```

The branching that follows is where the agent earns its keep. On covered, narrate the copay and proceed to book. On needs-referral, offer to schedule the patient with their listed PCP first. On out-of-network, transfer to a human who can talk through self-pay. On unknown, do not guess. Hand off, every time. The cost of a wrong eligibility answer is far higher than the cost of one extra warm transfer.
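That branching fits in one exhaustive switch. This is a sketch: the `NextStep` shape, the queue names, and the spoken lines are assumptions, and the `Eligibility` type is repeated so the block stands alone.

```typescript
type Eligibility =
  | { kind: "covered"; copayCents: number; deductibleRemainingCents: number }
  | { kind: "needs-referral"; pcpName?: string }
  | { kind: "out-of-network" }
  | { kind: "unknown" };

// Hypothetical routing result consumed by the dialog layer.
type NextStep =
  | { action: "book"; say: string }
  | { action: "offer-pcp"; say: string }
  | { action: "transfer"; queue: "self-pay" | "general"; say: string };

function assertNever(x: never): never {
  throw new Error(`Unhandled eligibility kind: ${JSON.stringify(x)}`);
}

export function routeOnEligibility(e: Eligibility): NextStep {
  switch (e.kind) {
    case "covered": {
      const dollars = (e.copayCents / 100).toFixed(2);
      return { action: "book", say: `Your copay for this visit is $${dollars}.` };
    }
    case "needs-referral":
      return {
        action: "offer-pcp",
        say: e.pcpName
          ? `Your plan needs a referral first. I can book you with ${e.pcpName}.`
          : "Your plan needs a referral first. I can book you with your primary care provider.",
      };
    case "out-of-network":
      return {
        action: "transfer",
        queue: "self-pay",
        say: "Let me connect you with someone who can go over your options.",
      };
    case "unknown":
      // Never guess on coverage: a warm transfer is cheaper than a wrong answer.
      return {
        action: "transfer",
        queue: "general",
        say: "I couldn't confirm your coverage, so I'm connecting you with a team member.",
      };
    default:
      return assertNever(e);
  }
}
```

The `assertNever` default means adding a fifth eligibility kind later fails the build until someone decides what the agent should do with it.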
One honest note about latency: a 271 round trip can take three to seven seconds even on a good day, and Medicaid payers can run longer. Fire the call early, while the patient is still confirming the slot they liked, so the answer is back by the time they say yes. Otherwise the agent stalls mid-sentence and the patient assumes it crashed.
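One way to sketch the fire-early pattern: start the eligibility call when the slot is proposed, let the patient confirm in parallel, and cap the wait with a timeout that resolves to a fallback instead of hanging. `withTimeout` and `confirmAndCheck` are hypothetical helpers, and the caller picks the timeout value.

```typescript
// Resolve with `fallback` if `work` is slow or fails, so the call never stalls.
export function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

// Kick off eligibility first, then talk to the patient while it runs.
export async function confirmAndCheck<E>(opts: {
  startEligibility: () => Promise<E>;
  confirmSlot: () => Promise<boolean>; // the spoken "does that time work?" turn
  timeoutMs: number;
  unknown: E; // value to use when the clearinghouse is silent
}): Promise<{ confirmed: boolean; eligibility: E }> {
  const pending = withTimeout(opts.startEligibility(), opts.timeoutMs, opts.unknown);
  const confirmed = await opts.confirmSlot();
  return { confirmed, eligibility: await pending };
}
```

Because the timer starts when the request is fired, the seconds the patient spends saying yes count against the budget, which is the point.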
Confirming Without Leaking PHI
The patient is booked. They want a confirmation. This is where the project usually quietly fails compliance review. A confirmation that says "your appointment with Dr. Park for hypertension follow-up is Tuesday at 9:15" packs a diagnosis, a clinician, a time, and an MRN's worth of identity into a single SMS. That is PHI. Carriers and clearinghouses see it. So does the patient's roommate looking at the lock screen.
The fix is mechanical. Send a generic confirmation that names the time and a portal link, and let the rest live in the secured patient portal. A2P 10DLC requires a registered Healthcare campaign type for any SMS to US patients, with explicit prior consent recorded and STOP handling working before traffic flows.
```typescript
import twilio from "twilio";

const client = twilio(process.env.TWILIO_SID, process.env.TWILIO_TOKEN);

export async function sendConfirmation(opts: {
  to: string; // E.164
  appointmentId: string;
  whenLocal: string; // e.g. "Tue Apr 30 at 9:15am"
  portalDeepLink: string;
}) {
  const body =
    `Your appointment is confirmed for ${opts.whenLocal}. ` +
    `Details and prep instructions: ${opts.portalDeepLink}. ` +
    `Reply STOP to opt out. Std msg & data rates may apply.`;

  return client.messages.create({
    from: process.env.TWILIO_FROM_HEALTHCARE_10DLC!,
    to: opts.to,
    body,
  });
}
```

No diagnosis. No provider name. No reason. The portal is where the rest lives, behind the patient's own login.
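A cheap last-line guard is to check every outgoing body against the sensitive strings you already hold for that appointment before it leaves your service. A minimal sketch, with hypothetical helper names; a substring check only catches what you feed it and is not a PHI detector.

```typescript
// Return every sensitive term that appears in the outgoing message body.
export function findPhiLeaks(body: string, sensitive: string[]): string[] {
  const haystack = body.toLowerCase();
  return sensitive.filter((term) => {
    const needle = term.trim().toLowerCase();
    return needle !== "" && haystack.includes(needle);
  });
}

// Refuse to send rather than send and apologize.
export function assertSafeSms(body: string, sensitive: string[]): void {
  const leaks = findPhiLeaks(body, sensitive);
  if (leaks.length > 0) {
    // Report the count only; the terms themselves must not land in logs.
    throw new Error(`SMS blocked: ${leaks.length} sensitive term(s) in body`);
  }
}
```

Feed it the provider's name, the visit reason, and the MRN for the appointment being confirmed. If a template change ever pulls one of those into the message, the send fails loudly in staging instead of quietly in production.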
The Escalation Path That Makes the Rest of This Safe
The agent cannot be the only line of defense. Three triggers must always exit to a human: identity verification fails three times, the eligibility clearinghouse is silent past a timeout, or the patient says any of a small list of red-flag phrases (chest pain, suicidal ideation, trouble breathing). The handoff must carry context. Starting the patient over after they already gave their name and DOB is the surest way to turn a "this AI is great" review into a "never again" one.
```typescript
import twilio from "twilio";
import { createHash } from "crypto";

const client = twilio(process.env.TWILIO_SID, process.env.TWILIO_TOKEN);

export async function escalateToNurse(opts: {
  callSid: string;
  caseFile: { mrn: string; dobMasked: string; reason: string; transcriptUrl: string };
  nurseQueueNumber: string;
  slackWebhook: string;
}) {
  await fetch(opts.slackWebhook, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `:rotating_light: Escalation. MRN ${hash(opts.caseFile.mrn)} | reason: ${opts.caseFile.reason} | transcript: ${opts.caseFile.transcriptUrl}`,
    }),
  });

  return client.calls(opts.callSid).update({
    twiml: `<Response><Dial><Conference statusCallback="/twilio/conf-status">nurse-${opts.callSid}</Conference></Dial></Response>`,
  });
}

function hash(mrn: string) {
  return createHash("sha256").update(mrn).digest("hex").slice(0, 10);
}
```

The Slack message uses a hashed MRN, not the real one. The transcript link is short-lived and access-controlled. The conference call gives the on-call nurse a real human-to-human handoff while the case file sits in their queue tool.
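The red-flag trigger belongs in plain code that runs on every transcribed utterance, ahead of the LLM. A minimal sketch: the three categories come from the list above, but the specific phrasings in each pattern are assumptions and need clinical review before use.

```typescript
// Illustrative patterns only; the real list should come from your clinical team.
const RED_FLAGS: { label: string; pattern: RegExp }[] = [
  { label: "chest-pain", pattern: /\bchest\s+(pain|pains|tightness|pressure)\b/i },
  { label: "suicidal-ideation", pattern: /\b(suicid\w*|kill\s+myself|end\s+my\s+life)\b/i },
  {
    label: "trouble-breathing",
    pattern: /\b(can'?t\s+breathe|trouble\s+breathing|short(ness)?\s+of\s+breath)\b/i,
  },
];

// Returns the first matching category, or null when nothing matches.
export function detectRedFlag(utterance: string): string | null {
  for (const { label, pattern } of RED_FLAGS) {
    if (pattern.test(utterance)) return label;
  }
  return null;
}
```

A match should short-circuit whatever the agent was doing and go straight to escalation. False positives cost a nurse thirty seconds; false negatives cost far more.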
Everything above stands on its own. A team can build it from the docs cited at the bottom and ship a working agent. The remaining 30% is what you stop owning when you put a platform underneath it.
What Chanl Handles, in Real Methods
The agent platform does three things for healthcare voice that are tedious to build well: it executes tool calls server-side so secrets never reach the LLM, it stores per-customer memory the agent can recall on the next call, and it scores conversations against domain-specific axes so quality is measurable instead of vibes-based.
Start by registering each FHIR and eligibility endpoint as a tool. Tool execution happens inside the BAA-covered tier, with the API token resolved at call time from a secret reference, never inlined in a prompt.
```typescript
import { ChanlClient } from "@chanl/sdk";

const sdk = new ChanlClient({ apiKey: process.env.CHANL_API_KEY! });

await sdk.tools.create({
  name: "fhir_search_slots",
  type: "http",
  config: {
    method: "GET",
    url: "https://fhir.epic.com/api/FHIR/R4/Slot",
    auth: { type: "bearer", secretRef: "epic_smart_backend_token" },
    inputSchema: {
      type: "object",
      properties: {
        scheduleId: { type: "string" },
        fromIso: { type: "string" },
      },
      required: ["scheduleId", "fromIso"],
    },
  },
});

await sdk.tools.create({
  name: "eligibility_270",
  type: "http",
  config: {
    method: "POST",
    url: "https://healthcare.us.stedi.com/2024-04-01/eligibility",
    auth: { type: "bearer", secretRef: "stedi_api_key" },
  },
});
```

Group those tools into a toolset and attach it to the patient-access agent. The same toolset works for the voice agent, a chat fallback, and SMS-driven scheduling. One source of truth.
```typescript
const toolset = await sdk.toolsets.create({
  name: "patient-access-scheduling",
  // Assumes fhir_book_appointment was registered the same way as the two tools above.
  toolIds: ["fhir_search_slots", "fhir_book_appointment", "eligibility_270"],
});

await sdk.agents.update("patient-access-agent", {
  toolsetIds: [toolset.id],
});
```

Now the harder part: memory. Most agent platforms store conversation context as embeddings, which is great for personalization and disastrous for PHI. A healthcare-ready memory layer needs a per-write policy that decides what to redact, block, or hash before the embedding model sees it. Today most teams pre-redact in their own application code and pass already-clean strings into Memory. The pattern looks like this.
```typescript
import { redactPhi } from "./redact";

const safeContent = redactPhi(
  "Prefers afternoon appointments. Declines telehealth. Has hearing aid in left ear."
);

await sdk.memory.create({
  entityType: "patient",
  entityId: hash(patient.mrn), // never store raw MRN
  content: safeContent,
  metadata: { workspace: "scheduling", baaCovered: true },
});
```

That redactPhi function is yours to write today. A first-party phiPolicy: "redact" | "block" | "hash" parameter on memory.create would push that into the platform, and based on conversations with health-system buyers, it is the single biggest blocker between a generic agent platform and one that closes deals in healthcare. Flagging it inline so future readers know what is real and what is roadmap.
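For a sense of scale, here is what the regex half of a `redactPhi` might look like. This is a sketch under loud assumptions: the patterns cover only a few structured identifiers in US formats, the placeholder labels are made up, and free-text clinical detail passes straight through, which is why a clinical NER pass still has to sit behind it.

```typescript
// Structured identifiers only; order matters when patterns could overlap.
const PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "DATE", pattern: /\b\d{1,2}\/\d{1,2}\/\d{2,4}\b/g },
  { label: "PHONE", pattern: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g },
  { label: "MRN", pattern: /\bMRN[:#\s]*\d{4,}\b/gi },
];

// Replace each match with a bracketed placeholder before anything is embedded.
export function redactPhi(text: string): string {
  return PATTERNS.reduce(
    (out, { label, pattern }) => out.replace(pattern, `[${label}]`),
    text,
  );
}
```

Run the sample sentence from the memory write through it and nothing changes, hearing aid included. That is the honest limit of regex: it finds identifiers, not health information.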
Quality follows the same pattern. Scorecards evaluate a finished call against axes you define. For patient access the axes that matter look like this.
```typescript
await sdk.scorecards.create({
  name: "patient-access-quality",
  axes: [
    { name: "identity_verified", weight: 0.25 },
    { name: "eligibility_checked", weight: 0.20 },
    { name: "phi_disclosure_minimized", weight: 0.20 },
    { name: "escalation_appropriate", weight: 0.15 },
    { name: "booking_completed", weight: 0.20 },
  ],
});

// after a call ends
const result = await sdk.scorecards.evaluate({
  callId,
  scorecardName: "patient-access-quality",
});
```

Then run the same agent against Scenarios before each release. An elderly patient persona who is hard of hearing, a Spanish-speaking caller, a member whose insurance changed last week, a patient asking for a controlled substance. Each persona is a regression test, a way to ship a prompt change without finding out from a HIPAA breach notification that something drifted.
```typescript
await sdk.scenarios.run({
  agentId: "patient-access-agent",
  personaIds: ["elderly-hard-of-hearing", "spanish-spoken-only", "lapsed-coverage"],
});
```

The run reports identity-verification accuracy per persona, escalation rate, and the scorecard breakdown. That is the harness. None of it makes a model smarter. It makes the agent shippable in an environment where one wrong answer can be a regulatory event.
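To make the scorecard breakdown concrete, here is the arithmetic a weighted scorecard implies, written as a plain function. This is an assumption about how axis weights combine (a weighted sum of per-axis scores between 0 and 1), not a description of how the platform computes its result.

```typescript
type Axis = { name: string; weight: number };

// Weighted sum of per-axis scores; a missing axis counts as zero.
export function weightedScore(axes: Axis[], scores: Record<string, number>): number {
  const totalWeight = axes.reduce((sum, a) => sum + a.weight, 0);
  if (Math.abs(totalWeight - 1) > 1e-9) {
    throw new Error(`Axis weights must sum to 1, got ${totalWeight}`);
  }
  return axes.reduce((sum, a) => sum + a.weight * (scores[a.name] ?? 0), 0);
}

// The patient-access axes and weights from the scorecard definition.
export const patientAccessAxes: Axis[] = [
  { name: "identity_verified", weight: 0.25 },
  { name: "eligibility_checked", weight: 0.2 },
  { name: "phi_disclosure_minimized", weight: 0.2 },
  { name: "escalation_appropriate", weight: 0.15 },
  { name: "booking_completed", weight: 0.2 },
];
```

With those weights, a call that does everything right except escalation tops out around 0.85. A call that skips identity verification cannot score above 0.75 no matter how smooth the booking was, which is the behavior you want the number to encode.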
What Still Does Not Ship Out of the Box
Three gaps remain across the category, not just in any one platform. They are worth naming because pretending they are solved is how compliance reviews start failing.
| Gap | Why it matters | What teams do today |
|---|---|---|
| First-party PHI redaction in memory writes | Embeddings are forever; redaction has to happen before, not after | Pre-redact in app code with a regex + clinical NER pass |
| Workspace-level BAA-only routing flag | Easy to misconfigure a single tool to a non-BAA endpoint | Manual provider audit during deploy review |
| Audit log streaming to SIEM | SOC and Privacy expect Splunk or Sumo, not a vendor dashboard | Custom webhook plus log shipper |
None of these block shipping a working agent. All three slow procurement down. The teams that close fastest are the ones that build the redaction and the SIEM export themselves, in their own code, and submit them to their Privacy Officer with the rest of the architecture.
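The manual provider audit is the easiest of the three to automate in a small way. A minimal sketch of a deploy-time check that flags any tool whose URL points at a host outside your BAA-covered list; the `ToolConfig` shape and the allowlist contents are assumptions, and the list itself still has to come from your signed agreements.

```typescript
// Hypothetical minimal view of a registered tool.
type ToolConfig = { name: string; url: string };

// Return the names of tools whose host is not on the BAA-covered allowlist.
export function findNonBaaTools(tools: ToolConfig[], baaHosts: string[]): string[] {
  const allowed = new Set(baaHosts.map((h) => h.toLowerCase()));
  return tools
    .filter((t) => {
      try {
        return !allowed.has(new URL(t.url).hostname.toLowerCase());
      } catch {
        return true; // an unparseable URL fails closed
      }
    })
    .map((t) => t.name);
}
```

Run it in CI against the tool definitions and fail the build on a non-empty result. It will not replace the review, but it catches the one-line misconfiguration before a human has to.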
Closing
The agent that books appointments is not a chatbot with a phone number on it. It is a small distributed system that verifies identity, talks to the EHR over FHIR, talks to the payer over X12, sends SMS through a registered campaign, and exits cleanly to a human when any of those steps wobbles. The work pays for itself the first time a patient with a 15-minute appointment doesn't spend eight minutes on hold to book it.
The platform layer should let you skip the pieces that are not your differentiator: tool execution that keeps tokens out of LLMs, memory the agent can use on the next call, scorecards that turn quality into a number, scenarios that catch regressions before patients do. The agent is the hero. The platform handles the plumbing so the team can focus on the part that is actually clinical.
- FHIR R4 Appointment resource (HL7)
- FHIR R4 Slot resource (HL7)
- Epic on FHIR scheduling specifications
- Building AI agents that integrate with Epic or Cerner (CapMinds)
- Availity HIPAA Transactions API (270/271), March 2025
- Stedi real-time eligibility check (270/271 JSON)
- SMART App Launch v2.2.0 — Backend Services
- ElevenLabs HIPAA documentation
- Deepgram voice AI agents 2026 buyers guide
- Azure OpenAI Realtime API HIPAA scope (Microsoft Q&A)
- Twilio building HIPAA compliant messaging applications
- A2P 10DLC for Healthcare 2025 (Doctible)
- AWS for Industries — Transform healthcare prior auth with AI agents
- UiPath agentic AI for healthcare prior authorization (ViVE 2026)
- AWS Industries — AI-driven healthcare scheduling on AWS
- Assort Health — voice AI for patient access (case study)
- HHS HIPAA identity verification FAQ
- Best practice is two identifiers (AccountableHQ)
- CMS HETS 270/271 companion guide (IP origin requirement)