What Does It Actually Take to Make a Voice Scheduling Agent HIPAA Compliant?

Every vendor in the call path needs a signed BAA. That includes the telephony provider (Twilio HIPAA-eligible products), the speech-to-text vendor (Deepgram Nova-3 under a BAA), the LLM (Azure OpenAI text endpoints are in scope, Realtime audio is not as of 2025), the text-to-speech vendor (ElevenLabs Enterprise with Zero Retention Mode), and any storage layer. On top of that you need encryption at rest and in transit, audit logs on every tool call, role-based access on transcripts, and a written verification procedure. Skipping any link breaks the chain.

Why Is the Azure OpenAI Realtime API a Problem for Healthcare?

Microsoft's BAA covers Azure OpenAI text endpoints, but as of 2025 the Realtime API audio in/audio out functionality is still in preview and not in HIPAA-eligible scope. If you need real-time voice and want PHI to stay BAA-covered, the safer pattern today is a streaming chain (Deepgram STT plus Azure OpenAI text plus a BAA-covered TTS) rather than the single Realtime endpoint.

How Does an AI Agent Verify Patient Identity Over the Phone Without Violating HIPAA?

HHS does not mandate a specific form of verification. The widely accepted standard is two non-public identifiers, typically date of birth plus the last four of the SSN, or DOB plus an account number. The agent should confirm digit by digit, handle phonetic confusion like fifteen versus fifty, and lock out after three failed attempts with a hand-off to a human queue. Full SSN should never be requested over the phone.

Why Use FHIR Slot and Appointment Instead of a Custom EHR API?

FHIR R4 Slot and Appointment resources are the standard interface every major EHR exposes (Epic, Oracle Health/Cerner, Athena). Building against FHIR means the agent works across EHRs without bespoke per-vendor code. Epic adds the custom operations Appointment.$find and Appointment.$book, which wrap the schedule template and slot lookup into a single deterministic call, but the underlying resources stay portable.

What Is a 270/271 Eligibility Check and Why Does the Agent Need It?

X12 270 is the standard real-time request for a patient's insurance coverage; the 271 is the response. Running it before booking lets the agent tell the patient their copay, whether the visit needs a referral, and whether the chosen provider is in-network. Skipping the check pushes that work to the front desk after the patient hangs up, which is exactly the friction the agent was supposed to remove.

Can the Confirmation SMS Contain the Appointment Time and Provider Name?

Generic times are usually fine; provider name plus visit reason in the same message can be PHI depending on context. The safer pattern is to send a minimal confirmation that names the date, the time, and a portal link, and to let the patient see the rest in the secured portal. A2P 10DLC registration with healthcare campaign type is required for any SMS to US patients, with explicit prior consent and STOP handling.

When Should an Agent Escalate to a Human Nurse?

Escalate on any failure of identity verification, an eligibility-check timeout, repeated symptom red flags during intake (chest pain, suicidal ideation, breathing difficulty), or an explicit patient request. The escalation must transfer the call with full context attached, not start the patient over. A Twilio Dial Conference with statusCallback plus a Slack notification carrying a case file is a baseline pattern that has held up in production.

What Is the Biggest Gap Between a Generic Agent Platform and a Healthcare-Ready One?

Memory. Generic platforms store conversation context as embeddings, which is great for personalization and disastrous for PHI. A healthcare-ready memory layer needs a per-write policy that can redact, block, or hash sensitive fields like MRN and SSN before they ever reach the embedding model. Today most teams pre-redact in application code; first-party memory PHI policies are still a product gap across the category.

How to Build a Healthcare Appointment Voice Agent (FHIR, 270/271, HIPAA)

A patient calls a primary-care line, waits eight minutes on hold, and books a 15-minute appointment. That ratio is the entire pitch for voice AI in healthcare. Every vendor demo opens with it. Every demo also stops shortly after the agent says hello.

The hard part is not the greeting. It is the part where the agent verifies identity over a noisy phone line, finds an open slot in Epic, runs a 270 eligibility check against the payer, books the appointment, sends a confirmation that does not leak PHI, and writes everything back without ever letting Protected Health Information touch a non-BAA surface. Most builders ship the first 30 seconds and hand off the rest to a human. That is a half-system, and a half-system is what your patient access team is already paying $40 an hour to run.

This article walks through the actual build: first the 70% you can do with general open tools, then the 30% where Chanl SDK methods replace the parts you would otherwise own forever. Compliance gates are inline, not an afterthought.

The Compliance Baseline Before Any Code

Every vendor in the call path needs a signed Business Associate Agreement before a single PHI byte moves. The fastest way to derail a healthcare voice project is to wire a great prototype against an LLM your legal team has not approved, then discover three months in that you cannot ship. Get the matrix right first.

The stack as of 2026, with the BAA status that matters:

Layer	Provider	BAA available
Telephony	Twilio Voice + Programmable Messaging	Yes, on HIPAA-eligible products
STT	Deepgram (Nova-3)	Yes, SOC 2 Type II + BAA
LLM (text)	Azure OpenAI text endpoints	Yes, under Microsoft BAA
LLM (text)	Anthropic Claude on AWS Bedrock	Yes, under AWS BAA
TTS	ElevenLabs Enterprise + Zero Retention Mode	Yes, with both flags engaged
TTS	Cartesia	Yes, with BAA

One trap to flag early: the Azure OpenAI Realtime API (audio-in / audio-out) is still not in Microsoft's HIPAA-eligible scope as of 2025. If you build on Realtime and your compliance officer reads the small print, the project stops. The safer pattern today is a streaming chain (Deepgram STT, Azure OpenAI text, ElevenLabs TTS) over Pipecat or LiveKit, with each hop crossing a BAA-covered boundary.
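
One way to keep the matrix from drifting is to encode it and check it at startup. The sketch below is an illustration, not part of any vendor SDK: the `Vendor` shape, the `inHipaaScope` flag, and both function names are assumptions made for this example.

```typescript
// Hypothetical vendor registry. Names and flags are illustrative only.
type Layer = "telephony" | "stt" | "llm" | "tts" | "storage";

interface Vendor {
  name: string;
  layer: Layer;
  baaSigned: boolean;
  // Product-level caveat, e.g. a preview endpoint outside the BAA's scope.
  inHipaaScope: boolean;
}

// Returns one human-readable problem per missing layer or uncovered vendor.
export function findBaaGaps(pipeline: Vendor[], required: Layer[]): string[] {
  const problems: string[] = [];
  for (const layer of required) {
    if (!pipeline.some((v) => v.layer === layer)) {
      problems.push(`no vendor configured for layer "${layer}"`);
    }
  }
  for (const v of pipeline) {
    if (!v.baaSigned) {
      problems.push(`${v.name}: no signed BAA`);
    } else if (!v.inHipaaScope) {
      problems.push(`${v.name}: product is outside HIPAA-eligible scope`);
    }
  }
  return problems;
}

// Fail at startup rather than in a procurement review.
export function assertBaaCovered(pipeline: Vendor[], required: Layer[]): void {
  const problems = findBaaGaps(pipeline, required);
  if (problems.length > 0) {
    throw new Error(`BAA check failed:\n- ${problems.join("\n- ")}`);
  }
}
```

A signed BAA and an in-scope product are tracked separately on purpose, because the Realtime case above is exactly a vendor with a BAA whose specific endpoint is not covered.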


That checklist is not a marketing list. Every item has been the reason a real voice-agent project failed a procurement review.

Identity Verification on a Phone Line

The agent's first job after hello is the hardest one to get right. HHS does not mandate a specific verification form, and the widely accepted standard is two non-public identifiers. Date of birth plus last four of the SSN is the combination most health systems land on, and full SSN should never be asked. The interesting part is what the agent does when the audio is wrong.

Phone numerals collide. "Fifteen" sounds like "fifty" on a noisy line. "Five" and "nine" trade places when the caller has a hearing aid feeding back. The agent has to handle that confusion explicitly. Read each digit back, ask for a yes, and on three failed attempts hand off to a human nurse with the partial transcript attached.
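
A minimal sketch of that read-back step, assuming the STT transcript arrives as plain text. The word table and both function names are made up for this example; the point is that "fifteen" and "fifty" are rejected outright so the agent re-prompts for single digits instead of guessing.

```typescript
// Single-digit words only. Teens and tens are deliberately absent.
const DIGIT_WORDS = new Map<string, string>([
  ["zero", "0"], ["oh", "0"], ["one", "1"], ["two", "2"], ["three", "3"],
  ["four", "4"], ["five", "5"], ["six", "6"], ["seven", "7"],
  ["eight", "8"], ["nine", "9"],
]);

// Returns the digit string, or null when any token is not a single digit.
export function parseSpokenDigits(transcript: string): string | null {
  const tokens = transcript
    .toLowerCase()
    .split(/[\s,.-]+/)
    .filter((t) => t.length > 0);
  if (tokens.length === 0) return null;

  let out = "";
  for (const token of tokens) {
    if (/^\d+$/.test(token)) {
      out += token; // STT already produced numerals
      continue;
    }
    const digit = DIGIT_WORDS.get(token);
    if (digit === undefined) return null; // ambiguous word: ask again
    out += digit;
  }
  return out;
}

// Builds the confirmation prompt with an audible pause between digits.
export function readBackPrompt(digits: string): string {
  return `I heard ${digits.split("").join(", ")}. Is that right?`;
}
```

Returning null rather than a best guess keeps the ambiguity visible to the caller of the function, which can then count the attempt and re-prompt.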

The verification flow as a Pipecat-style function. It runs early in the conversation and gates every tool call after.

verify-identity.ts·typescript

import { z } from "zod";
 
const dobSchema = z.string().regex(/^\d{2}\/\d{2}\/\d{4}$/);
const ssnLast4Schema = z.string().regex(/^\d{4}$/);
 
export async function verifyIdentity(opts: {
  said: { dob: string; last4: string };
  patientLookup: (q: { dob: string; last4: string }) => Promise<{ mrn: string } | null>;
  attempts: number;
  speak: (msg: string) => Promise<void>;
}): Promise<{ ok: true; mrn: string } | { ok: false; reason: "lockout" | "no-match" }> {
  const dob = dobSchema.safeParse(opts.said.dob);
  const last4 = ssnLast4Schema.safeParse(opts.said.last4);
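  if (!dob.success || !last4.success) {
    // `attempts` counts prior failures, so the third failure locks out.
    if (opts.attempts >= 2) return { ok: false, reason: "lockout" };
    // Read back whichever value failed to parse, digits only.
    const heard = !last4.success ? opts.said.last4 : opts.said.dob;
    await opts.speak(`I heard ${spellDigits(heard.replace(/\D/g, ""))}. Is that right?`);
    return { ok: false, reason: "no-match" };
  }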
  const match = await opts.patientLookup({ dob: dob.data, last4: last4.data });
  if (!match) return { ok: false, reason: opts.attempts >= 2 ? "lockout" : "no-match" };
  return { ok: true, mrn: match.mrn };
}
 
function spellDigits(s: string) {
  return s.split("").join(" ");
}

Two things matter here. First, the agent reads the digits back whenever the input does not parse cleanly, and it never claims a match without a successful lookup. Second, the lockout is in the application layer, not the LLM. Patient lockout cannot be hallucinated away by a model that decides to be helpful.
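
The `attempts` value passed into `verifyIdentity` has to live somewhere the model cannot edit. One possible shape, sketched here with hypothetical names (`VerifyState`, `recordResult`, `canCallTools`), is a small state machine held in the call session.

```typescript
type VerifyState =
  | { status: "pending"; failures: number }
  | { status: "verified"; mrn: string }
  | { status: "locked" };

const MAX_FAILURES = 3;

// Pure transition: the session stores the returned state after each attempt.
export function recordResult(
  state: VerifyState,
  result: { ok: true; mrn: string } | { ok: false },
): VerifyState {
  // Verified and locked are terminal. A later "success" cannot undo a lockout.
  if (state.status !== "pending") return state;
  if (result.ok) return { status: "verified", mrn: result.mrn };
  const failures = state.failures + 1;
  return failures >= MAX_FAILURES
    ? { status: "locked" }
    : { status: "pending", failures };
}

// Gate checked before every tool call, outside the prompt.
export function canCallTools(state: VerifyState): boolean {
  return state.status === "verified";
}
```

Because the transition is a pure function, it is easy to unit test and the lockout threshold is a constant in code, not a sentence in a system prompt.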

Pulling Slots From FHIR

The agent now has a verified MRN. Next it needs available slots. FHIR R4 exposes two resources for this: Schedule (the link from a slot to a practitioner and location) and Slot (the actual free time blocks). Most major EHRs (Epic, Oracle Health, Athena) implement these. Epic adds two custom operations that make the workflow much shorter: Appointment.$find returns candidate times that account for templates and existing bookings, and Appointment.$book commits one in a single deterministic call.

Authentication for backend FHIR access is SMART Backend Services. Your service registers a public key (JWK) with the EHR, signs a one-time JWT assertion per token request, and gets back an access token scoped to system/Slot.read and system/Appointment.write. No user is in the loop, which is exactly what you want for an agent that runs at 2am.
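
As a rough sketch of what that token request carries: the claim names and form fields below follow the SMART Backend Services specification, while the function names and option shapes are assumptions for this example. Signing the claims (the spec calls for RS384 or ES384) is left to whichever JWT library you already use.

```typescript
interface AssertionClaims {
  iss: string; // client_id issued by the EHR at registration
  sub: string; // same client_id
  aud: string; // the EHR's token endpoint URL
  jti: string; // unique per assertion so it cannot be replayed
  exp: number; // seconds since epoch, at most five minutes out
}

export function buildAssertionClaims(opts: {
  clientId: string;
  tokenUrl: string;
  jti: string;
  nowSeconds: number;
}): AssertionClaims {
  return {
    iss: opts.clientId,
    sub: opts.clientId,
    aud: opts.tokenUrl,
    jti: opts.jti,
    exp: opts.nowSeconds + 300,
  };
}

// Body for the POST to the token endpoint, sent as
// application/x-www-form-urlencoded.
export function buildTokenRequestBody(signedJwt: string, scopes: string[]): string {
  const params: Array<[string, string]> = [
    ["grant_type", "client_credentials"],
    ["client_assertion_type", "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"],
    ["client_assertion", signedJwt],
    ["scope", scopes.join(" ")],
  ];
  return params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}
```

Keeping claim construction separate from signing makes the five-minute expiry and the one-time `jti` easy to test without touching key material.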

The slot search and booking, written as two ordinary HTTP calls. The token mint is collapsed for space; in production it is a separate function that caches the access token until it expires.

fhir-scheduling.ts·typescript

type Slot = {
  resourceType: "Slot";
  id: string;
  start: string;
  end: string;
  status: "free" | "busy" | "busy-tentative";
  schedule: { reference: string };
};
 
export async function searchSlots(opts: {
  baseUrl: string;
  token: string;
  scheduleId: string;
  fromIso: string;
  count?: number;
}): Promise<Slot[]> {
  const url = new URL(`${opts.baseUrl}/Slot`);
  url.searchParams.set("schedule", `Schedule/${opts.scheduleId}`);
  url.searchParams.set("start", `ge${opts.fromIso}`);
  url.searchParams.set("status", "free");
  url.searchParams.set("_count", String(opts.count ?? 5));
 
  const res = await fetch(url, {
    headers: {
      Authorization: `Bearer ${opts.token}`,
      Accept: "application/fhir+json",
    },
  });
  if (!res.ok) throw new Error(`Slot search failed: ${res.status}`);
  const bundle = await res.json();
  return (bundle.entry ?? []).map((e: { resource: Slot }) => e.resource);
}
 
export async function bookAppointment(opts: {
  baseUrl: string;
  token: string;
  patientId: string;
  slotId: string;
  reason: string;
}): Promise<{ id: string }> {
  const body = {
    resourceType: "Appointment",
    status: "booked",
    slot: [{ reference: `Slot/${opts.slotId}` }],
    participant: [
      { actor: { reference: `Patient/${opts.patientId}` }, status: "accepted" },
    ],
    reasonCode: [{ text: opts.reason }],
  };
  const res = await fetch(`${opts.baseUrl}/Appointment`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.token}`,
      "Content-Type": "application/fhir+json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Booking failed: ${res.status}`);
  const created = await res.json();
  return { id: created.id };
}

The Slot payload is real FHIR R4 with a start, end, status, and a back-reference to the Schedule. The agent reads the first three free slots, narrates them ("I have 9:15 Tuesday morning, 1:30 Tuesday afternoon, or 8:30 Wednesday morning"), takes a confirmation, and posts the Appointment. On Epic specifically, replacing searchSlots with a single POST /Appointment/$find collapses the schedule template plus slot dance into one round trip, and $book returns a fully resolved Appointment in the response.
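
The narration step can be a small pure function. This sketch assumes the clinic's IANA time zone is known and uses the built-in `Intl.DateTimeFormat`; `narrateSlots` and `FreeSlot` are names invented here, and the exact wording the formatter produces can vary slightly by runtime.

```typescript
interface FreeSlot {
  start: string; // ISO 8601 instant, as in FHIR Slot.start
}

// Turns up to `max` slots into one spoken sentence in the clinic's time zone.
export function narrateSlots(slots: FreeSlot[], timeZone: string, max = 3): string {
  const fmt = new Intl.DateTimeFormat("en-US", {
    weekday: "long",
    hour: "numeric",
    minute: "2-digit",
    timeZone,
  });
  const spoken = slots.slice(0, max).map((s) => fmt.format(new Date(s.start)));

  if (spoken.length === 0) return "I don't see any open times in that window.";
  if (spoken.length === 1) return `I have ${spoken[0]}.`;
  if (spoken.length === 2) return `I have ${spoken[0]} or ${spoken[1]}.`;
  return `I have ${spoken.slice(0, -1).join(", ")}, or ${spoken[spoken.length - 1]}.`;
}
```

Formatting in the clinic's zone rather than the server's matters here: a slot stored with a UTC offset would otherwise be read aloud at the wrong hour.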

Even with that working, the booking is not safe yet. The agent has not checked whether the patient's insurance covers the visit.

The Eligibility Check the Agent Has to Run Before It Commits

X12 270 is the standardized request for a patient's insurance coverage. The 271 is the response. Running it before booking is the difference between a 90-second resolution and a phone call from billing four days later. The two clearinghouses most teams use are Availity and Stedi, both of which expose a JSON wrapper over the underlying X12. As of late 2025, CMS also requires Medicare HETS lookups to include the network IP address of the point where the eligibility request originated.

The pattern is: the agent has the verified patient demographics, the chosen provider's NPI, and the planned service type code. It fires a 270, parses the 271 (covered, copay, requires referral, out of network), and either continues to booking or routes the patient down a different path.

eligibility.ts·typescript

type Eligibility =
  | { kind: "covered"; copayCents: number; deductibleRemainingCents: number }
  | { kind: "needs-referral"; pcpName?: string }
  | { kind: "out-of-network" }
  | { kind: "unknown" };
 
export async function checkEligibility(opts: {
  apiKey: string;
  payerId: string;
  member: { firstName: string; lastName: string; dob: string; memberId: string };
  provider: { npi: string };
  serviceTypeCode: string; // X12 service type code, e.g. "30" health benefit plan coverage, "98" professional office visit
}): Promise<Eligibility> {
  const res = await fetch("https://healthcare.us.stedi.com/2024-04-01/eligibility", {
    method: "POST",
    headers: {
      Authorization: `Key ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
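      // X12 control numbers are nine numeric digits; a UUID slice would contain hex letters and a hyphen.
      controlNumber: String(Math.floor(Math.random() * 1e9)).padStart(9, "0"),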
      tradingPartnerServiceId: opts.payerId,
      provider: { npi: opts.provider.npi, organizationName: "Clinic" },
      subscriber: opts.member,
      encounter: { serviceTypeCodes: [opts.serviceTypeCode] },
    }),
  });
  if (!res.ok) return { kind: "unknown" };
 
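  const data = await res.json();
  // Benefit code "1" is Active Coverage. Field names here follow the clearinghouse's
  // JSON wrapper and are simplified; verify them against its current response schema.
  const benefit = data.benefitsInformation?.find((b: { code: string }) => b.code === "1");
  // No active-coverage entry means inactive or unreported coverage, so do not guess.
  if (!benefit) return { kind: "unknown" };
  if (benefit.inPlanNetworkIndicatorCode === "N") return { kind: "out-of-network" };
  const copay = benefit.benefitAmount?.amount ? Math.round(benefit.benefitAmount.amount * 100) : 0;
  if (data.referralRequired) return { kind: "needs-referral" };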
  return {
    kind: "covered",
    copayCents: copay,
    deductibleRemainingCents: data.deductibleRemainingCents ?? 0,
  };
}

The branching that follows is where the agent earns its keep. On covered, narrate the copay and proceed to book. On needs-referral, offer to schedule the patient with their listed PCP first. On out-of-network, transfer to a human who can talk through self-pay. On unknown, do not guess. Hand off, every time. The cost of a wrong eligibility answer is far higher than the cost of one extra warm transfer.
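
That branching can be written as one exhaustive switch so a new eligibility outcome cannot be added without deciding where it routes. This is a sketch: the `NextStep` type, the queue names, and the spoken lines are assumptions, and the `Eligibility` type is repeated so the block stands alone.

```typescript
type Eligibility =
  | { kind: "covered"; copayCents: number; deductibleRemainingCents: number }
  | { kind: "needs-referral"; pcpName?: string }
  | { kind: "out-of-network" }
  | { kind: "unknown" };

type NextStep =
  | { action: "book"; say: string }
  | { action: "offer-pcp-visit"; say: string }
  | { action: "transfer"; queue: "self-pay" | "patient-access"; say: string };

export function nextStep(e: Eligibility): NextStep {
  switch (e.kind) {
    case "covered": {
      const dollars = (e.copayCents / 100).toFixed(2);
      return { action: "book", say: `Your copay for this visit is $${dollars}. I'll book it now.` };
    }
    case "needs-referral": {
      const who = e.pcpName ? `with ${e.pcpName}` : "with your primary care provider";
      return {
        action: "offer-pcp-visit",
        say: `This visit needs a referral. I can schedule you ${who} first.`,
      };
    }
    case "out-of-network":
      return {
        action: "transfer",
        queue: "self-pay",
        say: "Let me connect you with someone who can go over your options.",
      };
    case "unknown":
      // Never guess on coverage. Hand off every time.
      return {
        action: "transfer",
        queue: "patient-access",
        say: "I couldn't confirm your coverage, so I'm connecting you with our team.",
      };
  }
}
```

Keeping the spoken line next to the routing decision means the agent's words and its actions cannot drift apart in separate prompt and code changes.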

One honest note about latency: a 271 round trip can take three to seven seconds even on a good day, and Medicaid payers can run longer. Fire the call early, while the patient is still confirming the slot they liked, so the answer is back by the time they say yes. Otherwise the agent stalls mid-sentence and the patient assumes it crashed.
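
A minimal sketch of firing the check early and bounding the wait. `withTimeout` and `confirmAndCheck` are hypothetical helpers, and the fallback value stands in for the "unknown" eligibility result.

```typescript
// Resolves with the work's value, or with `fallback` if the work is slow or fails.
export function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

// Starts eligibility first, then talks to the patient while it runs.
export async function confirmAndCheck<E>(opts: {
  startEligibility: () => Promise<E>;
  confirmSlot: () => Promise<boolean>;
  timeoutMs: number;
  unknown: E;
}): Promise<{ confirmed: boolean; eligibility: E }> {
  const pending = withTimeout(opts.startEligibility(), opts.timeoutMs, opts.unknown);
  const confirmed = await opts.confirmSlot();
  return { confirmed, eligibility: await pending };
}
```

The timeout clock starts when the request is fired, not when the patient says yes, so the spoken confirmation absorbs most of the payer's latency.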

Confirming Without Leaking PHI

The patient is booked. They want a confirmation. This is where the project usually quietly fails compliance review. A confirmation that says "your appointment with Dr. Park for hypertension follow-up is Tuesday at 9:15" packs a diagnosis, a clinician, a time, and an MRN's worth of identity into a single SMS. That is PHI. Carriers and clearinghouses see it. So does the patient's roommate looking at the lock screen.

The fix is mechanical. Send a generic confirmation that names the time and a portal link, and let the rest live in the secured patient portal. A2P 10DLC requires a registered Healthcare campaign type for any SMS to US patients, with explicit prior consent recorded and STOP handling working before traffic flows.

confirm-sms.ts·typescript

import twilio from "twilio";
 
const client = twilio(process.env.TWILIO_SID, process.env.TWILIO_TOKEN);
 
export async function sendConfirmation(opts: {
  to: string; // E.164
  appointmentId: string;
  whenLocal: string; // e.g. "Tue Apr 30 at 9:15am"
  portalDeepLink: string;
}) {
  const body =
    `Your appointment is confirmed for ${opts.whenLocal}. ` +
    `Details and prep instructions: ${opts.portalDeepLink}. ` +
    `Reply STOP to opt out. Std msg & data rates may apply.`;
  return client.messages.create({
    from: process.env.TWILIO_FROM_HEALTHCARE_10DLC!,
    to: opts.to,
    body,
  });
}

No diagnosis. No provider name. No reason. The portal is where the rest lives, behind the patient's own login.
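
A cheap guard before the send call is to check the outgoing body against the values that must not appear in it. The helper below is a sketch with an invented name and input shape, and a substring check like this is a floor, not a replacement for review of the message templates.

```typescript
interface SensitiveValue {
  label: string; // what kind of value this is, for the error message
  value: string; // the actual value from the appointment record
}

// Returns the labels of every sensitive value found in the SMS body.
export function findPhiLeaks(body: string, sensitive: SensitiveValue[]): string[] {
  const haystack = body.toLowerCase();
  return sensitive
    .filter((s) => {
      const needle = s.value.trim().toLowerCase();
      return needle.length > 0 && haystack.indexOf(needle) !== -1;
    })
    .map((s) => s.label);
}

// Throws instead of sending when the body contains anything it should not.
export function assertSafeSmsBody(body: string, sensitive: SensitiveValue[]): void {
  const leaks = findPhiLeaks(body, sensitive);
  if (leaks.length > 0) {
    throw new Error(`SMS body contains: ${leaks.join(", ")}`);
  }
}
```

Calling `assertSafeSmsBody` just before `client.messages.create` means a template change that reintroduces the provider name fails loudly in testing instead of reaching a lock screen.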

The Escalation Path That Makes the Rest of This Safe

The agent cannot be the only line of defense. Three triggers must always exit to a human: identity verification fails three times, the eligibility clearinghouse is silent past a timeout, or the patient says any of a small list of red-flag phrases (chest pain, suicidal ideation, trouble breathing). Hand off must carry context. Starting the patient over after they already gave their name and DOB is the surest way to turn a "this AI is great" review into a "never again" one.

escalate.ts·typescript

import twilio from "twilio";
import { createHash } from "crypto";
 
const client = twilio(process.env.TWILIO_SID, process.env.TWILIO_TOKEN);
 
export async function escalateToNurse(opts: {
  callSid: string;
  caseFile: { mrn: string; dobMasked: string; reason: string; transcriptUrl: string };
  nurseQueueNumber: string;
  slackWebhook: string;
}) {
  await fetch(opts.slackWebhook, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `:rotating_light: Escalation. MRN ${hash(opts.caseFile.mrn)} | reason: ${opts.caseFile.reason} | transcript: ${opts.caseFile.transcriptUrl}`,
    }),
  });
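  // Assumption: PUBLIC_BASE_URL and TWILIO_FROM_VOICE are env vars you define.
  // Inline TwiML has no base URL, so the statusCallback is written as an absolute URL.
  const twiml = `<Response><Dial><Conference statusCallback="${process.env.PUBLIC_BASE_URL}/twilio/conf-status" statusCallbackEvent="join leave end">nurse-${opts.callSid}</Conference></Dial></Response>`;
  // Move the caller into the conference, then dial the nurse queue into the same room.
  await client.calls(opts.callSid).update({ twiml });
  return client.calls.create({
    to: opts.nurseQueueNumber,
    from: process.env.TWILIO_FROM_VOICE!,
    twiml,
  });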
}
 
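function hash(mrn: string) {
  // A bare SHA-256 of a short numeric MRN can be brute-forced, so mix in a
  // secret pepper. MRN_HASH_PEPPER is an assumed environment variable.
  const pepper = process.env.MRN_HASH_PEPPER ?? "";
  return createHash("sha256").update(`${pepper}:${mrn}`).digest("hex").slice(0, 10);
}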

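The Slack message uses a hashed MRN, not the real one. The transcript link is short-lived and access-controlled. Keep the reason field to a coded category, because free text can carry PHI into a channel that may not be BAA-covered. The conference call gives the on-call nurse a real human-to-human handoff while the case file sits in their queue tool.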
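
The decision to escalate can sit in a plain function that runs on every turn, outside the model. The sketch below uses a hypothetical phrase list and function name; keyword matching like this is a safety floor and not clinical triage, so the list should be reviewed by clinical staff.

```typescript
type EscalationReason =
  | "red-flag"
  | "identity-lockout"
  | "eligibility-timeout"
  | "patient-request";

// Illustrative patterns only. No global flag, so test() stays stateless.
const RED_FLAGS: RegExp[] = [
  /chest pain/i,
  /suicid/i,
  /(trouble|difficulty|hard time|can'?t) breath/i,
];

const HUMAN_REQUEST = /\b(human|nurse|real person|representative)\b/i;

// Returns the first matching reason, or null when the agent may continue.
export function escalationReason(input: {
  utterance: string;
  failedVerifications: number;
  eligibilityTimedOut: boolean;
}): EscalationReason | null {
  // Clinical red flags outrank every other trigger.
  if (RED_FLAGS.some((re) => re.test(input.utterance))) return "red-flag";
  if (input.failedVerifications >= 3) return "identity-lockout";
  if (input.eligibilityTimedOut) return "eligibility-timeout";
  if (HUMAN_REQUEST.test(input.utterance)) return "patient-request";
  return null;
}
```

The returned value doubles as a coded reason for the case file, which avoids pasting the patient's own words into a notification.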

Everything above stands on its own. A team can build it from the docs cited at the bottom and ship a working agent. The remaining 30% is what you stop owning when you put a platform underneath it.

What Chanl Handles, in Real Methods

The agent platform does three things for healthcare voice that are tedious to build well: it executes tool calls server-side so secrets never reach the LLM, it stores per-customer memory the agent can recall on the next call, and it scores conversations against domain-specific axes so quality is measurable instead of vibes-based.

Start by registering each FHIR and eligibility endpoint as a tool. Tool execution happens inside the BAA-covered tier, with the API token resolved at call time from a secret reference, never inlined in a prompt.

import { ChanlClient } from "@chanl/sdk";
 
const sdk = new ChanlClient({ apiKey: process.env.CHANL_API_KEY! });
 
await sdk.tools.create({
  name: "fhir_search_slots",
  type: "http",
  config: {
    method: "GET",
    url: "https://fhir.epic.com/api/FHIR/R4/Slot",
    auth: { type: "bearer", secretRef: "epic_smart_backend_token" },
    inputSchema: {
      type: "object",
      properties: {
        scheduleId: { type: "string" },
        fromIso: { type: "string" },
      },
      required: ["scheduleId", "fromIso"],
    },
  },
});
 
await sdk.tools.create({
  name: "eligibility_270",
  type: "http",
  config: {
    method: "POST",
    url: "https://healthcare.us.stedi.com/2024-04-01/eligibility",
    auth: { type: "bearer", secretRef: "stedi_api_key" },
  },
});

Group those tools, plus a fhir_book_appointment tool registered the same way, into a toolset and attach it to the patient-access agent. The same toolset works for the voice agent, a chat fallback, and SMS-driven scheduling. One source of truth.

toolset.ts·typescript

const toolset = await sdk.toolsets.create({
  name: "patient-access-scheduling",
  toolIds: ["fhir_search_slots", "fhir_book_appointment", "eligibility_270"],
});
 
await sdk.agents.update("patient-access-agent", {
  toolsetIds: [toolset.id],
});

Now the harder part: memory. Most agent platforms store conversation context as embeddings, which is great for personalization and disastrous for PHI. A healthcare-ready memory layer needs a per-write policy that decides what to redact, block, or hash before the embedding model sees it. Today most teams pre-redact in their own application code and pass already-clean strings into Memory. The pattern looks like this.

memory-with-redaction.ts·typescript

import { redactPhi } from "./redact";
 
const safeContent = redactPhi(
  "Prefers afternoon appointments. Declines telehealth. Has hearing aid in left ear."
);
 
await sdk.memory.create({
  entityType: "patient",
  entityId: hash(patient.mrn), // never store raw MRN
  content: safeContent,
  metadata: { workspace: "scheduling", baaCovered: true },
});

That redactPhi function is yours to write today. A first-party phiPolicy: "redact" | "block" | "hash" parameter on memory.create would push that into the platform, and based on conversations with health-system buyers, it is the single biggest blocker between a generic agent platform and one that closes deals in healthcare. Flagging it inline so future readers know what is real and what is roadmap.
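
For a sense of scale, the regex half of such a function might start like the sketch below. The patterns and labels are assumptions chosen for illustration, they only catch structured identifiers, and free-text clinical details still need the NER pass or a block policy on top.

```typescript
// Order matters: the MRN pattern runs first so a nine-digit MRN is not
// mislabeled as an SSN.
const PHI_PATTERNS: Array<{ label: string; re: RegExp }> = [
  { label: "MRN", re: /\bMRN[:#\s]*\d{6,10}\b/gi },
  { label: "SSN", re: /\b\d{3}-?\d{2}-?\d{4}\b/g },
  { label: "DOB", re: /\b\d{1,2}\/\d{1,2}\/\d{4}\b/g },
  { label: "PHONE", re: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g },
];

// Replaces each structured identifier with a bracketed label.
export function redactPhi(text: string): string {
  return PHI_PATTERNS.reduce(
    (acc, p) => acc.replace(p.re, `[${p.label}]`),
    text,
  );
}
```

Note what this does not catch: a sentence like "Has hearing aid in left ear" passes through untouched, which is why a regex pass alone is not a PHI policy.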

Quality follows the same pattern. Scorecards evaluate a finished call against axes you define. For patient access the axes that matter look like this.

scorecard.ts·typescript

await sdk.scorecards.create({
  name: "patient-access-quality",
  axes: [
    { name: "identity_verified", weight: 0.25 },
    { name: "eligibility_checked", weight: 0.20 },
    { name: "phi_disclosure_minimized", weight: 0.20 },
    { name: "escalation_appropriate", weight: 0.15 },
    { name: "booking_completed", weight: 0.20 },
  ],
});
 
// after a call ends
const result = await sdk.scorecards.evaluate({
  callId,
  scorecardName: "patient-access-quality",
});

Then run the same agent against Scenarios before each release: an elderly patient persona who is hard of hearing, a Spanish-speaking caller, a member whose insurance changed last week, a patient asking for a controlled substance. Each persona is a regression test, a way to ship a prompt change without finding out from a HIPAA breach notification that something drifted.

scenarios.ts·typescript

await sdk.scenarios.run({
  agentId: "patient-access-agent",
  personaIds: ["elderly-hard-of-hearing", "spanish-spoken-only", "lapsed-coverage"],
});

The run reports identity-verification accuracy per persona, escalation rate, and the scorecard breakdown. That is the harness. None of it makes a model smarter. It makes the agent shippable in an environment where one wrong answer can be a regulatory event.
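
To make the weights in the scorecard above concrete, here is one way a weighted total could be computed from per-axis scores between 0 and 1. This is an illustration of the arithmetic only, not a description of how the platform scores calls, and `weightedScore` is a name invented for the example.

```typescript
interface Axis {
  name: string;
  weight: number;
}

// Combines per-axis scores (each 0..1) into a single 0..1 total.
export function weightedScore(axes: Axis[], scores: Record<string, number>): number {
  const totalWeight = axes.reduce((sum, a) => sum + a.weight, 0);
  // Guard against a scorecard edit that leaves weights not summing to 1.
  if (Math.abs(totalWeight - 1) > 1e-9) {
    throw new Error(`weights sum to ${totalWeight}, expected 1`);
  }
  return axes.reduce((sum, a) => {
    const s = scores[a.name];
    if (s === undefined || s < 0 || s > 1) {
      throw new Error(`missing or out-of-range score for ${a.name}`);
    }
    return sum + a.weight * s;
  }, 0);
}
```

With these weights, a call that does everything right except identity verification tops out at 0.75, which is a useful threshold to alert on.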

What Still Does Not Ship Out of the Box

Three gaps remain across the category, not just in any one platform. They are worth naming because pretending they are solved is how compliance reviews start failing.

Gap	Why it matters	What teams do today
First-party PHI redaction in memory writes	Embeddings are forever; redaction has to happen before, not after	Pre-redact in app code with a regex + clinical NER pass
Workspace-level BAA-only routing flag	Easy to misconfigure a single tool to a non-BAA endpoint	Manual provider audit during deploy review
Audit log streaming to SIEM	SOC and Privacy expect Splunk or Sumo, not a vendor dashboard	Custom webhook plus log shipper

None of these block shipping a working agent. All three slow procurement down. The teams that close fastest are the ones that build the redaction and the SIEM export themselves, in their own code, and submit them to their Privacy Officer with the rest of the architecture.
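
The SIEM export usually starts as one structured log line per tool call. The sketch below assumes a JSON-lines sink that a log shipper picks up; the entry shape and both function names are hypothetical. It records argument names but never argument values, since the values are where PHI lives.

```typescript
interface AuditEntry {
  ts: string; // ISO 8601 timestamp
  callId: string;
  tool: string;
  outcome: "ok" | "error";
  durationMs: number;
  argKeys: string[]; // argument names only; values may carry PHI
}

export function buildAuditEntry(opts: {
  callId: string;
  tool: string;
  args: Record<string, unknown>;
  outcome: "ok" | "error";
  startedAtMs: number;
  endedAtMs: number;
}): AuditEntry {
  return {
    ts: new Date(opts.endedAtMs).toISOString(),
    callId: opts.callId,
    tool: opts.tool,
    outcome: opts.outcome,
    durationMs: opts.endedAtMs - opts.startedAtMs,
    argKeys: Object.keys(opts.args).sort(),
  };
}

// Wraps a tool call so an entry is emitted on success and on failure.
export async function audited<T>(
  meta: { callId: string; tool: string; args: Record<string, unknown> },
  run: () => Promise<T>,
  sink: (line: string) => void,
): Promise<T> {
  const startedAtMs = Date.now();
  let outcome: "ok" | "error" = "error";
  try {
    const result = await run();
    outcome = "ok";
    return result;
  } finally {
    const entry = buildAuditEntry({ ...meta, outcome, startedAtMs, endedAtMs: Date.now() });
    sink(JSON.stringify(entry));
  }
}
```

Because the entry is written in a `finally` block, a tool call that throws still leaves a record, which is the case a Privacy Officer will ask about first.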

Closing

The agent that books appointments is not a chatbot with a phone number on it. It is a small distributed system that verifies identity, talks to the EHR over FHIR, talks to the payer over X12, sends SMS through a registered campaign, and exits cleanly to a human when any of those steps wobbles. The work pays for itself the first time a patient with a 15-minute appointment doesn't spend eight minutes on hold to book it.

The platform layer should let you skip the pieces that are not your differentiator: tool execution that keeps tokens out of LLMs, memory the agent can use on the next call, scorecards that turn quality into a number, scenarios that catch regressions before patients do. The agent is the hero. The platform handles the plumbing so the team can focus on the part that is actually clinical.


Sources & References


Dean Grover

Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.
