It's 2:47 AM. A parent has a feverish toddler. They call the after-hours nurse line.
That call costs the health system somewhere between $15 and $30 to handle, depending on whether it goes to an in-house team or an outsourced vendor. Schmitt-Thompson, the protocol library used by 95% of North American medical triage call centers, processes over 25 million of these calls a year. The volume is enormous, the margin is thin, and the obvious move is to put an AI agent in front of the nurse.
But there is a question every digital health builder hits the moment they sketch the architecture: what happens when the agent says "this can wait until morning" and the patient codes overnight? Who gets sued? The agent vendor? The health system? The on-call physician whose name is on the answering service?
The answer is "yes, all of them, and you'll spend two years finding out which one writes the check." So the architecture has to do something subtle. It has to use the agent for what agents are good at (collecting symptoms, asking the next question, never getting tired at 2:47 AM) and ruthlessly keep the agent away from anything that looks like clinical judgment. That line is where this article lives.
Don't Let the LLM Freelance
The first instinct most teams have is to write a system prompt that says something like "You are a nurse triage assistant. Ask the patient about their symptoms and recommend the appropriate level of care." This is the wrong instinct. There are 50 years of clinical evidence built into telephone triage protocols. Schmitt-Thompson is updated annually against AAP, ACEP, ACOG, AHA, and CDC guidelines, peer-reviewed by 200+ active clinicians. Manchester Triage is the UK and European equivalent, used by NHS ambulance services since 2006 for the secondary assessment of non-immediate 999 calls.
These protocols are decision trees. Each chief complaint (sore throat, chest pain, infant fever, abdominal pain) has a sequence of required questions that branch into red, yellow, or green dispositions. The protocol is not "advice." It is the standard of care.
Load the protocol as structured data, and make the LLM's job filling in the tree's required fields, not inventing a recommendation.
```json
{
  "id": "schmitt-thompson-sore-throat-v2025-04",
  "chief_complaint": "Sore Throat",
  "required_fields": [
    { "key": "age_months", "type": "number" },
    { "key": "fever_temp_f", "type": "number" },
    { "key": "duration_hours", "type": "number" },
    { "key": "drooling", "type": "boolean" },
    { "key": "trouble_swallowing", "type": "boolean" },
    { "key": "trouble_breathing", "type": "boolean" }
  ],
  "rules": [
    { "if": "trouble_breathing == true", "disposition": "CALL_911", "citation": "STCC Sore Throat §1.1" },
    { "if": "drooling == true && age_months >= 12", "disposition": "GO_TO_ED_NOW", "citation": "STCC Sore Throat §2.3" },
    { "if": "fever_temp_f > 104", "disposition": "GO_TO_ED_NOW", "citation": "STCC Sore Throat §2.4" },
    { "if": "fever_temp_f > 101 && duration_hours > 48", "disposition": "SEE_PROVIDER_TODAY", "citation": "STCC Sore Throat §3.2" },
    { "default": "HOME_CARE_ADVICE", "citation": "STCC Sore Throat §4.1" }
  ]
}
```

Notice what is missing. There is no instruction to "be helpful," no fallback to "use your best judgment." Every disposition cites a section of the protocol. If the patient calls back tomorrow asking why they were told to wait, you can pull the citation and show them the published clinical guideline.
The Protocol-Walker Is a Tool, Not a Prompt
Now make the agent walk this tree. The LLM's job becomes "ask the patient the next question the protocol needs." The protocol's job is to evaluate the answers and emit either next_question or disposition. This separation is everything.
```typescript
type Field = { key: string; type: "number" | "boolean" };
type Rule =
  | { if: string; disposition: string; citation: string }
  | { default: string; citation: string }; // default rule stores its disposition under "default"
type Protocol = { required_fields: Field[]; rules: Rule[] };

type WalkResult =
  | { status: "incomplete"; next_question: string; field: string }
  | { status: "complete"; disposition: string; citation: string };

// questionForField maps a field key to patient-facing wording;
// evaluate runs a rule expression against the collected answers.
// Both are assumed to be implemented elsewhere in the codebase.
declare function questionForField(field: Field): string;
declare function evaluate(expr: string, answers: Record<string, unknown>): boolean;

export function walkProtocol(protocol: Protocol, answers: Record<string, unknown>): WalkResult {
  // 1. Find the first required field the LLM hasn't filled in yet.
  const missing = protocol.required_fields.find((f) => !(f.key in answers));
  if (missing) {
    return {
      status: "incomplete",
      next_question: questionForField(missing),
      field: missing.key,
    };
  }
  // 2. All required fields are filled. Evaluate rules in order, first match wins.
  for (const rule of protocol.rules) {
    if ("default" in rule) {
      return { status: "complete", disposition: rule.default, citation: rule.citation };
    }
    if (evaluate(rule.if, answers)) {
      return { status: "complete", disposition: rule.disposition, citation: rule.citation };
    }
  }
  // 3. No rule matched (shouldn't happen, but never make a clinical guess).
  return { status: "complete", disposition: "TRANSFER_TO_NURSE", citation: "fallback" };
}
```

`walkProtocol` is deterministic. Same inputs, same disposition, same citation. There is no temperature, no creativity, no hallucination surface area. The LLM never sees the rules. It only sees `next_question` and reads it to the patient.
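To make that determinism concrete, here is a minimal, self-contained sketch of the first-match-wins evaluation. The predicate form and the `decide` helper are illustrative stand-ins for the string expressions the real walker evaluates; the field names mirror the sore-throat example above.

```typescript
// Minimal sketch: rules evaluated in order, first match wins, default last.
// The predicate form is illustrative, not the protocol's actual rule format.
type Answers = { trouble_breathing: boolean; fever_temp_f: number };

const ruleOrder = [
  { when: (a: Answers) => a.trouble_breathing, disposition: "CALL_911" },
  { when: (a: Answers) => a.fever_temp_f > 104, disposition: "GO_TO_ED_NOW" },
  { when: (_: Answers) => true, disposition: "HOME_CARE_ADVICE" }, // default
];

function decide(a: Answers): string {
  for (const rule of ruleOrder) {
    if (rule.when(a)) return rule.disposition;
  }
  return "TRANSFER_TO_NURSE"; // unreachable here, but never guess
}
```

A caller with both trouble breathing and a 105 °F fever gets CALL_911, not GO_TO_ED_NOW, because the more urgent rule sits first: rule order is part of the clinical content, not an implementation detail.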
The Red-Flag Interrupt
The protocol-walker handles the structured path. But patients do not stick to scripts. A caller who started with "I have a sore throat" might mention three turns later that their chest hurts and their left arm is tingling. By then, the sore-throat tree has them halfway to a "home care advice" disposition.
You need a parallel monitor. Run it on every transcript turn, independent of the conversation. If it fires, it preempts the agent and forces an emergency escalation.
```typescript
// Adapted from NHS 111 red-flag pathway and Schmitt-Thompson emergency criteria.
const RED_FLAG_PATTERNS: { pattern: RegExp; reason: string; route: "911" | "ED_NOW" }[] = [
  { pattern: /chest pain|crushing|pressure in (?:my )?chest/i, reason: "Possible ACS", route: "911" },
  { pattern: /can'?t breathe|short of breath at rest|gasping/i, reason: "Acute respiratory", route: "911" },
  { pattern: /worst headache (?:of my life|ever)|thunderclap/i, reason: "Possible SAH", route: "911" },
  { pattern: /one side of (?:my )?face|slurred speech|can'?t move (?:my )?arm/i, reason: "Possible stroke", route: "911" },
  { pattern: /coughing up blood|vomiting blood|blood in stool/i, reason: "Active bleeding", route: "ED_NOW" },
  { pattern: /unresponsive|won'?t wake up|blue lips/i, reason: "ALOC", route: "911" },
  { pattern: /under (?:30 days|one month) old.*fever|newborn.*fever/i, reason: "Neonatal fever", route: "ED_NOW" },
];

export function checkRedFlags(turn: string): { triggered: boolean; reason?: string; route?: "911" | "ED_NOW" } {
  for (const { pattern, reason, route } of RED_FLAG_PATTERNS) {
    if (pattern.test(turn)) return { triggered: true, reason, route };
  }
  return { triggered: false };
}
```

A few things to notice. This is regex, not an LLM. It is fast, deterministic, and explainable in a deposition. The patterns map back to NHS 111's published red-flag list and Schmitt-Thompson's emergency criteria, which means in court, you can show the source. And it is intentionally aggressive. False positives here transfer the call to a human nurse, which is fine. False negatives kill people.
When the monitor fires, the agent stops asking questions. It says, "What you described sounds serious. I'm staying on the line and calling 911 now. A nurse is also being connected." The full transcript and the red-flag trigger go to the human nurse who picks up.
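The ordering matters: on every turn, the monitor runs first, and only a clean turn reaches the protocol walker. A minimal sketch of that gate, with a single inline pattern and a hypothetical `handleTurn` name standing in for the full list above:

```typescript
// Sketch of per-turn control flow: the red-flag check gates the protocol walk.
// The single inline pattern and the Action shape are illustrative only.
const FLAGS = [
  { pattern: /chest pain|pressure in (?:my )?chest/i, reason: "Possible ACS", route: "911" as const },
];

type Action =
  | { kind: "escalate"; route: "911" | "ED_NOW"; reason: string }
  | { kind: "walk_protocol" };

function handleTurn(turn: string): Action {
  for (const f of FLAGS) {
    if (f.pattern.test(turn)) return { kind: "escalate", route: f.route, reason: f.reason };
  }
  return { kind: "walk_protocol" }; // clean turn: hand it to the walker
}
```

Because the monitor sees the raw transcript turn rather than the structured answers, it catches the off-script emergency even when the active protocol tree never asks about it.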
Disclosure Language Is Not Optional
Now the legal layer. California AB 3030 took effect January 1, 2025. It requires that any AI-generated communication conveying patient clinical information disclose that fact prominently, verbally at the start and end of audio calls. California AB 489, signed in October 2025, goes further. It prohibits AI systems from using protected healthcare titles ("nurse," "physician," "RN") in a way that suggests licensure. The agent cannot say "I'm a nurse." It cannot say "I'm a virtual nurse." It cannot even say "I'm trained like a nurse."
Here is the disclosure script that survives both bills, plus the corporate practice of medicine doctrine that exists in some form in nearly every state:
```typescript
export const OPENING_DISCLOSURE = `
Hello, you've reached the after-hours symptom line for [Practice Name].
I'm an AI assistant. I'm not a nurse and I can't diagnose you or recommend treatment.
What I can do is ask you some questions about your symptoms, and based on validated
clinical guidelines, either give you home care information or connect you with
a registered nurse. If at any point this feels urgent, say "I need a nurse" and
I'll transfer you immediately.
`;

export const CLOSING_DISCLOSURE = `
Just to confirm: this conversation was handled by an AI assistant following
[Schmitt-Thompson sore throat] guidelines. I am not a licensed clinician.
A summary will be sent to your provider's nurse for review by the next business day.
`;
```

The wording is deliberate. "I'm not a nurse" is a direct compliance hedge against AB 489. "I can't diagnose or recommend treatment" addresses corporate practice of medicine concerns. Oregon SB 951 (2025), California SB 351 (2025), and Washington's pending 2026 legislation all sharpen restrictions on AI making clinical decisions. "Validated clinical guidelines" makes the audit log defensible. And "a nurse will review" addresses the FDA SaMD edge case: it keeps a human in the loop for the final disposition.
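Because the disclosure is mandatory, it is worth asserting in the audit layer rather than trusting the prompt. A hedged sketch: check that the first and last agent turns of a transcript contain the disclosure. The phrase test and `Turn` shape are illustrative, not a legal compliance implementation.

```typescript
// Sketch of an AB 3030-style transcript check: the AI disclosure must
// appear in both the opening and closing agent turns of an audio call.
// The regex and Turn shape are illustrative assumptions.
type Turn = { role: "agent" | "patient"; text: string };

function disclosuresPresent(transcript: Turn[]): boolean {
  const agentTurns = transcript.filter((t) => t.role === "agent");
  if (agentTurns.length === 0) return false;
  const mentionsAI = (t: Turn) => /\bAI assistant\b/i.test(t.text);
  return mentionsAI(agentTurns[0]) && mentionsAI(agentTurns[agentTurns.length - 1]);
}
```

Run this as a scorecard check on every call: a prompt regression that silently drops the disclosure then fails loudly instead of surfacing in discovery three years later.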
The Handoff Is a Warm Transfer With Context
When the agent escalates (red-flag, patient request, protocol fallback, or just "this caller is stressed"), the human nurse should not have to start over. That is the actual value proposition: the nurse picks up with the protocol already half-walked.
```typescript
type EscalationContext = {
  callerId: string;
  transcript: { role: "agent" | "patient"; text: string; ts: number }[];
  protocolFields: Record<string, unknown>;
  protocolId: string;
  redFlag?: { reason: string; turn: string };
};

// nurseQueue, voiceTransport, and NURSE_QUEUE_NUMBER are the telephony
// platform's queue and transfer primitives, assumed to exist elsewhere.
async function escalateToNurse(ctx: EscalationContext) {
  // Push full context to the nurse's queue before the call rings.
  await nurseQueue.enqueue({
    callerId: ctx.callerId,
    summary:
      `Patient ${ctx.callerId}, protocol ${ctx.protocolId} in progress. ` +
      `Filled: ${Object.keys(ctx.protocolFields).join(", ")}. ` +
      (ctx.redFlag ? `RED FLAG: ${ctx.redFlag.reason} ("${ctx.redFlag.turn}")` : ""),
    transcript: ctx.transcript,
    protocolFields: ctx.protocolFields,
    handoffTime: Date.now(),
  });
  await voiceTransport.warmTransfer({ to: NURSE_QUEUE_NUMBER, context: ctx.callerId });
}
```

This is also where the malpractice defense crystallizes. The audit log shows the AI followed Schmitt-Thompson sections X, Y, and Z, escalated when the protocol said to escalate or when the red-flag monitor fired, and handed off to a licensed nurse with full context. When General Counsel asks "show me the standard of care this call followed," you can, by section number.
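That evidence trail is easiest to produce if every completed call emits one structured record. A sketch of what such a record might hold; the `AuditRecord` shape and `defenseSummary` helper are assumptions for illustration, not a Chanl or Schmitt-Thompson format.

```typescript
// Sketch of a per-call audit record: the fields General Counsel asks for,
// in one row. The shape is illustrative, not a standard format.
type AuditRecord = {
  callId: string;
  protocolId: string;   // e.g. "schmitt-thompson-sore-throat-v2025-04"
  disposition: string;
  citation: string;     // protocol section that drove the disposition
  redFlag: { reason: string; turn: string } | null;
  escalatedToNurse: boolean;
};

// One line per call, readable by a non-engineer in discovery.
function defenseSummary(r: AuditRecord): string {
  return (
    `${r.protocolId} -> ${r.disposition} per ${r.citation}` +
    (r.redFlag ? ` (red flag: ${r.redFlag.reason})` : "") +
    (r.escalatedToNurse ? " [handed to nurse]" : "")
  );
}
```

Stored per call, this turns "what standard of care did you follow" from an archaeology project into a single indexed lookup.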
Where Does the Agent Stop and the Nurse Start?
This is the architectural line that matters. The agent collects symptoms and walks the tree. The nurse renders the disposition. Specifically:
| What the AI agent does | What ONLY a licensed nurse does |
|---|---|
| Reads the next protocol question | Confirms the disposition |
| Logs answers into structured fields | Adjusts based on patient history (EHR) |
| Detects red-flag phrases | Calls the on-call physician |
| Cites the protocol section that drove escalation | Documents the encounter for the chart |
| Reads disclosure language verbatim | Issues prescriptions or referrals |
| Connects the call to a nurse | Tells the patient what to do next |
This is not a hard line technologically. The LLM is perfectly capable of doing the right column. It is a hard line legally. Every state's medical practice act says diagnosis and treatment are reserved for licensed clinicians. Corporate practice of medicine doctrines (Oregon SB 951, California SB 351 in 2025, Washington's 2026 bill) prohibit non-physician-owned entities from controlling clinical decisions. The FDA's January 2025 SaMD draft guidance puts disposition-grade outputs under medical device review. None of these laws care that your AI is "really good." They care who is licensed.
The Chanl Integration
Everything above is buildable on raw OpenAI plus a Twilio account. What gets painful at scale is the operations layer: protocol updates, regression testing for red-flag detection, audit log retention, and proving in court that today's agent behaves the same as last quarter's agent. That is the part Chanl is built for: AI agents that remember each customer.
Load the protocol as a knowledge base entry rather than a hardcoded JSON file. Updates propagate without redeploys, and the version is captured in the audit log:
```typescript
import { Chanl } from "@chanl/sdk";

const sdk = new Chanl({ apiKey: process.env.CHANL_API_KEY });

await sdk.knowledge.create({
  title: "Schmitt-Thompson Sore Throat (v2025-04)",
  source: "text",
  content: JSON.stringify(soreThroatProtocol),
  metadata: { version: "2025-04", chiefComplaint: "sore-throat" },
});
```

Then expose `walkProtocol` as a deterministic tool the agent calls every turn. Not an LLM call. A deterministic function:
```typescript
await sdk.tools.create({
  name: "walk_protocol",
  description: "Returns next_question or final disposition with citation. Call on every patient turn.",
  type: "http",
  inputSchema: {
    chiefComplaint: { type: "string" },
    answers: { type: "object" },
  },
  configuration: {
    type: "http",
    method: "POST",
    url: "https://triage.example.com/walk",
  },
});
```

Now the malpractice defense. Build a scorecard with the four axes a hospital General Counsel will ask about (protocol followed, red flag caught, disclosure made, escalation correct) and run it against every call:
```typescript
// Evaluate a single call against a configured scorecard.
await sdk.scorecard.evaluate(callId, { scorecardId: "triage-defense-v1" });
```

A single failing call out of 10,000 surfaces in the dashboard. You can dig into the transcript, the cited protocol section, and whether the red-flag monitor fired. That is your evidence trail.
Then test the dangerous cases on every deploy. Build a battery of red-flag scenarios (chest pain, stroke signs, infant under 30 days with fever, anaphylaxis) authored against the triage agent, then run the whole batch on every release with a minimum passing score:
```typescript
const result = await sdk.scenarios.runAll({
  agentId: "agent_triage_v3",
  minScore: 90,
  parallel: 3,
});
if (!result.allPassed) throw new Error(`Red-flag regression: ${result.failed} failed`);
```

If a red-flag scenario regresses, the deploy fails. This is the closest thing to a unit test for clinical safety that exists.
One Chanl feature you should not use here, deliberately: agent memory for clinical content. sdk.memory.create is great for "this caller prefers Spanish" or "this caller has accessibility needs." It's not the right place for chronic conditions, medications, or allergies. Clinical history belongs in the EHR, accessed via FHIR with proper consent (we walked through that pattern in the appointment scheduling agent build), audit-logged, and scoped to the current encounter. Memory in the agent layer creates a shadow medical record, which is a HIPAA and corporate practice of medicine problem you don't want. The wider HIPAA implications for any voice agent (BAAs, encryption, access logs) are covered in the HIPAA lessons piece.
What Does a "Good" Deployment Actually Look Like?
The architecture above is not faster nurses. It is safer escalation. Industry research suggests protocol-driven triage cuts unnecessary ED referrals by 15-25%, with average call savings around $84 per call when telephone triage is run well. That math works whether a nurse or an agent walks the protocol. The reason to put the agent in front is volume: the nurse line that used to drop calls at 3 AM can now answer all of them, walk the tree on the simple cases, and hand the hard ones to a nurse with full context.
The architecture is also auditable in a way human-only nurse lines often are not. Every disposition cites a protocol section. Every red-flag trigger is logged. Every disclosure is captured in the transcript. Three years later, when a chart gets pulled in discovery, the answer to "what standard of care did you follow at 2:47 AM on a Tuesday" is a specific answer with a specific citation.
That is the whole game. You are not replacing the nurse. You are giving the nurse a triage assistant that never freelances, always cites, and escalates aggressively. The agent makes the line cheaper to run and the nurse's job easier. It doesn't practice medicine. That's the only architecture General Counsel will sign.
Build triage agents you can actually defend
Chanl gives you the protocol-as-tool architecture, scorecards as a malpractice defense, and red-flag regression tests on every deploy. AI agents that remember each customer, without crossing into clinical decision-making.
See how Chanl handles regulated agents

References

- Schmitt-Thompson Clinical Content: official guideline library
- ClearTriage: Evidence behind Schmitt-Thompson office hours protocols
- Manchester Triage Group: Telephone Triage Hub
- FDA: Artificial Intelligence in Software as a Medical Device
- FDA: AI/ML SaMD lifecycle management draft guidance, January 2025
- Medical Board of California: AB 3030 GenAI notification (effective Jan 1, 2025)
- Hintze Law: California AB 489 prohibits AI healthcare-license misrepresentation (Oct 2025)
- Wilson Sonsini: California AG legal advisory on AI in healthcare
- Nova Southeastern Law Review: Telehealth, AI, and the Corporate Practice of Medicine
- TrueEval: Corporate Practice of Medicine compliance across state lines, 2025-2026
- PMC: AI Chatbots and the challenges of HIPAA compliance
- NHS: How NHS 111 online works (red-flag escalation pathway)
- PubMed: Reproducibility of the Manchester Triage System (2025 multicentre study)
- Innolitics: 2025 Year in Review: AI/ML Medical Device 510(k) Clearances
- Schmitt-Thompson: Impact of telephone triage on healthcare costs