It's 2:47 AM. A parent has a feverish toddler. They call the after-hours nurse line.
That call costs the health system somewhere between $15 and $30 to handle, depending on whether it goes to an in-house team or an outsourced vendor. Schmitt-Thompson, the protocol library used by 95% of North American medical triage call centers, processes over 25 million of these calls a year. The volume is enormous, the margin is thin, and the obvious move is to put an AI agent in front of the nurse.
But there is a question every digital health builder hits the moment they sketch the architecture: what happens when the agent says "this can wait until morning" and the patient codes overnight? Who gets sued? The agent vendor? The health system? The on-call physician whose name is on the answering service?
The answer is "yes, all of them, and you'll spend two years finding out which one writes the check." So the architecture has to do something subtle. It has to use the agent for what agents are good at (collecting symptoms, asking the next question, never getting tired at 2:47 AM) and ruthlessly keep the agent away from anything that looks like clinical judgment. That line is where this article lives.
Don't Let the LLM Freelance
The first instinct most teams have is to write a system prompt that says something like "You are a nurse triage assistant. Ask the patient about their symptoms and recommend the appropriate level of care." This is the wrong instinct. There are 50 years of clinical evidence built into telephone triage protocols. Schmitt-Thompson is updated annually against AAP, ACEP, ACOG, AHA, and CDC guidelines, peer-reviewed by 200+ active clinicians. Manchester Triage is the UK and European equivalent, used by NHS ambulance services since 2006 for the secondary assessment of non-immediate 999 calls.
These protocols are decision trees. Each chief complaint (sore throat, chest pain, infant fever, abdominal pain) has a sequence of required questions that branch into red, yellow, or green dispositions. The protocol is not "advice." It is the standard of care.
Load the protocol as structured data, and make the LLM's job filling in the tree's required fields, not inventing a recommendation.
```json
{
  "id": "schmitt-thompson-sore-throat-v2025-04",
  "chief_complaint": "Sore Throat",
  "required_fields": [
    { "key": "age_months", "type": "number" },
    { "key": "fever_temp_f", "type": "number" },
    { "key": "duration_hours", "type": "number" },
    { "key": "drooling", "type": "boolean" },
    { "key": "trouble_swallowing", "type": "boolean" },
    { "key": "trouble_breathing", "type": "boolean" }
  ],
  "rules": [
    { "if": "trouble_breathing == true", "disposition": "CALL_911", "citation": "STCC Sore Throat §1.1" },
    { "if": "drooling == true && age_months >= 12", "disposition": "GO_TO_ED_NOW", "citation": "STCC Sore Throat §2.3" },
    { "if": "fever_temp_f > 104", "disposition": "GO_TO_ED_NOW", "citation": "STCC Sore Throat §2.4" },
    { "if": "fever_temp_f > 101 && duration_hours > 48", "disposition": "SEE_PROVIDER_TODAY", "citation": "STCC Sore Throat §3.2" },
    { "default": "HOME_CARE_ADVICE", "citation": "STCC Sore Throat §4.1" }
  ]
}
```

Notice what is missing. There is no instruction to "be helpful," no fallback to "use your best judgment." Every disposition cites a section of the protocol. If the patient calls back tomorrow asking why they were told to wait, you can pull the citation and show them the published clinical guideline.
The Protocol-Walker Is a Tool, Not a Prompt
Now make the agent walk this tree. The LLM's job becomes "ask the patient the next question the protocol needs." The protocol's job is to evaluate the answers and emit either next_question or disposition. This separation is everything.
```typescript
type Field = { key: string; type: "number" | "boolean" };
type Rule =
  | { if: string; disposition: string; citation: string }
  | { default: string; citation: string }; // default rule stores its disposition under "default"
type Protocol = { required_fields: Field[]; rules: Rule[] };

type WalkResult =
  | { status: "incomplete"; next_question: string; field: string }
  | { status: "complete"; disposition: string; citation: string };

// questionForField maps a field key to patient-facing wording;
// evaluate runs a rule expression against the collected answers.
// Both are assumed to be implemented elsewhere in the codebase.
declare function questionForField(field: Field): string;
declare function evaluate(expr: string, answers: Record<string, unknown>): boolean;

export function walkProtocol(protocol: Protocol, answers: Record<string, unknown>): WalkResult {
  // 1. Find the first required field the LLM hasn't filled in yet.
  const missing = protocol.required_fields.find((f) => !(f.key in answers));
  if (missing) {
    return {
      status: "incomplete",
      next_question: questionForField(missing),
      field: missing.key,
    };
  }
  // 2. All required fields are filled. Evaluate rules in order, first match wins.
  for (const rule of protocol.rules) {
    if ("default" in rule) {
      return { status: "complete", disposition: rule.default, citation: rule.citation };
    }
    if (evaluate(rule.if, answers)) {
      return { status: "complete", disposition: rule.disposition, citation: rule.citation };
    }
  }
  // 3. No rule matched (shouldn't happen, but never make a clinical guess).
  return { status: "complete", disposition: "TRANSFER_TO_NURSE", citation: "fallback" };
}
```

`walkProtocol` is deterministic. Same inputs, same disposition, same citation. There is no temperature, no creativity, no hallucination surface area. The LLM never sees the rules. It only sees `next_question` and reads it to the patient.
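To make that determinism concrete, here is a minimal, self-contained sketch of the first-match-wins evaluation. The predicate form and the `decide` helper are illustrative stand-ins for the string expressions the real walker evaluates; the field names mirror the sore-throat example above.

```typescript
// Minimal sketch: rules evaluated in order, first match wins, default last.
// The predicate form is illustrative, not the protocol's actual rule format.
type Answers = { trouble_breathing: boolean; fever_temp_f: number };

const ruleOrder = [
  { when: (a: Answers) => a.trouble_breathing, disposition: "CALL_911" },
  { when: (a: Answers) => a.fever_temp_f > 104, disposition: "GO_TO_ED_NOW" },
  { when: (_: Answers) => true, disposition: "HOME_CARE_ADVICE" }, // default
];

function decide(a: Answers): string {
  for (const rule of ruleOrder) {
    if (rule.when(a)) return rule.disposition;
  }
  return "TRANSFER_TO_NURSE"; // unreachable here, but never guess
}
```

A caller with both trouble breathing and a 105 °F fever gets CALL_911, not GO_TO_ED_NOW, because the more urgent rule sits first: rule order is part of the clinical content, not an implementation detail.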
The Red-Flag Interrupt
The protocol-walker handles the structured path. But patients do not stick to scripts. A caller who started with "I have a sore throat" might mention three turns later that their chest hurts and their left arm is tingling. By then, the sore-throat tree has them halfway to a "home care advice" disposition.
You need a parallel monitor. Run it on every transcript turn, independent of the conversation. If it fires, it preempts the agent and forces an emergency escalation.
```typescript
// Adapted from NHS 111 red-flag pathway and Schmitt-Thompson emergency criteria.
const RED_FLAG_PATTERNS: { pattern: RegExp; reason: string; route: "911" | "ED_NOW" }[] = [
  { pattern: /chest pain|crushing|pressure in (?:my )?chest/i, reason: "Possible ACS", route: "911" },
  { pattern: /can'?t breathe|short of breath at rest|gasping/i, reason: "Acute respiratory", route: "911" },
  { pattern: /worst headache (?:of my life|ever)|thunderclap/i, reason: "Possible SAH", route: "911" },
  { pattern: /one side of (?:my )?face|slurred speech|can'?t move (?:my )?arm/i, reason: "Possible stroke", route: "911" },
  { pattern: /coughing up blood|vomiting blood|blood in stool/i, reason: "Active bleeding", route: "ED_NOW" },
  { pattern: /unresponsive|won'?t wake up|blue lips/i, reason: "ALOC", route: "911" },
  { pattern: /under (?:30 days|one month) old.*fever|newborn.*fever/i, reason: "Neonatal fever", route: "ED_NOW" },
];

export function checkRedFlags(turn: string): { triggered: boolean; reason?: string; route?: "911" | "ED_NOW" } {
  for (const { pattern, reason, route } of RED_FLAG_PATTERNS) {
    if (pattern.test(turn)) return { triggered: true, reason, route };
  }
  return { triggered: false };
}
```

A few things to notice. This is regex, not an LLM. It is fast, deterministic, and explainable in a deposition. The patterns map back to NHS 111's published red-flag list and Schmitt-Thompson's emergency criteria, which means in court, you can show the source. And it is intentionally aggressive. False positives here transfer the call to a human nurse, which is fine. False negatives kill people.
When the monitor fires, the agent stops asking questions. It says, "What you described sounds serious. I'm staying on the line and calling 911 now. A nurse is also being connected." The full transcript and the red-flag trigger go to the human nurse who picks up.
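The ordering matters: on every turn, the monitor runs first, and only a clean turn reaches the protocol walker. A minimal sketch of that gate, with a single inline pattern and a hypothetical `handleTurn` name standing in for the full list above:

```typescript
// Sketch of per-turn control flow: the red-flag check gates the protocol walk.
// The single inline pattern and the Action shape are illustrative only.
const FLAGS = [
  { pattern: /chest pain|pressure in (?:my )?chest/i, reason: "Possible ACS", route: "911" as const },
];

type Action =
  | { kind: "escalate"; route: "911" | "ED_NOW"; reason: string }
  | { kind: "walk_protocol" };

function handleTurn(turn: string): Action {
  for (const f of FLAGS) {
    if (f.pattern.test(turn)) return { kind: "escalate", route: f.route, reason: f.reason };
  }
  return { kind: "walk_protocol" }; // clean turn: hand it to the walker
}
```

Because the monitor sees the raw transcript turn rather than the structured answers, it catches the off-script emergency even when the active protocol tree never asks about it.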
Disclosure Language Is Not Optional
Now the legal layer. California AB 3030 took effect January 1, 2025. It requires that any AI-generated communication conveying patient clinical information disclose that fact prominently, verbally at the start and end of audio calls. California AB 489, signed in October 2025, goes further. It prohibits AI systems from using protected healthcare titles ("nurse," "physician," "RN") in a way that suggests licensure. The agent cannot say "I'm a nurse." It cannot say "I'm a virtual nurse." It cannot even say "I'm trained like a nurse."
Here is the disclosure script that survives both bills, plus the corporate practice of medicine doctrine that exists in some form in nearly every state:
```typescript
export const OPENING_DISCLOSURE = `
Hello, you've reached the after-hours symptom line for [Practice Name].
I'm an AI assistant. I'm not a nurse and I can't diagnose you or recommend treatment.
What I can do is ask you some questions about your symptoms, and based on validated
clinical guidelines, either give you home care information or connect you with
a registered nurse. If at any point this feels urgent, say "I need a nurse" and
I'll transfer you immediately.
`;

export const CLOSING_DISCLOSURE = `
Just to confirm: this conversation was handled by an AI assistant following
[Schmitt-Thompson sore throat] guidelines. I am not a licensed clinician.
A summary will be sent to your provider's nurse for review by the next business day.
`;
```

The wording is deliberate. "I'm not a nurse" is a direct compliance hedge against AB 489. "I can't diagnose or recommend treatment" addresses corporate practice of medicine concerns. Oregon SB 951 (2025), California SB 351 (2025), and Washington's pending 2026 legislation all sharpen restrictions on AI making clinical decisions. "Validated clinical guidelines" makes the audit log defensible. And "a nurse will review" addresses the FDA SaMD edge case: it keeps a human in the loop for the final disposition.
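Because the disclosure is mandatory, it is worth asserting in the audit layer rather than trusting the prompt. A hedged sketch: check that the first and last agent turns of a transcript contain the disclosure. The phrase test and `Turn` shape are illustrative, not a legal compliance implementation.

```typescript
// Sketch of an AB 3030-style transcript check: the AI disclosure must
// appear in both the opening and closing agent turns of an audio call.
// The regex and Turn shape are illustrative assumptions.
type Turn = { role: "agent" | "patient"; text: string };

function disclosuresPresent(transcript: Turn[]): boolean {
  const agentTurns = transcript.filter((t) => t.role === "agent");
  if (agentTurns.length === 0) return false;
  const mentionsAI = (t: Turn) => /\bAI assistant\b/i.test(t.text);
  return mentionsAI(agentTurns[0]) && mentionsAI(agentTurns[agentTurns.length - 1]);
}
```

Run this as a scorecard check on every call: a prompt regression that silently drops the disclosure then fails loudly instead of surfacing in discovery three years later.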
The Handoff Is a Warm Transfer With Context
When the agent escalates (red-flag, patient request, protocol fallback, or just "this caller is stressed"), the human nurse should not have to start over. That is the actual value proposition: the nurse picks up with the protocol already half-walked.
```typescript
type EscalationContext = {
  callerId: string;
  transcript: { role: "agent" | "patient"; text: string; ts: number }[];
  protocolFields: Record<string, unknown>;
  protocolId: string;
  redFlag?: { reason: string; turn: string };
};

// nurseQueue, voiceTransport, and NURSE_QUEUE_NUMBER are the telephony
// platform's queue and transfer primitives, assumed to exist elsewhere.
async function escalateToNurse(ctx: EscalationContext) {
  // Push full context to the nurse's queue before the call rings.
  await nurseQueue.enqueue({
    callerId: ctx.callerId,
    summary:
      `Patient ${ctx.callerId}, protocol ${ctx.protocolId} in progress. ` +
      `Filled: ${Object.keys(ctx.protocolFields).join(", ")}. ` +
      (ctx.redFlag ? `RED FLAG: ${ctx.redFlag.reason} ("${ctx.redFlag.turn}")` : ""),
    transcript: ctx.transcript,
    protocolFields: ctx.protocolFields,
    handoffTime: Date.now(),
  });
  await voiceTransport.warmTransfer({ to: NURSE_QUEUE_NUMBER, context: ctx.callerId });
}
```

This is also where the malpractice defense crystallizes. The audit log shows the AI followed Schmitt-Thompson sections X, Y, and Z, escalated when the protocol said to escalate or when the red-flag monitor fired, and handed off to a licensed nurse with full context. When General Counsel asks "show me the standard of care this call followed," you can, by section number.
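That evidence trail is easiest to produce if every completed call emits one structured record. A sketch of what such a record might hold; the `AuditRecord` shape and `defenseSummary` helper are assumptions for illustration, not a Chanl or Schmitt-Thompson format.

```typescript
// Sketch of a per-call audit record: the fields General Counsel asks for,
// in one row. The shape is illustrative, not a standard format.
type AuditRecord = {
  callId: string;
  protocolId: string;   // e.g. "schmitt-thompson-sore-throat-v2025-04"
  disposition: string;
  citation: string;     // protocol section that drove the disposition
  redFlag: { reason: string; turn: string } | null;
  escalatedToNurse: boolean;
};

// One line per call, readable by a non-engineer in discovery.
function defenseSummary(r: AuditRecord): string {
  return (
    `${r.protocolId} -> ${r.disposition} per ${r.citation}` +
    (r.redFlag ? ` (red flag: ${r.redFlag.reason})` : "") +
    (r.escalatedToNurse ? " [handed to nurse]" : "")
  );
}
```

Stored per call, this turns "what standard of care did you follow" from an archaeology project into a single indexed lookup.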
Where Does the Agent Stop and the Nurse Start?
This is the architectural line that matters. The agent collects symptoms and walks the tree. The nurse renders the disposition. Specifically:
| What the AI agent does | What ONLY a licensed nurse does |
|---|---|
| Reads the next protocol question | Confirms the disposition |
| Logs answers into structured fields | Adjusts based on patient history (EHR) |
| Detects red-flag phrases | Calls the on-call physician |
| Cites the protocol section that drove escalation | Documents the encounter for the chart |
| Reads disclosure language verbatim | Issues prescriptions or referrals |
| Connects the call to a nurse | Tells the patient what to do next |
This is not a hard line technologically. The LLM is perfectly capable of doing the right column. It is a hard line legally. Every state's medical practice act says diagnosis and treatment are reserved for licensed clinicians. Corporate practice of medicine doctrines (Oregon SB 951, California SB 351 in 2025, Washington's 2026 bill) prohibit non-physician-owned entities from controlling clinical decisions. The FDA's January 2025 SaMD draft guidance puts disposition-grade outputs under medical device review. None of these laws care that your AI is "really good." They care who is licensed.
The Chanl Integration
Everything above is buildable on raw OpenAI plus a Twilio account. What gets painful at scale is the operations layer: protocol updates, regression testing for red-flag detection, audit log retention, and proving in court that today's agent behaves the same as last quarter's agent. That is the part Chanl is built for: AI agents that remember each customer.
Load the protocol as a knowledge base entry rather than a hardcoded JSON file. Updates propagate without redeploys, and the version is captured in the audit log:
```typescript
import { Chanl } from "@chanl/sdk";

const sdk = new Chanl({ apiKey: process.env.CHANL_API_KEY });

await sdk.knowledge.create({
  title: "Schmitt-Thompson Sore Throat (v2025-04)",
  source: "text",
  content: JSON.stringify(soreThroatProtocol),
  metadata: { version: "2025-04", chiefComplaint: "sore-throat" },
});
```

Then expose `walkProtocol` as a deterministic tool the agent calls every turn. Not an LLM call. A deterministic function:
```typescript
await sdk.tools.create({
  name: "walk_protocol",
  description: "Returns next_question or final disposition with citation. Call on every patient turn.",
  type: "http",
  inputSchema: {
    chiefComplaint: { type: "string" },
    answers: { type: "object" },
  },
  configuration: {
    type: "http",
    method: "POST",
    url: "https://triage.example.com/walk",
  },
});
```

Now the malpractice defense. Build a scorecard with the four axes a hospital General Counsel will ask about (protocol followed, red flag caught, disclosure made, escalation correct) and run it against every call:
```typescript
// Evaluate a single call against a configured scorecard.
await sdk.scorecard.evaluate(callId, { scorecardId: "triage-defense-v1" });
```

A single failing call out of 10,000 surfaces in the dashboard. You can dig into the transcript, the cited protocol section, and whether the red-flag monitor fired. That is your evidence trail.
Then test the dangerous cases on every deploy. Build a battery of red-flag scenarios (chest pain, stroke signs, infant under 30 days with fever, anaphylaxis) authored against the triage agent, then run the whole batch on every release with a minimum passing score:
```typescript
const result = await sdk.scenarios.runAll({
  agentId: "agent_triage_v3",
  minScore: 90,
  parallel: 3,
});
if (!result.allPassed) throw new Error(`Red-flag regression: ${result.failed} failed`);
```

If a red-flag scenario regresses, the deploy fails. This is the closest thing to a unit test for clinical safety that exists.
One Chanl feature you should not use here, deliberately: agent memory for clinical content. sdk.memory.create is great for "this caller prefers Spanish" or "this caller has accessibility needs." It's not the right place for chronic conditions, medications, or allergies. Clinical history belongs in the EHR, accessed via FHIR with proper consent (we walked through that pattern in the appointment scheduling agent build), audit-logged, and scoped to the current encounter. Memory in the agent layer creates a shadow medical record, which is a HIPAA and corporate practice of medicine problem you don't want. The wider HIPAA implications for any voice agent (BAAs, encryption, access logs) are covered in the HIPAA lessons piece.
What Does a "Good" Deployment Actually Look Like?
The architecture above is not faster nurses. It is safer escalation. Industry research suggests protocol-driven triage cuts unnecessary ED referrals by 15-25%, with average call savings around $84 per call when telephone triage is run well. That math works whether a nurse or an agent walks the protocol. The reason to put the agent in front is volume: the nurse line that used to drop calls at 3 AM can now answer all of them, walk the tree on the simple cases, and hand the hard ones to a nurse with full context.
The architecture is also auditable in a way human-only nurse lines often are not. Every disposition cites a protocol section. Every red-flag trigger is logged. Every disclosure is captured in the transcript. Three years later, when a chart gets pulled in discovery, the answer to "what standard of care did you follow at 2:47 AM on a Tuesday" is a specific answer with a specific citation.
That is the whole game. You are not replacing the nurse. You are giving the nurse a triage assistant that never freelances, always cites, and escalates aggressively. The agent makes the line cheaper to run and the nurse's job easier. It doesn't practice medicine. That's the only architecture General Counsel will sign.
Build triage agents you can actually defend
Chanl gives you the protocol-as-tool architecture, scorecards as a malpractice defense, and red-flag regression tests on every deploy. AI agents that remember each customer, without crossing into clinical decision-making.
See how Chanl handles regulated agents

References

- Schmitt-Thompson Clinical Content: official guideline library
- ClearTriage: Evidence behind Schmitt-Thompson office hours protocols
- Manchester Triage Group: Telephone Triage Hub
- FDA: Artificial Intelligence in Software as a Medical Device
- FDA: AI/ML SaMD lifecycle management draft guidance, January 2025
- Medical Board of California: AB 3030 GenAI notification (effective Jan 1, 2025)
- Hintze Law: California AB 489 prohibits AI healthcare-license misrepresentation (Oct 2025)
- Wilson Sonsini: California AG legal advisory on AI in healthcare
- Nova Southeastern Law Review: Telehealth, AI, and the Corporate Practice of Medicine
- TrueEval: Corporate Practice of Medicine compliance across state lines, 2025-2026
- PMC: AI Chatbots and the challenges of HIPAA compliance
- NHS: How NHS 111 online works (red-flag escalation pathway)
- PubMed: Reproducibility of the Manchester Triage System (2025 multicentre study)
- Innolitics: 2025 Year in Review: AI/ML Medical Device 510(k) Clearances
- Schmitt-Thompson: Impact of telephone triage on healthcare costs