Streaming Prescription Extraction

DELPHOS can listen to a doctor speaking naturally — or read free text typed in a chat — and produce a fully structured, safety-checked prescription in real time. As the doctor dictates, medications appear one by one. Each medication passes through advisory safety gates before the final prescription is assembled, ready for the doctor to review, tweak, and print BEFORE the patient leaves the room.

This is the streaming prescription agent — the most powerful clinical integration point in the DELPHOS platform.


Two Call Shapes — Pick the One That Matches Your Flow

The endpoint supports two backwards-compatible call shapes. Both run the same LLM extraction + 6-gate safety pipeline. The only difference is whether DELPHOS caches the result for repeated polls during a live consultation.

| Shape | Who uses it | Cache layer | Required fields |
|---|---|---|---|
| Audio-progressive | Mnesis, voice-driven EHRs (recommended for consultations) | ✅ Word-delta cache + lock + no_rx_detected short-circuit | consultation_id, patient_id, doctor_id, doctor_input (placeholder), accumulated_text |
| Legacy text-input | K.I.T.T. chat, scripted batch callers, single-shot processing | ❌ Single non-repeating call | consultation_id, patient_id, doctor_id, doctor_input |

When in doubt, use the audio-progressive shape. Sending accumulated_text is purely additive — legacy callers that omit it continue to work unchanged.
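
As a minimal sketch of the difference between the two shapes, the helper below builds either request body from one function. The names `buildStreamRequest`, `BaseIds`, and `StreamRequestBody` are hypothetical and exist only for illustration; only the field names come from the table above.

```typescript
// Hypothetical helper (not part of any DELPHOS SDK): builds the request
// body for either call shape described in the table above.
interface BaseIds {
  consultation_id: string;
  patient_id: string;
  doctor_id: string;
}

interface StreamRequestBody extends BaseIds {
  doctor_input: string;
  accumulated_text?: string;
  stream: boolean;
}

function buildStreamRequest(
  ids: BaseIds,
  input: { transcript: string } | { text: string },
): StreamRequestBody {
  if ('transcript' in input) {
    // Audio-progressive shape: accumulated_text carries the transcript and
    // doctor_input is only a non-empty placeholder required by the schema.
    return {
      ...ids,
      doctor_input: 'placeholder',
      accumulated_text: input.transcript,
      stream: true,
    };
  }
  // Legacy text-input shape: accumulated_text is omitted entirely.
  return { ...ids, doctor_input: input.text, stream: true };
}
```

Omitting `accumulated_text` (rather than sending an empty string) is a deliberate choice here, since the presence of the field is what selects the cache-aware path.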


How It Works — Audio-Progressive Flow

Doctor speaks          Mnesis polls /v1/prescriptions/stream             Your UI updates
───────────────  ────► ──────────────────────────────────────────  ────► ──────────────────────
"Amoxicilina           1. Send growing accumulated_text                  Item 1 appears with
 500mg oral               (every audio chunk, ~5s cadence)               inline gates 1, 2, 5
 8/8h por              2. DELPHOS checks word-delta + cache
 7 dias.               3. < 30 words new + no rx vocab → status          Cross-item gates
 Dipirona                 no_rx_detected (no LLM, P95 < 100ms)           3, 4 land
 500mg SOS"            4. ≥ 30 words new OR vocab present →
                          LLM + 6-gate pipeline fires                    Final prescription
                       5. Cached for 24h; repeat polls hit cache         (requires_confirmation)
                       6. Doctor reviews → POST /v1/prescriptions        Doctor signs → print
                          to finalize                                    BEFORE patient leaves

The streaming architecture uses a two-phase gate model:

  • Per-item gates (1, 2, 5) fire immediately as each medication is detected. Results arrive inline with each item_detected event.
  • Cross-item gates (3, 4) fire after the full input is processed, since they need to analyze interactions between all medications. Results arrive in the gates_complete event.
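
One way a client might track the two phases is a small reducer that records per-item gate results as items arrive and clears the pending list once the cross-item results land. This is a sketch under assumptions: `RxViewState` and `applyEvent` are hypothetical names, and gate results are kept as opaque records rather than typed payloads.

```typescript
// Hypothetical client-side view state for one streamed prescription.
type GateResults = Record<string, unknown>;

interface RxViewState {
  items: Array<{ item: unknown; gates: GateResults; pending: string[] }>;
  crossItem: GateResults | null;
}

function applyEvent(state: RxViewState, type: string, data: any): RxViewState {
  switch (type) {
    case 'item_detected': {
      // Phase 1: per-item gate results arrive inline with the item itself.
      const items = [...state.items];
      items[data.index] = {
        item: data.item,
        gates: data.gates,
        pending: data.pending_gates ?? [],
      };
      return { ...state, items };
    }
    case 'gates_complete':
      // Phase 2: cross-item results land once, so nothing stays pending.
      return {
        items: state.items.map((entry) => ({ ...entry, pending: [] })),
        crossItem: data,
      };
    default:
      // Other events (status, prescription, error) leave this state untouched.
      return state;
  }
}
```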

Sequence Diagram

Mnesis client DELPHOS API Pipeline
│ │ │
│ POST /v1/prescriptions/stream │ │
│ { ..., accumulated_text: ... } │ │
│ ─────────────────────────────► │ │
│ │ Word-delta check │
│ │ no_rx_detected scan │
│ │ Redis cache lookup │
│ │ │
│ event: status │ │
│ data: {"type":"analyzing"} │ │
│ ◄───────────────────────────── │ │
│ │ Gate 1 (Input Validation) │
│ │ Gate 2 (CMED Resolution) │
│ │ Gate 5 (Controlled Subst.)│
│ event: item_detected │ ◄────────────────────────── │
│ data: {index:0, item, gates} │ │
│ ◄───────────────────────────── │ │
│ │ Per-item gates repeat │
│ event: item_detected │ for each medication │
│ data: {index:1, item, gates} │ ◄────────────────────────── │
│ ◄───────────────────────────── │ │
│ │ Gate 3 (Drug Interactions)│
│ │ Gate 4 (Duplicate Therapy)│
│ event: gates_complete │ ◄────────────────────────── │
│ data: {gate3, gate4 results} │ │
│ ◄───────────────────────────── │ │
│ │ Persist to Redis (24h TTL)│
│ event: prescription │ │
│ data: {items, gates, final} │ │
│ ◄───────────────────────────── │ │
│ │ │
│ Doctor reviews & finalizes │ │
│ POST /v1/prescriptions │ │
│ (requires_confirmation flow) │ │
│ ─────────────────────────────► │ │

Endpoint Reference

POST /v1/prescriptions/stream

Request Headers

| Header | Value | Required |
|---|---|---|
| Content-Type | application/json | Yes |
| x-api-key | Your tenant API key | Yes |
| Accept | text/event-stream | Recommended |

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| consultation_id | string | Yes | External consultation identifier (1–255 chars) |
| patient_id | string | Yes | Patient identifier: opaque token (pat_*) or legacy UUID |
| doctor_id | string | Yes | Prescribing physician identifier: opaque token (doc_*) or legacy UUID |
| doctor_input | string | Yes | Raw doctor speech or typed text (1–10,000 chars). On the audio-progressive path, pass any non-empty placeholder; the wrapper prefers accumulated_text |
| accumulated_text | string \| null | No | Audio-progressive path marker. The full growing transcript (max 50,000 chars). When present, dispatches through the cache-aware wrapper. When absent, uses the legacy text-input path (no cache) |
| previous_rx_hash | string \| null | No | Client-side cache identity hash from a prior poll. When matched, permits a cache hit even past the word-delta threshold. When mismatched, forces a refresh |
| stream | boolean | No | true (default) returns an SSE stream. false returns a single JSON response |
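
A client can catch the documented length limits before sending the request. The sketch below is illustrative only: `validateStreamRequest` and `StreamRequestDraft` are hypothetical names, and it assumes the limits are counted the way JavaScript counts string length (UTF-16 code units), which the server may not do.

```typescript
// Limits are copied from the request-body table; the helper itself is hypothetical.
interface StreamRequestDraft {
  consultation_id: string;
  patient_id: string;
  doctor_id: string;
  doctor_input: string;
  accumulated_text?: string | null;
}

// Returns a list of human-readable problems; an empty list means the draft
// satisfies the documented size constraints.
function validateStreamRequest(req: StreamRequestDraft): string[] {
  const problems: string[] = [];
  if (req.consultation_id.length < 1 || req.consultation_id.length > 255) {
    problems.push('consultation_id must be 1-255 chars');
  }
  if (!req.patient_id) problems.push('patient_id is required');
  if (!req.doctor_id) problems.push('doctor_id is required');
  if (req.doctor_input.length < 1 || req.doctor_input.length > 10_000) {
    problems.push('doctor_input must be 1-10,000 chars');
  }
  if (req.accumulated_text != null && req.accumulated_text.length > 50_000) {
    problems.push('accumulated_text must be at most 50,000 chars');
  }
  return problems;
}
```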

Identifier Tokens

patient_id and doctor_id accept either the legacy DELPHOS UUID (planned for removal in TSID-008) or the tenant-scoped opaque tokens (pat_* and doc_*). New integrations SHOULD use the opaque tokens — they don’t leak internal IDs across tenants. See the API reference for details.

Example: Audio-Progressive Request

{
  "consultation_id": "ATD-2026-001234",
  "patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
  "doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
  "doctor_input": "placeholder",
  "accumulated_text": "Paciente relata cefaleia tensional ha 3 dias. Sem nausea, sem febre. PA 120/80. Vou prescrever Dipirona 500mg via oral 6/6h se dor por 5 dias. Tambem Paracetamol 750mg como alternativa.",
  "previous_rx_hash": null,
  "stream": true
}

Example: Legacy Text-Input Request

{
  "consultation_id": "ATD-2026-001234",
  "patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
  "doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
  "doctor_input": "Amoxicilina 500mg via oral de 8 em 8 horas por 7 dias. Dipirona 500mg via oral se dor, maximo 6 em 6 horas.",
  "stream": true
}

SSE Event Types

Three event sequences are emitted depending on whether the call is a fresh LLM run, a cache hit, or a no-prescription transcript.

Sequence A — Cache Miss / Fresh LLM Run

status (analyzing) → item_detected (×N) → gates_complete → prescription

Sequence B — Cache Hit

status (cache_hit) → prescription

Target latency: P95 < 50 ms (no LLM, no gates, no DB).

Sequence C — No Prescription Detected

status (no_rx_detected)

Target latency: P95 < 100 ms on 50k-character transcripts. This is the dominant case during early consultation chunks (small talk, anamnesis without a prescription mention). Render this as a “no prescription so far” state rather than a blank panel — subsequent polls will fire the LLM naturally once Rx vocabulary appears in the transcript.

1. status — Processing Status

Always the first event emitted. Tells your client which sequence is about to play out.

event: status
data: {"type": "analyzing"}

event: status
data: {"type": "cache_hit"}

event: status
data: {"type": "no_rx_detected", "message": "Sem sinais de prescricao no texto acumulado; nenhuma chamada ao LLM realizada."}

| Type | Meaning | Next event |
|---|---|---|
| analyzing | DELPHOS is running the LLM + 6-gate pipeline | item_detected |
| cache_hit | Cached result is being replayed | prescription |
| no_rx_detected | Transcript contains no Rx vocabulary anchors; LLM skipped | (stream closes) |
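
Because the status type announces which sequence follows, a client can sanity-check the events it received afterwards. The sketch below covers the three success sequences only (an error event can cut any of them short). `isExpectedSequence` is a hypothetical helper, and allowing zero item_detected events in Sequence A is an assumption.

```typescript
type StatusType = 'analyzing' | 'cache_hit' | 'no_rx_detected';

// Checks the event names received AFTER the initial status event against
// the three documented success sequences.
function isExpectedSequence(status: StatusType, following: string[]): boolean {
  switch (status) {
    case 'no_rx_detected':
      // Sequence C: the stream closes right after the status event.
      return following.length === 0;
    case 'cache_hit':
      // Sequence B: a single replayed prescription event.
      return following.length === 1 && following[0] === 'prescription';
    case 'analyzing': {
      // Sequence A: item_detected (xN), then gates_complete, then prescription.
      const n = following.length;
      return (
        n >= 2 &&
        following[n - 1] === 'prescription' &&
        following[n - 2] === 'gates_complete' &&
        following.slice(0, n - 2).every((name) => name === 'item_detected')
      );
    }
  }
}
```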

2. item_detected — Medication Found

Emitted once per medication parsed from the doctor’s input. Each event includes the extracted item data and the results of the per-item safety gates (gates 1, 2, 5). The pending_gates array lists the cross-item gates that will arrive later in gates_complete.

event: item_detected
data: {
  "index": 0,
  "item": {
    "medication_name": "Dipirona",
    "dosage": "500mg",
    "route": "oral",
    "frequency": "6/6h",
    "duration": "5 dias",
    "quantity": 20,
    "unit": "comprimidos",
    "instructions": "se dor"
  },
  "gates": {
    "gate1_input_validation": {
      "status": "passed",
      "message": "Dados do medicamento válidos."
    },
    "gate2_cmed_resolution": {
      "status": "passed",
      "severity": "info",
      "message": "Medicamento identificado na base CMED: DIPIRONA SODICA 500MG COM CT BL AL PLAS PVDC X 20 (similaridade: 97%)",
      "details": {
        "match_type": "auto",
        "similarity": 0.97,
        "produto": "DIPIRONA SODICA 500MG COM CT BL AL PLAS PVDC X 20"
      }
    },
    "gate5_controlled_substance": {
      "status": "passed",
      "message": "Medicamento não é substância controlada."
    }
  },
  "pending_gates": [
    "gate3_drug_interactions",
    "gate4_duplicate_therapy"
  ]
}

Item Fields

| Field | Type | Description |
|---|---|---|
| medication_name | string | Name of the medication as spoken by the doctor |
| dosage | string \| null | Dosage (e.g., "500mg", "500mg/5ml") |
| route | string \| null | Administration route ("oral", "IV", "IM", "SC", "sublingual", "topical") |
| frequency | string \| null | Dosing frequency (e.g., "8/8h", "1x/dia", "12/12h") |
| duration | string \| null | Treatment duration (e.g., "7 dias", "uso contínuo") |
| quantity | integer \| null | Total quantity to dispense |
| unit | string \| null | Quantity unit (e.g., "comprimidos", "mL") |
| instructions | string \| null | Additional instructions (e.g., "se dor", "em jejum") |

3. gates_complete — Cross-Item Analysis Done

Emitted after all items have been detected and the cross-item safety gates finish their analysis across the full prescription.

Each gate result includes top-level gate_name, status, severity, message, and details keys (produced by GateResult.to_dict()). The details shape is gate-specific — documented per gate below.

event: gates_complete
data: {
  "gate3_drug_interactions": {
    "gate_name": "drug_interactions",
    "status": "passed",
    "severity": "info",
    "message": "Nenhuma interação medicamentosa detectada.",
    "details": {
      "interactions": []
    }
  },
  "gate4_duplicate_therapy": {
    "gate_name": "duplicate_therapy",
    "status": "passed",
    "severity": "info",
    "message": "Nenhuma terapia duplicada encontrada.",
    "details": {
      "matched_items": []
    }
  }
}

When interactions are found, Gate 3 populates details.interactions[] with one enriched entry per interaction. Each entry exposes the pharmacological metadata needed for client-side display and audit logging:

event: gates_complete
data: {
  "gate3_drug_interactions": {
    "gate_name": "drug_interactions",
    "status": "passed",
    "severity": "major",
    "message": "⚠️ Interação medicamentosa maior detectada entre Varfarina e Aspirina: aumento do risco de sangramento. Monitorar INR e sinais de sangramento.",
    "details": {
      "interactions": [
        {
          "drug_a": "Varfarina",
          "drug_b": "Aspirina",
          "severity": "major",
          "mechanism": "Inibição plaquetária aditiva à anticoagulação",
          "clinical_effect": "Aumento do risco de sangramento",
          "recommendation": "Monitorar INR e sinais de sangramento",
          "extraction_method": "matrix_tier1"
        }
      ]
    }
  },
  "gate4_duplicate_therapy": {
    "gate_name": "duplicate_therapy",
    "status": "passed",
    "severity": "info",
    "message": "Nenhuma terapia duplicada encontrada.",
    "details": { "matched_items": [] }
  }
}

Per-interaction fields inside details.interactions[]:

| Field | Type | Description |
|---|---|---|
| drug_a | string | First interacting substance (active ingredient). |
| drug_b | string | Second interacting substance (active ingredient). |
| severity | string | One of critical, major, moderate, minor (see Severity Levels). |
| mechanism | string \| null | Pharmacological mechanism of the interaction. |
| clinical_effect | string \| null | Expected clinical outcome. |
| recommendation | string \| null | Prescriber guidance (e.g., monitoring, dose adjustment). |
| extraction_method | string \| null | How the interaction was identified (e.g., "matrix_tier1", "llm_fallback"). |
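
For display, a client will usually want the most severe interaction first. The sketch below orders details.interactions[] using the four per-interaction severities listed above. `sortBySeverity` and `SEVERITY_RANK` are hypothetical names, and placing unknown severity values last is an assumption, not documented behaviour.

```typescript
// Subset of the per-interaction fields needed for ordering and display.
interface Interaction {
  drug_a: string;
  drug_b: string;
  severity: string;
  recommendation: string | null;
}

// Lower rank means more severe. Values outside this list sort last.
const SEVERITY_RANK: Record<string, number> = {
  critical: 0,
  major: 1,
  moderate: 2,
  minor: 3,
};

// Returns a new array, most severe first; the input array is not mutated.
function sortBySeverity(interactions: Interaction[]): Interaction[] {
  const rank = (severity: string): number =>
    SEVERITY_RANK[severity] ?? Number.MAX_SAFE_INTEGER;
  return [...interactions].sort((a, b) => rank(a.severity) - rank(b.severity));
}
```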

When Gate 4 detects a duplicate (Level 1 or Level 2), it populates the details shape with the detection level and matched items:

// Level 1: exact medication name or active-ingredient match
"gate4_duplicate_therapy": {
  "gate_name": "duplicate_therapy",
  "status": "passed",
  "severity": "high",
  "message": "⚠️ DUPLICIDADE DE MEDICAMENTO: 'Dipirona 500mg' já está ativo na prescrição deste paciente...",
  "details": {
    "level": 1,
    "matched_items": [
      { "medication_name": "Dipirona 500mg", "active_ingredient": "Dipirona sódica" }
    ]
  }
}

// Level 2: same EPhMRA therapeutic class
"gate4_duplicate_therapy": {
  "gate_name": "duplicate_therapy",
  "status": "passed",
  "severity": "medium",
  "message": "📋 MESMA CLASSE TERAPÊUTICA: 'Ibuprofeno' pertence à mesma classe terapêutica (ANTI-INFLAMATÓRIOS NÃO-ESTERÓIDES) que 'Naproxeno', já prescrito...",
  "details": {
    "level": 2,
    "ephmra_code": "M1A",
    "class_description": "ANTI-INFLAMATÓRIOS NÃO-ESTERÓIDES",
    "matched_items": [
      { "medication_name": "Naproxeno", "active_ingredient": "Naproxeno sódico", "classe_terapeutica": "M1A - ANTI-INFLAMATÓRIOS NÃO-ESTERÓIDES" }
    ]
  }
}

Duplicate therapy reports two detection levels:

| Level | Detector | Example |
|---|---|---|
| Level 1 | Identical active ingredient | “Dipirona 500mg” + “Dipirona 1g” |
| Level 2 | Same EPhMRA pharmacological class | Two different NSAIDs prescribed together |

4. prescription — Final Result

The complete, aggregated prescription with all items and gate results. This is the terminal success event — the stream closes after it.

event: prescription
data: {
  "items": [
    {
      "medication_name": "Dipirona",
      "dosage": "500mg",
      "route": "oral",
      "frequency": "6/6h",
      "duration": "5 dias",
      "quantity": 20,
      "unit": "comprimidos",
      "instructions": "se dor"
    }
  ],
  "gates_per_item": [
    {
      "gate1_input_validation": { "status": "passed" },
      "gate2_cmed_resolution": { "status": "passed", "details": { "match_tier": "auto" } },
      "gate5_controlled_substance": { "status": "passed" }
    }
  ],
  "gates_cross_item": {
    "gate3_drug_interactions": { "status": "passed" },
    "gate4_duplicate_therapy": { "status": "passed" }
  },
  "requires_confirmation": true,
  "is_degraded": false
}

| Field | Type | Description |
|---|---|---|
| items | array | All extracted medication items |
| gates_per_item | array | Per-item gate results (gates 1, 2, 5) indexed by item position |
| gates_cross_item | object | Cross-item gate results (gates 3, 4) |
| requires_confirmation | boolean | Always true: the physician MUST confirm via POST /v1/prescriptions before finalization |
| is_degraded | boolean | true if a safety gate or upstream service encountered an error and partial results are returned |

5. error — Processing Failed

Emitted when an unrecoverable error occurs. The degraded flag indicates whether partial results may still be usable.

event: error
data: {
  "code": "LLM_TIMEOUT",
  "message": "Tempo limite excedido na comunicação com o modelo de linguagem",
  "degraded": true
}

| Error Code | Description | Retry? |
|---|---|---|
| LLM_TIMEOUT | The LLM did not respond within the time limit | Yes, with backoff |
| LLM_ERROR | The LLM returned a non-2xx HTTP status (transient upstream failure) | Yes, after short delay |
| PARSE_ERROR | Could not extract medications from input; retry once, then rephrase | Yes, once |
| EXTRACTION_VALIDATION_ERROR | The LLM produced structured items that failed Pydantic validation against PrescriptionItemExtracted (typically a required field returned as null or with the wrong type). Per-field detail is available in details[] (see below). | No. It surfaces a structured-but-incomplete extraction; rephrase or request re-extraction with cleaner input |
| INTERNAL_ERROR | Unexpected server error | Yes, with backoff |

The EXTRACTION_VALIDATION_ERROR event additionally carries a details array describing each Pydantic validation failure:

event: error
data: {
  "code": "EXTRACTION_VALIDATION_ERROR",
  "message": "Item extraído não passou na validação estrutural",
  "degraded": true,
  "details": [
    {
      "field": "items.0.medication_name",
      "reason": "Field required",
      "type": "missing"
    },
    {
      "field": "items.0.dosage",
      "reason": "Input should be a valid string",
      "type": "string_type"
    }
  ]
}

| Field (per entry) | Type | Description |
|---|---|---|
| field | string | Dotted path to the offending field inside the LLM-extracted payload. |
| reason | string | Human-readable Pydantic error message (locale-controlled; do not branch on it). |
| type | string | Pydantic error-type token (e.g., "missing", "string_type", "int_parsing"). Stable across locales; use this for programmatic branching. |
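
To illustrate branching on the stable `type` token rather than the localized `reason`, the sketch below groups the details entries into buckets a UI could render differently. `summarizeValidation` and its bucket names are hypothetical; only the three type tokens shown come from the table above.

```typescript
interface ValidationDetail {
  field: string;
  reason: string; // localized; used for display only, never for branching
  type: string;   // stable token; safe to branch on
}

interface ValidationSummary {
  missing: string[];
  wrongType: string[];
  other: string[];
}

// Groups offending field paths by the kind of failure.
function summarizeValidation(details: ValidationDetail[]): ValidationSummary {
  const summary: ValidationSummary = { missing: [], wrongType: [], other: [] };
  for (const detail of details) {
    if (detail.type === 'missing') {
      summary.missing.push(detail.field);
    } else if (detail.type === 'string_type' || detail.type === 'int_parsing') {
      summary.wrongType.push(detail.field);
    } else {
      // Tokens not named in this document are kept rather than dropped.
      summary.other.push(detail.field);
    }
  }
  return summary;
}
```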

The 6-Gate Safety Pipeline

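Every prescription passes through the same safety gates. DELPHOS follows the physician autonomy principle: the clinical gates (2 through 6) are advisory. They inform and warn, but never block the physician’s clinical decision. The one exception is Gate 1, which rejects structurally invalid items before any clinical check runs.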

| Gate | Name (canonical) | Scope | What It Checks | Blocks? |
|---|---|---|---|---|
| 1 | Validação de Entrada | Per-item | Required fields are present (medication_name mandatory) | Only blocking gate |
| 2 | Resolução de Medicamento | Per-item | Matches medication against the ANVISA/CMED national database | Advisory |
| 3 | Interações Medicamentosas | Cross-item | Drug-drug interactions across all prescription items | Advisory |
| 4 | Duplicidade Terapêutica | Cross-item | Overlapping active ingredients (Level 1) and EPhMRA classes (Level 2) | Advisory |
| 5 | Substâncias Controladas | Per-item | Flags ANVISA controlled substances (Portaria 344/98, RDC 20/2011) | Advisory |
| 6 | Cruzamento de Alergias | Per-item | (v1.1) Cross-references prescription items against the patient’s allergy list | Advisory (when shipped) |

Gate 1 — Input Validation

The only blocking gate. Verifies that medication_name is present. If it fails, the item is rejected — no point running expensive downstream gates on a malformed item.

Gate 2 — CMED Resolution (ANVISA / CMED)

Matches the medication against the ANVISA CMED price/registry database. The match tier informs your UI how confident the resolution is:

| Tier | Meaning | UI treatment |
|---|---|---|
| auto | High-confidence match | Display CMED product name; no action required |
| suggestion | Moderate confidence | Show DELPHOS’s suggested CMED product; ask doctor to confirm |
| none | No match | Render as “Off-CMED prescription”; still valid, just unmatched |
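
The tier table maps naturally onto a small lookup in the client. The sketch below takes the tier string and the matched product name as plain arguments, so it makes no assumption about which key in details carries the tier. `cmedTierToUi` and `CmedUi` are hypothetical names, and the "Suggested:" label wording is only an example.

```typescript
type CmedTier = 'auto' | 'suggestion' | 'none';

interface CmedUi {
  label: string;
  needsConfirmation: boolean;
}

// Translates a Gate 2 match tier into the UI treatment from the table above.
function cmedTierToUi(tier: CmedTier, produto?: string): CmedUi {
  switch (tier) {
    case 'auto':
      // High-confidence match: show the CMED product name, no action needed.
      return { label: produto ?? '', needsConfirmation: false };
    case 'suggestion':
      // Moderate confidence: show the suggestion and ask the doctor to confirm.
      return { label: `Suggested: ${produto ?? ''}`, needsConfirmation: true };
    case 'none':
      // No match: still a valid prescription, just unmatched.
      return { label: 'Off-CMED prescription', needsConfirmation: false };
  }
}
```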

CMED resolution is advisory because off-CMED prescriptions are entirely legal — compounded medications, imported drugs, and brand-new approvals are routinely off-registry.

Gate 3 — Drug-Drug Interactions (cross-item)

Checks pairwise interactions across all items in the prescription. Returns pairs_checked, interactions_found, and a detailed interactions[] array with severity (info / warning / moderate / major / critical) per pair.

Gate 4 — Duplicate Therapy (cross-item)

Two-level detection:

  • Level 1: identical active ingredient (e.g., two formulations of Dipirona).
  • Level 2: same EPhMRA pharmacological class (e.g., two NSAIDs).

Gate 5 — Substâncias Controladas

Flags substances controlled under ANVISA Portaria 344/98 and RDC 20/2011. Returns the controlled-substance category (e.g., A1, A2, B1, C1) when applicable, so your UI can warn the doctor about the special prescription format required (yellow form, blue form, retention copy).

Severity Levels

When any gate produces a warning, it includes a severity level. Two distinct severity vocabularies are in use across the gates — be sure to handle both when rendering UI:

Gate 3 (Drug Interactions) — SeverityLevel enum:

| Severity | Meaning | Recommended UI treatment |
|---|---|---|
| info | Informational; no concerns | Subtle indicator |
| warning | Advisory alert; physician should review | Yellow highlight |
| moderate | Moderate concern; review recommended | Orange highlight |
| major | Significant concern; careful review needed | Red highlight with details |
| critical | Serious safety concern; demands attention | Prominent red alert |

Gate 4 (Duplicate Therapy) — custom strings (not the SeverityLevel enum):

| Severity | Meaning | Emitted by | Recommended UI treatment |
|---|---|---|---|
| high | Level 1: exact medication name or active-ingredient duplicate | Gate 4 | Red highlight; “confirm duplicate?” prompt |
| medium | Level 2: same EPhMRA therapeutic class (e.g., two NSAIDs) | Gate 4 | Orange highlight; “intentional?” prompt |

Other gates (CMED resolution, Controlled Substance) reuse values from the SeverityLevel enum above — typically info for clean cases and warning when results are advisory but degraded (e.g., CMED service unavailable).
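
Since the two vocabularies overlap in meaning but not in spelling, a client may want to normalize both into one set of UI levels. The sketch below is one possible mapping derived from the two tables above. `severityToUi` and the `UiLevel` names are hypothetical, and falling back to a yellow advisory for unrecognized values is an assumption chosen so that nothing is silently hidden.

```typescript
type UiLevel = 'subtle' | 'yellow' | 'orange' | 'red' | 'red-alert';

// SeverityLevel enum values (Gate 3 and the other enum-based gates).
const ENUM_TO_UI: Record<string, UiLevel> = {
  info: 'subtle',
  warning: 'yellow',
  moderate: 'orange',
  major: 'red',
  critical: 'red-alert',
};

// Gate 4 custom strings.
const GATE4_TO_UI: Record<string, UiLevel> = {
  high: 'red',
  medium: 'orange',
};

function severityToUi(gateKey: string, severity: string): UiLevel {
  // Gate 4 also emits "info" for clean results, so both tables apply to it.
  const table =
    gateKey === 'gate4_duplicate_therapy'
      ? { ...ENUM_TO_UI, ...GATE4_TO_UI }
      : ENUM_TO_UI;
  // Unknown value: show a visible advisory rather than nothing.
  return table[severity] ?? 'yellow';
}
```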


Cache Semantics — Audio-Progressive Path

When accumulated_text is present, the endpoint routes through a cache-aware wrapper that mirrors the Progressive SOAP behaviour.

Word-Delta Threshold

| Aspect | Value |
|---|---|
| Threshold | 30 new words since the last cached result |
| Effect | Below threshold → cache hit (replay cached result). At or above → LLM + gate pipeline re-fires |
| Rationale | Mirrors Progressive SOAP (ratified 2026-05-19, founder direct). Empirically reduces LLM calls ~6× without losing recency |
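
A client can estimate whether a poll is likely to re-fire the pipeline, for example to decide when to show a "re-analyzing" indicator. The sketch below is a client-side approximation only: it assumes words are split on whitespace and that the transcript is append-only, neither of which the document specifies for the server. `countWords` and `likelyRefires` are hypothetical names.

```typescript
const WORD_DELTA_THRESHOLD = 30; // from the threshold table above

// Counts whitespace-separated tokens; an empty or blank string has zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Estimates whether the transcript has grown by at least the threshold
// since the text that produced the last cached result.
function likelyRefires(lastProcessedText: string, currentText: string): boolean {
  const delta = countWords(currentText) - countWords(lastProcessedText);
  return delta >= WORD_DELTA_THRESHOLD;
}
```

The server remains the source of truth; the status event (analyzing or cache_hit) tells the client what actually happened.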

Redis Cache

| Aspect | Value |
|---|---|
| Cache key scope | (tenant_app_id, consultation_id); strict multi-tenant isolation |
| TTL | 24 hours |
| Lock scope | (tenant_app_id, consultation_id); prevents thundering-herd LLM calls on concurrent polls |

previous_rx_hash Identity

Each cached prescription payload is associated with a server-side hash. Pass it back as previous_rx_hash on subsequent polls:

  • If the supplied hash matches the cached hash → permits a cache hit even when the word-delta has been crossed.
  • If the supplied hash does not match → forces a refresh (regardless of word-delta), so a client that has lost state can resync.
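
A minimal sketch of a client-side hash, following the SHA-256-over-payload strategy suggested in the FAQ. It assumes a Node.js runtime (a browser client would use `crypto.subtle.digest` instead) and hashes only `items` and `gates_cross_item`. `canonicalJson` and this `computeClientHash` body are illustrative; the document does not specify how the server derives its own hash, so whether the two values match is an open assumption.

```typescript
import { createHash } from 'node:crypto';

// Serializes with object keys sorted, so the same payload always produces
// the same string regardless of key order.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${members.join(',')}}`;
  }
  // Primitives; undefined has no JSON form, so it is written as null.
  return JSON.stringify(value) ?? 'null';
}

// Returns a 64-character lowercase hex SHA-256 digest of the payload.
function computeClientHash(rx: { items: unknown; gates_cross_item: unknown }): string {
  const payload = canonicalJson({
    items: rx.items,
    gates_cross_item: rx.gates_cross_item,
  });
  return createHash('sha256').update(payload, 'utf8').digest('hex');
}
```

Sorting keys before hashing avoids spurious mismatches when two JSON parsers return the same object with a different key order.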

no_rx_detected Short-Circuit

The wrapper scans the accumulated transcript for PT-BR Rx vocabulary anchors (medication names, dosage/route/frequency markers) before invoking the LLM. When no anchors are found, the wrapper short-circuits with a single status: no_rx_detected event and skips the LLM entirely. This is the dominant case in the first few minutes of a consultation and keeps idle polls below 100 ms.


Two-Phase Commit — Stream → Review → Finalize

The streaming endpoint produces a diagnostic preview with requires_confirmation: true. To finalize, the doctor reviews the streamed result in your UI, tweaks dosages or removes items, then calls:

POST /v1/prescriptions

with the confirmed items. That call persists the prescription as a draft, and the doctor advances through the lifecycle to signed (see Digital Signing).

Stream (extract + validate) → Doctor reviews + tweaks → POST /v1/prescriptions (persist + sign)
[diagnostic] [in your UI] [authoritative write]

Client Integration

Audio-Progressive Pattern — Parallel with Progressive SOAP

In a typical Mnesis consultation, you poll two streaming endpoints in parallel as the audio transcript grows: progressive-soap/stream for the clinical note and prescriptions/stream for the prescription. Both share the same consultation_id and the same accumulated_text cadence (one poll per audio chunk, ~5 seconds of audio).

/**
 * Audio-progressive prescription stream (Mnesis pattern).
 * Polls /v1/prescriptions/stream alongside /v1/consultation/progressive-soap/stream
 * as the transcript accumulates during a live consultation.
 */
interface StreamingPrescriptionRequest {
  consultation_id: string;
  patient_id: string;
  doctor_id: string;
  doctor_input: string;            // placeholder on audio path
  accumulated_text: string;        // the growing transcript
  previous_rx_hash: string | null; // client-cached identity, null on first poll
  stream: true;
}

interface StreamingPrescriptionResponse {
  items: Array<{
    medication_name: string;
    dosage: string | null;
    route: string | null;
    frequency: string | null;
    duration: string | null;
    quantity: number | null;
    unit: string | null;
    instructions: string | null;
  }>;
  gates_per_item: Array<Record<string, unknown>>;
  gates_cross_item: Record<string, unknown>;
  requires_confirmation: true;
  is_degraded: boolean;
}

async function streamPrescription(
  apiKey: string,
  body: StreamingPrescriptionRequest,
  onEvent: (type: string, data: any) => void,
): Promise<void> {
  const response = await fetch('/v1/prescriptions/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
      'Accept': 'text/event-stream',
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line.
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop()!; // keep incomplete block for the next chunk

    for (const block of blocks) {
      if (!block.trim()) continue;
      // Assumes each event carries one `event:` line and one single-line `data:` field.
      const evtMatch = block.match(/^event:\s*(.+)$/m);
      const dataMatch = block.match(/^data:\s*(.+)$/m);
      if (evtMatch && dataMatch) {
        onEvent(evtMatch[1].trim(), JSON.parse(dataMatch[1]));
      }
    }
  }
}

// ── Usage: polled per audio chunk ───────────────────────
// showRxPanel, renderItem, renderCrossItemGates, renderFinalPrescription,
// showError and computeClientHash are your application's own functions.
let previousRxHash: string | null = null;
let accumulatedText = '';

// Called whenever a new transcribed chunk lands
async function onTranscriptChunk(newSegmentText: string) {
  // Avoid a leading space on the very first chunk.
  accumulatedText = accumulatedText
    ? `${accumulatedText} ${newSegmentText}`
    : newSegmentText;

  await streamPrescription(
    'YOUR_TENANT_API_KEY',
    {
      consultation_id: 'ATD-2026-001234',
      patient_id: 'pat_AaBbCcDdEeFfGgHhIiJj11',
      doctor_id: 'doc_KkLlMmNnOoPpQqRrSsTt22',
      doctor_input: 'placeholder',       // audio path
      accumulated_text: accumulatedText, // the FULL growing transcript
      previous_rx_hash: previousRxHash,
      stream: true,
    },
    (type, data) => {
      switch (type) {
        case 'status':
          if (data.type === 'no_rx_detected') {
            showRxPanel('No prescription dictated yet');
          } else if (data.type === 'cache_hit') {
            // about to receive a cached prescription event
          }
          break;
        case 'item_detected':
          renderItem(data.index, data.item, data.gates);
          break;
        case 'gates_complete':
          renderCrossItemGates(data);
          break;
        case 'prescription':
          renderFinalPrescription(data);
          previousRxHash = computeClientHash(data); // your hash strategy
          break;
        case 'error':
          showError(data.code, data.message, data.degraded);
          break;
      }
    },
  );
}

Non-Streaming Fallback

Set "stream": false to receive a single JSON response with all items and gate results at once. The response structure is identical to the prescription SSE event payload.

curl -X POST 'https://your-instance.delphos.app/v1/prescriptions/stream' \
-H 'Content-Type: application/json' \
-H 'x-api-key: YOUR_TENANT_API_KEY' \
-d '{
"consultation_id": "ATD-2026-001234",
"patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
"doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
"doctor_input": "Dipirona 500mg via oral 6/6h se dor por 5 dias.",
"stream": false
}'

Error Handling Mid-Stream

When an error occurs during streaming, DELPHOS emits an error event and closes the connection. Your client should implement retry logic with exponential backoff, falling back to the non-streaming endpoint after max retries.

async function streamWithRetry(
  apiKey: string,
  body: StreamingPrescriptionRequest,
  onEvent: (type: string, data: any) => void,
  maxRetries = 3,
): Promise<void> {
  // Reason: `EXTRACTION_VALIDATION_ERROR` is intentionally NOT retriable.
  // Structured-but-incomplete extractions surface a content problem, not a
  // transient infrastructure failure. Branch on the `code` field, never on
  // the `message` (which is locale-controlled).
  // The set mirrors the codes marked "Yes" in the error table.
  const retriable = new Set(['PARSE_ERROR', 'LLM_TIMEOUT', 'LLM_ERROR', 'INTERNAL_ERROR']);

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    let sawRetriableError = false;
    try {
      await streamPrescription(apiKey, body, (type, data) => {
        if (type === 'error' && retriable.has(data.code)) {
          sawRetriableError = true; // swallow it; the loop retries below
          return;
        }
        onEvent(type, data); // non-retriable errors are surfaced to the caller
      });
      if (!sawRetriableError) return; // success, or a non-retriable error was delivered
    } catch (err) {
      // Network or HTTP failure from streamPrescription: treated as retriable.
    }
    if (attempt < maxRetries) {
      // Exponential backoff between attempts: 1s, 2s, 4s, ...
      await new Promise((r) => setTimeout(r, 1000 * 2 ** (attempt - 1)));
    }
  }

  // Retries exhausted: fall back to the same endpoint with stream: false.
  const response = await fetch('/v1/prescriptions/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
    },
    body: JSON.stringify({ ...body, stream: false }),
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  onEvent('prescription', await response.json());
}

FAQ

Should I poll the streaming endpoint on every audio chunk? Yes. The word-delta threshold + Redis cache absorb the overhead. Below 30 new words, polls return from cache in P95 < 50 ms. The wrapper exists specifically to make per-chunk polling cheap.

What’s doctor_input for on the audio-progressive path? It’s a required field by schema — pass any non-empty placeholder (e.g., "placeholder"). The wrapper prefers accumulated_text when present. This is a legacy artefact of the original text-input contract and is preserved for backwards-compat.

How do I compute previous_rx_hash? Use any deterministic hash of the previous prescription payload (e.g., SHA-256 of JSON.stringify({items, gates_cross_item})). Pass it back on the next poll. When it matches the server’s cached hash, you get a cache hit even past the word-delta threshold.

Why is gate 6 missing from v1.0 responses? Gate 6 (Cruzamento de Alergias) is the v1.1 allergy-cross-reference feature and is not implemented in the current v1.0 release. See DELPHOS #964 for the implementation ticket. The canonical 6-gate numbering was ratified by DELPHOS #959 on 2026-05-19, and aligns with the public spec on delphosai.io.

Why are gates advisory and not blocking? Per Lei 12.842/2013 (Brazilian Physician Act) and the CFM Code of Ethics (Articles 20-21), the physician has exclusive authority over clinical decisions. DELPHOS provides decision support, not clinical gating. Even critical interactions are surfaced as warnings, never hard blocks.

What happens if the LLM is slow or fails? The endpoint emits an error event with degraded: true and any partial results gathered so far. Your client should retry with exponential backoff, then fall back to the non-streaming variant.

Can I use this without recorded audio? Yes — the legacy text-input path (doctor_input only, no accumulated_text) is the original K.I.T.T. chat integration and remains fully supported. It just doesn’t cache (single-shot calls don’t benefit from caching).

Is the streamed prescription persisted automatically? No. The streaming endpoint is diagnostic — requires_confirmation: true indicates the doctor must explicitly call POST /v1/prescriptions to persist. This two-phase commit is intentional.