Streaming Prescription Extraction
DELPHOS can listen to a doctor speaking naturally — or read free text typed in a chat — and produce a fully structured, safety-checked prescription in real time. As the doctor dictates, medications appear one by one. Each medication passes through advisory safety gates before the final prescription is assembled, ready for the doctor to review, tweak, and print BEFORE the patient leaves the room.
This is the streaming prescription agent — the most powerful clinical integration point in the DELPHOS platform.
Two Call Shapes — Pick the One That Matches Your Flow
The endpoint supports two backwards-compatible call shapes. Both run the same LLM extraction + 6-gate safety pipeline. The only difference is whether DELPHOS caches the result for repeated polls during a live consultation.
| Shape | Who uses it | Cache layer | Required fields |
|---|---|---|---|
| Audio-progressive | Mnesis, voice-driven EHRs (recommended for consultations) | ✅ Word-delta cache + lock + no_rx_detected short-circuit | consultation_id, patient_id, doctor_id, doctor_input (placeholder), accumulated_text |
| Legacy text-input | K.I.T.T. chat, scripted batch callers, single-shot processing | ❌ Single non-repeating call | consultation_id, patient_id, doctor_id, doctor_input |
When in doubt, use the audio-progressive shape. Sending
accumulated_text is purely additive: legacy callers that omit it continue to work unchanged.
How It Works — Audio-Progressive Flow

    Doctor speaks            Mnesis polls /v1/prescriptions/stream            Your UI updates
    ───────────────  ────►   ────────────────────────────────────    ────►   ──────────────────
    "Amoxicilina             1. Send growing accumulated_text                 Item 1 appears with
     500mg oral                 (every audio chunk, ~5s cadence)              inline gates 1, 2, 5
     8/8h por                2. DELPHOS checks word-delta + cache
     7 dias.                 3. < 30 words new + no rx vocab → status         Cross-item gates
     Dipirona                   no_rx_detected (no LLM, P95 < 100ms)          3, 4 land
     500mg SOS"              4. ≥ 30 words new OR vocab present →
                                LLM + 6-gate pipeline fires                   Final prescription
                             5. Cached for 24h; repeat polls hit cache        (requires_confirmation)
                             6. Doctor reviews → POST /v1/prescriptions       Doctor signs → print
                                to finalize                                   BEFORE patient leaves

The streaming architecture uses a two-phase gate model:
- Per-item gates (1, 2, 5) fire immediately as each medication is
detected. Results arrive inline with each
item_detected event.
- Cross-item gates (3, 4) fire after the full input is processed,
since they need to analyze interactions between all medications.
Results arrive in the
gates_complete event.
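A client usually has to hold per-item results while the cross-item gates are still pending. The sketch below shows one way to merge the two phases on the client side. The class and method names are illustrative and not part of any DELPHOS SDK; only the event fields shown in this article are assumed.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PrescriptionDraft:
    """Client-side view of one streamed prescription (hypothetical helper)."""

    items: dict = field(default_factory=dict)             # index -> item
    per_item_gates: dict = field(default_factory=dict)    # index -> gate results
    cross_item_gates: dict = field(default_factory=dict)  # gate name -> result
    pending: set = field(default_factory=set)             # cross-item gates awaited

    def apply(self, event_type: str, data: dict) -> None:
        if event_type == "item_detected":
            # Phase 1: per-item gates arrive inline with the item.
            self.items[data["index"]] = data["item"]
            self.per_item_gates[data["index"]] = data.get("gates", {})
            self.pending.update(data.get("pending_gates", []))
        elif event_type == "gates_complete":
            # Phase 2: cross-item gates land once for the whole prescription.
            self.cross_item_gates.update(data)
            self.pending.difference_update(data.keys())

    @property
    def complete(self) -> bool:
        """True once at least one item exists and no gate is still pending."""
        return bool(self.items) and not self.pending
```

Feeding every `(event_type, data)` pair from the stream into `apply` lets the UI render each item as soon as it arrives and mark the prescription as fully checked when `complete` becomes true.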
Sequence Diagram

    Mnesis client                     DELPHOS API                      Pipeline
         │                                 │                              │
         │ POST /v1/prescriptions/stream   │                              │
         │ { ..., accumulated_text: ... }  │                              │
         │ ─────────────────────────────►  │                              │
         │                                 │ Word-delta check             │
         │                                 │ no_rx_detected scan          │
         │                                 │ Redis cache lookup           │
         │                                 │                              │
         │ event: status                   │                              │
         │ data: {"type":"analyzing"}      │                              │
         │ ◄─────────────────────────────  │                              │
         │                                 │ Gate 1 (Input Validation)    │
         │                                 │ Gate 2 (CMED Resolution)     │
         │                                 │ Gate 5 (Controlled Subst.)   │
         │ event: item_detected            │ ◄──────────────────────────  │
         │ data: {index:0, item, gates}    │                              │
         │ ◄─────────────────────────────  │                              │
         │                                 │ Per-item gates repeat        │
         │ event: item_detected            │ for each medication          │
         │ data: {index:1, item, gates}    │ ◄──────────────────────────  │
         │ ◄─────────────────────────────  │                              │
         │                                 │ Gate 3 (Drug Interactions)   │
         │                                 │ Gate 4 (Duplicate Therapy)   │
         │ event: gates_complete           │ ◄──────────────────────────  │
         │ data: {gate3, gate4 results}    │                              │
         │ ◄─────────────────────────────  │                              │
         │                                 │ Persist to Redis (24h TTL)   │
         │ event: prescription             │                              │
         │ data: {items, gates, final}     │                              │
         │ ◄─────────────────────────────  │                              │
         │                                 │                              │
         │ Doctor reviews & finalizes      │                              │
         │ POST /v1/prescriptions          │                              │
         │ (requires_confirmation flow)    │                              │
         │ ─────────────────────────────►  │                              │

Endpoint Reference

    POST /v1/prescriptions/stream

Request Headers
| Header | Value | Required |
|---|---|---|
| Content-Type | application/json | Yes |
| x-api-key | Your tenant API key | Yes |
| Accept | text/event-stream | Recommended |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| consultation_id | string | Yes | External consultation identifier (1–255 chars) |
| patient_id | string | Yes | Patient identifier — opaque token (pat_*) or legacy UUID |
| doctor_id | string | Yes | Prescribing physician identifier — opaque token (doc_*) or legacy UUID |
| doctor_input | string | Yes | Raw doctor speech or typed text (1–10,000 chars). On the audio-progressive path, pass any non-empty placeholder; the wrapper prefers accumulated_text |
| accumulated_text | string \| null | No | Audio-progressive path marker. The full growing transcript (max 50,000 chars). When present, dispatches through the cache-aware wrapper. When absent, uses the legacy text-input path (no cache) |
| previous_rx_hash | string \| null | No | Client-side cache identity hash from a prior poll. When matched, permits a cache hit even below the word-delta threshold. When mismatched, forces a refresh |
| stream | boolean | No | true (default) returns an SSE stream. false returns a single JSON response |
Identifier Tokens
patient_id and doctor_id accept either the legacy DELPHOS UUID (planned
for removal in TSID-008) or the tenant-scoped opaque tokens (pat_* and
doc_*). New integrations SHOULD use the opaque tokens — they don’t leak
internal IDs across tenants. See the
API reference for details.
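During migration it can help to log which identifier style a caller is still sending. The following is a minimal sketch that distinguishes the two styles. The function name is hypothetical, and because the body of an opaque token is not specified in this article, only the prefix is checked.

```python
import uuid


def classify_identifier(value: str, prefix: str) -> str:
    """Return "opaque_token", "legacy_uuid", or "unknown".

    `prefix` is "pat_" for patient_id or "doc_" for doctor_id.
    Only the prefix is validated; the token body format is an unknown here.
    """
    if value.startswith(prefix) and len(value) > len(prefix):
        return "opaque_token"
    try:
        uuid.UUID(value)  # raises ValueError for anything that is not a UUID
    except ValueError:
        return "unknown"
    return "legacy_uuid"
```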
Example — Audio-Progressive Request (recommended)

    {
      "consultation_id": "ATD-2026-001234",
      "patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
      "doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
      "doctor_input": "placeholder",
      "accumulated_text": "Paciente relata cefaleia tensional ha 3 dias. Sem nausea, sem febre. PA 120/80. Vou prescrever Dipirona 500mg via oral 6/6h se dor por 5 dias. Tambem Paracetamol 750mg como alternativa.",
      "previous_rx_hash": null,
      "stream": true
    }

Example — Legacy Text-Input Request

    {
      "consultation_id": "ATD-2026-001234",
      "patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
      "doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
      "doctor_input": "Amoxicilina 500mg via oral de 8 em 8 horas por 7 dias. Dipirona 500mg via oral se dor, maximo 6 em 6 horas.",
      "stream": true
    }

SSE Event Types
Three event sequences are emitted depending on whether the call is a fresh LLM run, a cache hit, or a no-prescription transcript.
Sequence A — Cache Miss / Fresh LLM Run

    status (analyzing) → item_detected (×N) → gates_complete → prescription

Sequence B — Cache Hit

    status (cache_hit) → prescription

Target latency: P95 < 50 ms (no LLM, no gates, no DB).
Sequence C — No Prescription Detected

    status (no_rx_detected)

Target latency: P95 < 100 ms on 50k-character transcripts. This is the dominant case during early consultation chunks (small talk, anamnesis without a prescription mention). Render this as a “no prescription so far” state rather than a blank panel: subsequent polls will fire the LLM naturally once Rx vocabulary appears in the transcript.
1. status — Processing Status
Always the first event emitted. Tells your client which sequence is about to play out.

    event: status
    data: {"type": "analyzing"}

    event: status
    data: {"type": "cache_hit"}

    event: status
    data: {"type": "no_rx_detected", "message": "Sem sinais de prescricao no texto acumulado; nenhuma chamada ao LLM realizada."}

| Type | Meaning | Next event |
|---|---|---|
| analyzing | DELPHOS is running the LLM + 6-gate pipeline | item_detected |
| cache_hit | Cached result is being replayed | prescription |
| no_rx_detected | Transcript contains no Rx vocabulary anchors — LLM skipped | (stream closes) |
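Because the first status event announces which sequence follows, a client or test harness can check a recorded stream against Sequences A, B and C. The sketch below is illustrative: the function name is hypothetical, it covers error-free streams only, and it assumes Sequence A always contains at least one item_detected event, which this article does not state explicitly.

```python
# Event order expected after each status type (Sequences A, B and C).
EXPECTED_AFTER_STATUS = {
    "analyzing": ["item_detected", "gates_complete", "prescription"],
    "cache_hit": ["prescription"],
    "no_rx_detected": [],
}


def is_valid_sequence(events: list) -> bool:
    """Check a completed stream, given as a list of (event_type, data) tuples."""
    if not events or events[0][0] != "status":
        return False
    status_type = events[0][1].get("type")
    if status_type not in EXPECTED_AFTER_STATUS:
        return False
    # Collapse consecutive item_detected events (the "×N" in Sequence A).
    collapsed = []
    for name, _ in events[1:]:
        if not (name == "item_detected" and collapsed and collapsed[-1] == name):
            collapsed.append(name)
    return collapsed == EXPECTED_AFTER_STATUS[status_type]
```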
2. item_detected — Medication Found
Emitted once per medication parsed from the doctor’s input. Each event
includes the extracted item data and the results of the per-item safety
gates (gates 1, 2, 5). The pending_gates array lists the cross-item
gates that will arrive later in gates_complete.

    event: item_detected
    data: {
      "index": 0,
      "item": {
        "medication_name": "Dipirona",
        "dosage": "500mg",
        "route": "oral",
        "frequency": "6/6h",
        "duration": "5 dias",
        "quantity": 20,
        "unit": "comprimidos",
        "instructions": "se dor"
      },
      "gates": {
        "gate1_input_validation": {
          "status": "passed",
          "message": "Dados do medicamento válidos."
        },
        "gate2_cmed_resolution": {
          "status": "passed",
          "severity": "info",
          "message": "Medicamento identificado na base CMED: DIPIRONA SODICA 500MG COM CT BL AL PLAS PVDC X 20 (similaridade: 97%)",
          "details": {
            "match_type": "auto",
            "similarity": 0.97,
            "produto": "DIPIRONA SODICA 500MG COM CT BL AL PLAS PVDC X 20"
          }
        },
        "gate5_controlled_substance": {
          "status": "passed",
          "message": "Medicamento não é substância controlada."
        }
      },
      "pending_gates": [
        "gate3_drug_interactions",
        "gate4_duplicate_therapy"
      ]
    }

Item Fields
| Field | Type | Description |
|---|---|---|
| medication_name | string | Name of the medication as spoken by the doctor |
| dosage | string \| null | Dosage (e.g., "500mg", "500mg/5ml") |
| route | string \| null | Administration route ("oral", "IV", "IM", "SC", "sublingual", "topical") |
| frequency | string \| null | Dosing frequency (e.g., "8/8h", "1x/dia", "12/12h") |
| duration | string \| null | Treatment duration (e.g., "7 dias", "uso contínuo") |
| quantity | integer \| null | Total quantity to dispense |
| unit | string \| null | Quantity unit (e.g., "comprimidos", "mL") |
| instructions | string \| null | Additional instructions (e.g., "se dor", "em jejum") |
3. gates_complete — Cross-Item Analysis Done
Emitted after all items have been detected and the cross-item safety gates finish their analysis across the full prescription.
Each gate result includes top-level gate_name, status, severity,
message, and details keys (produced by GateResult.to_dict()). The
details shape is gate-specific — documented per gate below.

    event: gates_complete
    data: {
      "gate3_drug_interactions": {
        "gate_name": "drug_interactions",
        "status": "passed",
        "severity": "info",
        "message": "Nenhuma interação medicamentosa detectada.",
        "details": { "interactions": [] }
      },
      "gate4_duplicate_therapy": {
        "gate_name": "duplicate_therapy",
        "status": "passed",
        "severity": "info",
        "message": "Nenhuma terapia duplicada encontrada.",
        "details": { "matched_items": [] }
      }
    }

When interactions are found, Gate 3 populates details.interactions[]
with one enriched entry per interaction. Each entry exposes the
pharmacological metadata needed for client-side display and audit logging:

    event: gates_complete
    data: {
      "gate3_drug_interactions": {
        "gate_name": "drug_interactions",
        "status": "passed",
        "severity": "major",
        "message": "⚠️ Interação medicamentosa maior detectada entre Varfarina e Aspirina: aumento do risco de sangramento. Monitorar INR e sinais de sangramento.",
        "details": {
          "interactions": [
            {
              "drug_a": "Varfarina",
              "drug_b": "Aspirina",
              "severity": "major",
              "mechanism": "Inibição plaquetária aditiva à anticoagulação",
              "clinical_effect": "Aumento do risco de sangramento",
              "recommendation": "Monitorar INR e sinais de sangramento",
              "extraction_method": "matrix_tier1"
            }
          ]
        }
      },
      "gate4_duplicate_therapy": {
        "gate_name": "duplicate_therapy",
        "status": "passed",
        "severity": "info",
        "message": "Nenhuma terapia duplicada encontrada.",
        "details": { "matched_items": [] }
      }
    }

Per-interaction fields inside details.interactions[]:
| Field | Type | Description |
|---|---|---|
| drug_a | string | First interacting substance (active ingredient). |
| drug_b | string | Second interacting substance (active ingredient). |
| severity | string | One of critical, major, moderate, minor (see Severity Levels). |
| mechanism | string \| null | Pharmacological mechanism of the interaction. |
| clinical_effect | string \| null | Expected clinical outcome. |
| recommendation | string \| null | Prescriber guidance (e.g., monitoring, dose adjustment). |
| extraction_method | string \| null | How the interaction was identified (e.g., "matrix_tier1", "llm_fallback"). |
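When several interactions come back, a UI often leads with the most severe one. The sketch below picks that entry and builds a one-line summary for display or audit logging. The function names and the numeric ranking are illustrative; only the field names from the table above are assumed.

```python
from typing import Optional

# Ranking of the per-interaction severities listed in the table above.
INTERACTION_RANK = {"minor": 1, "moderate": 2, "major": 3, "critical": 4}


def worst_interaction(gate3: dict) -> Optional[dict]:
    """Return the highest-severity entry of details.interactions[], if any."""
    interactions = (gate3.get("details") or {}).get("interactions") or []
    if not interactions:
        return None
    # Unknown severities rank lowest instead of raising.
    return max(interactions, key=lambda i: INTERACTION_RANK.get(i.get("severity"), 0))


def summary_line(entry: dict) -> str:
    """One-line summary; nullable fields fall back to a neutral placeholder."""
    effect = entry.get("clinical_effect") or "effect not specified"
    return f"[{entry['severity']}] {entry['drug_a']} + {entry['drug_b']}: {effect}"
```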
When Gate 4 detects a duplicate (Level 1 or Level 2), it populates the
details shape with the detection level and matched items:

    // Level 1 — exact medication name or active-ingredient match
    "gate4_duplicate_therapy": {
      "gate_name": "duplicate_therapy",
      "status": "passed",
      "severity": "high",
      "message": "⚠️ DUPLICIDADE DE MEDICAMENTO: 'Dipirona 500mg' já está ativo na prescrição deste paciente...",
      "details": {
        "level": 1,
        "matched_items": [
          { "medication_name": "Dipirona 500mg", "active_ingredient": "Dipirona sódica" }
        ]
      }
    }

    // Level 2 — same EPhMRA therapeutic class
    "gate4_duplicate_therapy": {
      "gate_name": "duplicate_therapy",
      "status": "passed",
      "severity": "medium",
      "message": "📋 MESMA CLASSE TERAPÊUTICA: 'Ibuprofeno' pertence à mesma classe terapêutica (ANTI-INFLAMATÓRIOS NÃO-ESTERÓIDES) que 'Naproxeno', já prescrito...",
      "details": {
        "level": 2,
        "ephmra_code": "M1A",
        "class_description": "ANTI-INFLAMATÓRIOS NÃO-ESTERÓIDES",
        "matched_items": [
          {
            "medication_name": "Naproxeno",
            "active_ingredient": "Naproxeno sódico",
            "classe_terapeutica": "M1A - ANTI-INFLAMATÓRIOS NÃO-ESTERÓIDES"
          }
        ]
      }
    }

Duplicate therapy reports two detection levels:
| Level | Detector | Example |
|---|---|---|
| Level 1 | Identical active ingredient | “Dipirona 500mg” + “Dipirona 1g” |
| Level 2 | Same EPhMRA pharmacological class | Two different NSAIDs prescribed together |
4. prescription — Final Result
The complete, aggregated prescription with all items and gate results. This is the terminal success event — the stream closes after it.

    event: prescription
    data: {
      "items": [
        {
          "medication_name": "Dipirona",
          "dosage": "500mg",
          "route": "oral",
          "frequency": "6/6h",
          "duration": "5 dias",
          "quantity": 20,
          "unit": "comprimidos",
          "instructions": "se dor"
        }
      ],
      "gates_per_item": [
        {
          "gate1_input_validation": { "status": "passed" },
          "gate2_cmed_resolution": { "status": "passed", "details": { "match_tier": "auto" } },
          "gate5_controlled_substance": { "status": "passed" }
        }
      ],
      "gates_cross_item": {
        "gate3_drug_interactions": { "status": "passed" },
        "gate4_duplicate_therapy": { "status": "passed" }
      },
      "requires_confirmation": true,
      "is_degraded": false
    }

| Field | Type | Description |
|---|---|---|
| items | array | All extracted medication items |
| gates_per_item | array | Per-item gate results (gates 1, 2, 5) indexed by item position |
| gates_cross_item | object | Cross-item gate results (gates 3, 4) |
| requires_confirmation | boolean | Always true — physician MUST confirm via POST /v1/prescriptions before finalization |
| is_degraded | boolean | true if a safety gate or upstream service encountered an error and partial results are returned |
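Since gates_per_item is indexed by item position, the two arrays can be zipped to build review rows. The following is a small sketch with hypothetical helper names; it assumes both arrays always have the same length, which the example above suggests but this article does not guarantee.

```python
def items_with_gates(prescription: dict) -> list:
    """Pair each item with its per-item gate results by position."""
    items = prescription.get("items", [])
    gates = prescription.get("gates_per_item", [])
    if len(items) != len(gates):
        # Assumption of this sketch: one gate object per item.
        raise ValueError("items and gates_per_item differ in length")
    return list(zip(items, gates))


def gate_statuses(prescription: dict) -> list:
    """Return (medication_name, {gate_name: status}) rows for a review table."""
    return [
        (item["medication_name"], {name: res.get("status") for name, res in gates.items()})
        for item, gates in items_with_gates(prescription)
    ]
```

A passed status does not mean there is nothing to show: as the Gate 3 example earlier illustrates, an advisory warning can carry a non-info severity alongside status passed, so a review row should display severity as well.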
5. error — Processing Failed
Emitted when an unrecoverable error occurs. The degraded flag indicates
whether partial results may still be usable.

    event: error
    data: {
      "code": "LLM_TIMEOUT",
      "message": "Tempo limite excedido na comunicação com o modelo de linguagem",
      "degraded": true
    }

| Error Code | Description | Retry? |
|---|---|---|
| LLM_TIMEOUT | The LLM did not respond within the time limit | Yes, with backoff |
| LLM_ERROR | The LLM returned a non-2xx HTTP status (transient upstream failure) | Yes, after short delay |
| PARSE_ERROR | Could not extract medications from input — retry once, then rephrase | Yes, once |
| EXTRACTION_VALIDATION_ERROR | The LLM produced structured items that failed Pydantic validation against PrescriptionItemExtracted (typically a required field returned as null or with the wrong type). Per-field detail is available in details[] (see below). | No — surfaces a structured-but-incomplete extraction; rephrase or request re-extraction with cleaner input |
| INTERNAL_ERROR | Unexpected server error | Yes, with backoff |
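The retry column above can be encoded as a small lookup so that client code branches on the stable code field. This is a sketch under stated assumptions: the numeric retry limits and the one-second base delay are illustrative defaults, not values mandated by DELPHOS, and the function names are hypothetical.

```python
# Retry guidance transcribed from the error-code table above.
# max_retries values are illustrative defaults chosen for this sketch.
RETRY_POLICY = {
    "LLM_TIMEOUT": {"max_retries": 3, "backoff": "exponential"},
    "LLM_ERROR": {"max_retries": 3, "backoff": "short_fixed"},
    "PARSE_ERROR": {"max_retries": 1, "backoff": "none"},
    "EXTRACTION_VALIDATION_ERROR": {"max_retries": 0, "backoff": "none"},
    "INTERNAL_ERROR": {"max_retries": 3, "backoff": "exponential"},
}


def should_retry(code: str, attempts_so_far: int) -> bool:
    """Decide from the error code and the number of retries already made."""
    policy = RETRY_POLICY.get(code)
    if policy is None:
        return False  # unknown codes are treated as non-retriable
    return attempts_so_far < policy["max_retries"]


def backoff_seconds(code: str, attempt: int, base: float = 1.0) -> float:
    """Delay before retry number `attempt` (1-based)."""
    kind = RETRY_POLICY.get(code, {}).get("backoff", "none")
    if kind == "exponential":
        return base * 2 ** (attempt - 1)
    if kind == "short_fixed":
        return base
    return 0.0
```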
The EXTRACTION_VALIDATION_ERROR event additionally carries a details
array describing each Pydantic validation failure:

    event: error
    data: {
      "code": "EXTRACTION_VALIDATION_ERROR",
      "message": "Item extraído não passou na validação estrutural",
      "degraded": true,
      "details": [
        { "field": "items.0.medication_name", "reason": "Field required", "type": "missing" },
        { "field": "items.0.dosage", "reason": "Input should be a valid string", "type": "string_type" }
      ]
    }

| Field (per entry) | Type | Description |
|---|---|---|
| field | string | Dotted path to the offending field inside the LLM-extracted payload. |
| reason | string | Human-readable Pydantic error message (locale-controlled; do not branch on it). |
| type | string | Pydantic error-type token (e.g., "missing", "string_type", "int_parsing") — stable across locales; use this for programmatic branching. |
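To act on a validation failure programmatically, the details entries can be grouped by their stable type token. A minimal sketch follows; the function name is hypothetical and only the three documented keys are assumed.

```python
from collections import defaultdict


def fields_by_error_type(error_event: dict) -> dict:
    """Group offending field paths by the stable Pydantic `type` token."""
    grouped = defaultdict(list)
    for entry in error_event.get("details", []):
        grouped[entry["type"]].append(entry["field"])
    return dict(grouped)
```

For the example event above, this yields one group for "missing" and one for "string_type", which a client could use to prompt the doctor for the missing medication name rather than displaying the localized reason text.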
The 6-Gate Safety Pipeline
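Every prescription passes through the same safety gates. DELPHOS follows the physician autonomy principle: every clinical gate is advisory. The gates inform and warn, but never block the physician’s clinical decision. The single exception is Gate 1, which rejects structurally malformed items (for example, a missing medication_name) before any clinical check runs.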
| Gate | Name (canonical) | Scope | What It Checks | Blocks? |
|---|---|---|---|---|
| 1 | Validação de Entrada | Per-item | Required fields are present (medication_name mandatory) | Only blocking gate |
| 2 | Resolução de Medicamento | Per-item | Matches medication against the ANVISA/CMED national database | Advisory |
| 3 | Interações Medicamentosas | Cross-item | Drug-drug interactions across all prescription items | Advisory |
| 4 | Duplicidade Terapêutica | Cross-item | Overlapping active ingredients (Level 1) and EPhMRA classes (Level 2) | Advisory |
| 5 | Substâncias Controladas | Per-item | Flags ANVISA controlled substances (Portaria 344/98, RDC 20/2011) | Advisory |
| 6 | Cruzamento de Alergias | Per-item | (v1.1) Cross-references prescription items against the patient’s allergy list | Advisory (when shipped) |
Gate 1 — Input Validation
The only blocking gate. Verifies that medication_name is present.
If it fails, the item is rejected — no point running expensive downstream
gates on a malformed item.
Gate 2 — CMED Resolution (ANVISA / CMED)
Matches the medication against the ANVISA CMED price/registry database. The match tier informs your UI how confident the resolution is:
| Tier | Meaning | UI treatment |
|---|---|---|
| auto | High-confidence match | Display CMED product name; no action required |
| suggestion | Moderate confidence | Show DELPHOS’s suggested CMED product; ask doctor to confirm |
| none | No match | Render as “Off-CMED prescription” — still valid, just unmatched |
CMED resolution is advisory because off-CMED prescriptions are entirely legal — compounded medications, imported drugs, and brand-new approvals are routinely off-registry.
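The tier table translates directly into a UI decision. The sketch below is illustrative only: the return labels are hypothetical, and because the examples in this article show the tier under both match_type and match_tier, reading either key is an assumption made here rather than a documented contract.

```python
def cmed_ui_treatment(gate2: dict) -> str:
    """Map a Gate 2 result to one of the UI treatments in the tier table."""
    details = gate2.get("details") or {}
    # Assumption: the tier may appear under either key seen in the examples.
    tier = details.get("match_tier") or details.get("match_type") or "none"
    if tier == "auto":
        return "show_product"      # display CMED product name, no action
    if tier == "suggestion":
        return "ask_confirmation"  # show the suggestion, doctor confirms
    return "off_cmed"              # unmatched, but still a valid prescription
```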
Gate 3 — Drug-Drug Interactions (cross-item)
Checks pairwise interactions across all items in the prescription.
Returns pairs_checked, interactions_found, and a detailed
interactions[] array with severity (info / warning / moderate /
major / critical) per pair.
Gate 4 — Duplicate Therapy (cross-item)
Two-level detection:
- Level 1: identical active ingredient (e.g., two formulations of Dipirona).
- Level 2: same EPhMRA pharmacological class (e.g., two NSAIDs).
Gate 5 — Controlled Substances (Substâncias Controladas)
Flags substances controlled under
ANVISA Portaria 344/98
and RDC 20/2011.
Returns the controlled-substance category (e.g., A1, A2, B1, C1)
when applicable, so your UI can warn the doctor about the special
prescription format required (yellow form, blue form, retention copy).
Severity Levels
When any gate produces a warning, it includes a severity level. Two distinct severity vocabularies are in use across the gates — be sure to handle both when rendering UI:
Gate 3 (Drug Interactions) — SeverityLevel enum:
| Severity | Meaning | Recommended UI treatment |
|---|---|---|
| info | Informational — no concerns | Subtle indicator |
| warning | Advisory alert — physician should review | Yellow highlight |
| moderate | Moderate concern — review recommended | Orange highlight |
| major | Significant concern — careful review needed | Red highlight with details |
| critical | Serious safety concern — demands attention | Prominent red alert |
Gate 4 (Duplicate Therapy) — custom strings (not the SeverityLevel enum):
| Severity | Meaning | Emitted by | Recommended UI treatment |
|---|---|---|---|
high | Level 1 — exact medication name or active-ingredient duplicate | Gate 4 | Red highlight; “confirm duplicate?” prompt |
medium | Level 2 — same EPhMRA therapeutic class (e.g., two NSAIDs) | Gate 4 | Orange highlight; “intentional?” prompt |
Other gates (CMED resolution, Controlled Substance) reuse values from the
SeverityLevel enum above — typically info for clean cases and warning
when results are advisory but degraded (e.g., CMED service unavailable).
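One way to handle both vocabularies in a single rendering path is to map them onto a shared numeric rank. The sketch below does that. Placing "medium" next to "moderate" and "high" next to "major" follows the recommended colours above and is a UI choice made for this example, not a documented equivalence; the function names are hypothetical.

```python
from typing import Optional

# Both vocabularies on one scale so warnings can be sorted together.
SEVERITY_RANK = {
    "info": 0,
    "warning": 1,
    "moderate": 2, "medium": 2,  # orange
    "major": 3, "high": 3,       # red
    "critical": 4,
}


def severity_rank(gate_result: dict) -> int:
    """Rank a gate result; a missing or unknown severity counts as info."""
    return SEVERITY_RANK.get(gate_result.get("severity", "info"), 0)


def most_severe(gates: dict) -> Optional[str]:
    """Name of the gate with the highest rank, or None when all are info."""
    if not gates:
        return None
    name = max(gates, key=lambda n: severity_rank(gates[n]))
    return name if severity_rank(gates[name]) > 0 else None
```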
Cache Semantics — Audio-Progressive Path
When accumulated_text is present, the endpoint routes through a
cache-aware wrapper that mirrors the Progressive
SOAP behaviour.
Word-Delta Threshold
| Aspect | Value |
|---|---|
| Threshold | 30 new words since the last cached result |
| Effect | Below threshold → cache hit (replay cached result). At or above → LLM + gate pipeline re-fires |
| Rationale | Mirrors Progressive SOAP (ratified 2026-05-19, founder direct). Empirically reduces LLM calls ~6× without losing recency |
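The threshold rule can be illustrated with a few lines of client-side code, for example to predict in tests whether a poll should re-fire the pipeline. This is a sketch only: how the server counts words is not documented here, so whitespace splitting is an assumption, and the function names are hypothetical.

```python
WORD_DELTA_THRESHOLD = 30  # from the table above


def new_word_count(cached_text: str, accumulated_text: str) -> int:
    """Words added since the cached transcript (whitespace tokenization)."""
    return max(0, len(accumulated_text.split()) - len(cached_text.split()))


def should_refire(cached_text: str, accumulated_text: str) -> bool:
    """True when the delta is at or above the threshold."""
    return new_word_count(cached_text, accumulated_text) >= WORD_DELTA_THRESHOLD
```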
Redis Cache
| Aspect | Value |
|---|---|
| Cache key scope | (tenant_app_id, consultation_id) — strict multi-tenant isolation |
| TTL | 24 hours |
| Lock scope | (tenant_app_id, consultation_id) — prevents thundering-herd LLM calls on concurrent polls |
previous_rx_hash Identity
Each cached prescription payload is associated with a server-side hash.
Pass it back as previous_rx_hash on subsequent polls:
- If the supplied hash matches the cached hash → permits a cache hit even when the word-delta has been crossed.
- If the supplied hash does not match → forces a refresh (regardless of word-delta), so a client that has lost state can resync.
no_rx_detected Short-Circuit
The wrapper scans the accumulated transcript for PT-BR Rx vocabulary
anchors (medication names, dosage/route/frequency markers) before
invoking the LLM. When no anchors are found, the wrapper short-circuits
with a single status: no_rx_detected event and skips the LLM entirely.
This is the dominant case in the first few minutes of a consultation
and keeps idle polls below 100 ms.
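To make the idea of a vocabulary-anchor scan concrete, here is a minimal regex-based sketch. The patterns are hypothetical examples of dosage, route and frequency markers in PT-BR; the actual anchor list used by DELPHOS is not published in this article and will differ.

```python
import re

# Hypothetical anchors, for illustration only.
RX_ANCHORS = re.compile(
    r"""
      \b\d+(?:[.,]\d+)?\s?(?:mg|mcg|g|ml|ui)\b                  # dosage: 500mg
    | \bvia\s+(?:oral|intravenosa|intramuscular|sublingual)\b   # route
    | \b\d{1,2}/\d{1,2}\s?h\b                                   # frequency: 8/8h
    | \bde\s+\d{1,2}\s+em\s+\d{1,2}\s+horas\b                   # frequency, spelled out
    | \bprescrev\w*                                             # prescrever, prescrevo
    """,
    re.IGNORECASE | re.VERBOSE,
)


def has_rx_anchor(transcript: str) -> bool:
    """True when at least one anchor pattern occurs in the transcript."""
    return RX_ANCHORS.search(transcript) is not None
```

A scan like this runs in time linear in the transcript length and needs no network call, which is consistent with the sub-100 ms target described above.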
Two-Phase Commit — Stream → Review → Finalize
The streaming endpoint produces a diagnostic preview with
requires_confirmation: true. To finalize, the doctor reviews the
streamed result in your UI, tweaks dosages or removes items, then calls:

    POST /v1/prescriptions

with the confirmed items. That call persists the prescription as a
draft, and the doctor advances through the lifecycle to signed
(see Digital Signing).

    Stream (extract + validate)  →  Doctor reviews + tweaks  →  POST /v1/prescriptions (persist + sign)
    [diagnostic]                    [in your UI]                [authoritative write]

Client Integration
Audio-Progressive Pattern — Parallel with Progressive SOAP
In a typical Mnesis consultation, you poll two streaming endpoints in
parallel as the audio transcript grows: progressive-soap/stream for the
clinical note and prescriptions/stream for the prescription. Both share
the same consultation_id and the same accumulated_text cadence (one
poll per audio chunk, ~5 seconds of audio).

    /**
     * Audio-progressive prescription stream — Mnesis pattern.
     * Polls /v1/prescriptions/stream alongside /v1/consultation/progressive-soap/stream
     * as the transcript accumulates during a live consultation.
     */
    interface StreamingPrescriptionRequest {
      consultation_id: string;
      patient_id: string;
      doctor_id: string;
      doctor_input: string;             // placeholder on audio path
      accumulated_text: string;         // the growing transcript
      previous_rx_hash: string | null;  // client-cached identity, null on first poll
      stream: true;
    }

    interface StreamingPrescriptionResponse {
      items: Array<{
        medication_name: string;
        dosage: string | null;
        route: string | null;
        frequency: string | null;
        duration: string | null;
        quantity: number | null;
        unit: string | null;
        instructions: string | null;
      }>;
      gates_per_item: Array<Record<string, unknown>>;
      gates_cross_item: Record<string, unknown>;
      requires_confirmation: true;
      is_degraded: boolean;
    }

    async function streamPrescription(
      apiKey: string,
      body: StreamingPrescriptionRequest,
      onEvent: (type: string, data: any) => void,
    ): Promise<void> {
      const response = await fetch('/v1/prescriptions/stream', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': apiKey,
          'Accept': 'text/event-stream',
        },
        body: JSON.stringify(body),
      });

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const blocks = buffer.split('\n\n');
        buffer = blocks.pop()!; // keep incomplete block

        for (const block of blocks) {
          if (!block.trim()) continue;
          const evtMatch = block.match(/^event:\s*(.+)$/m);
          const dataMatch = block.match(/^data:\s*(.+)$/m);
          if (evtMatch && dataMatch) {
            onEvent(evtMatch[1].trim(), JSON.parse(dataMatch[1]));
          }
        }
      }
    }

    // ── Usage — polled per audio chunk ──────────────────────
    let previousRxHash: string | null = null;
    let accumulatedText = '';

    // Called whenever a new transcribed chunk lands
    async function onTranscriptChunk(newSegmentText: string) {
      accumulatedText += ' ' + newSegmentText;

      await streamPrescription(
        'YOUR_TENANT_API_KEY',
        {
          consultation_id: 'ATD-2026-001234',
          patient_id: 'pat_AaBbCcDdEeFfGgHhIiJj11',
          doctor_id: 'doc_KkLlMmNnOoPpQqRrSsTt22',
          doctor_input: 'placeholder',        // audio path
          accumulated_text: accumulatedText,  // the FULL growing transcript
          previous_rx_hash: previousRxHash,
          stream: true,
        },
        (type, data) => {
          switch (type) {
            case 'status':
              if (data.type === 'no_rx_detected') {
                showRxPanel('No prescription dictated yet');
              } else if (data.type === 'cache_hit') {
                // about to receive a cached prescription event
              }
              break;
            case 'item_detected':
              renderItem(data.index, data.item, data.gates);
              break;
            case 'gates_complete':
              renderCrossItemGates(data);
              break;
            case 'prescription':
              renderFinalPrescription(data);
              previousRxHash = computeClientHash(data); // your hash strategy
              break;
            case 'error':
              showError(data.code, data.message, data.degraded);
              break;
          }
        },
      );
    }

Python:

    import httpx
    import json
    from typing import AsyncGenerator

    async def stream_prescription(
        base_url: str,
        api_key: str,
        consultation_id: str,
        patient_id: str,
        doctor_id: str,
        doctor_input: str = "placeholder",
        accumulated_text: str | None = None,
        previous_rx_hash: str | None = None,
    ) -> AsyncGenerator[tuple[str, dict], None]:
        """Stream prescription extraction via SSE.

        Yields (event_type, data) tuples. When ``accumulated_text`` is provided,
        dispatches through the audio-progressive cache-aware wrapper. Otherwise,
        uses the legacy text-input path.
        """
        url = f"{base_url}/v1/prescriptions/stream"
        payload: dict = {
            "consultation_id": consultation_id,
            "patient_id": patient_id,
            "doctor_id": doctor_id,
            "doctor_input": doctor_input,
            "stream": True,
        }
        if accumulated_text is not None:
            payload["accumulated_text"] = accumulated_text
        if previous_rx_hash is not None:
            payload["previous_rx_hash"] = previous_rx_hash

        headers = {
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "Accept": "text/event-stream",
        }

        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST", url, json=payload, headers=headers, timeout=60.0,
            ) as response:
                response.raise_for_status()
                buffer = ""

                async for chunk in response.aiter_text():
                    buffer += chunk

                    # Events are separated by a blank line
                    while "\n\n" in buffer:
                        block, buffer = buffer.split("\n\n", 1)
                        if not block.strip():
                            continue

                        event_type = None
                        data_str = None
                        for line in block.split("\n"):
                            if line.startswith("event: "):
                                event_type = line[7:].strip()
                            elif line.startswith("data: "):
                                data_str = line[6:].strip()

                        if event_type and data_str:
                            yield event_type, json.loads(data_str)


    # ── Usage — audio-progressive ──────────────────────────
    import asyncio


    async def main():
        previous_hash: str | None = None
        accumulated = "Paciente relata cefaleia tensional. Prescrever Dipirona 500mg via oral 6/6h se dor por 5 dias."

        async for event_type, data in stream_prescription(
            base_url="https://your-instance.delphos.app",
            api_key="YOUR_TENANT_API_KEY",
            consultation_id="ATD-2026-001234",
            patient_id="pat_AaBbCcDdEeFfGgHhIiJj11",
            doctor_id="doc_KkLlMmNnOoPpQqRrSsTt22",
            accumulated_text=accumulated,
            previous_rx_hash=previous_hash,
        ):
            if event_type == "status":
                print(f" → status: {data.get('type')}")
            elif event_type == "item_detected":
                item = data["item"]
                print(f" → item {data['index']}: {item['medication_name']} {item.get('dosage', '')}")
            elif event_type == "prescription":
                print(f" → final: {len(data['items'])} items, degraded={data['is_degraded']}")
            elif event_type == "error":
                print(f" ! error: {data['code']} — {data['message']}")


    asyncio.run(main())

cURL (audio-progressive):

    curl -X POST 'https://your-instance.delphos.app/v1/prescriptions/stream' \
      -H 'Content-Type: application/json' \
      -H 'x-api-key: YOUR_TENANT_API_KEY' \
      -H 'Accept: text/event-stream' \
      --no-buffer \
      -d '{
        "consultation_id": "ATD-2026-001234",
        "patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
        "doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
        "doctor_input": "placeholder",
        "accumulated_text": "Paciente relata cefaleia tensional ha 3 dias. Vou prescrever Dipirona 500mg via oral 6/6h se dor por 5 dias.",
        "previous_rx_hash": null,
        "stream": true
      }'

cURL (legacy text-input):

    curl -X POST 'https://your-instance.delphos.app/v1/prescriptions/stream' \
      -H 'Content-Type: application/json' \
      -H 'x-api-key: YOUR_TENANT_API_KEY' \
      -H 'Accept: text/event-stream' \
      --no-buffer \
      -d '{
        "consultation_id": "ATD-2026-001234",
        "patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
        "doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
        "doctor_input": "Dipirona 500mg via oral 6/6h se dor por 5 dias.",
        "stream": true
      }'

Non-Streaming Fallback
Set "stream": false to receive a single JSON response with all items and
gate results at once. The response structure is identical to the
prescription SSE event payload.

    curl -X POST 'https://your-instance.delphos.app/v1/prescriptions/stream' \
      -H 'Content-Type: application/json' \
      -H 'x-api-key: YOUR_TENANT_API_KEY' \
      -d '{
        "consultation_id": "ATD-2026-001234",
        "patient_id": "pat_AaBbCcDdEeFfGgHhIiJj11",
        "doctor_id": "doc_KkLlMmNnOoPpQqRrSsTt22",
        "doctor_input": "Dipirona 500mg via oral 6/6h se dor por 5 dias.",
        "stream": false
      }'

Error Handling Mid-Stream
When an error occurs during streaming, DELPHOS emits an error event and
closes the connection. Your client should implement retry logic with
exponential backoff, falling back to the non-streaming endpoint after max
retries.

    async function streamWithRetry(
      apiKey: string,
      body: StreamingPrescriptionRequest,
      onEvent: (type: string, data: any) => void,
      maxRetries = 3,
    ): Promise<void> {
      // Reason: `EXTRACTION_VALIDATION_ERROR` is intentionally NOT retriable:
      // structured-but-incomplete extractions surface a content problem, not a
      // transient infrastructure failure. Branch on the `code` field, never on
      // the `message` (which is locale-controlled).
      const retriable = new Set(['PARSE_ERROR', 'LLM_TIMEOUT', 'LLM_ERROR', 'INTERNAL_ERROR']);

      for (let attempt = 1; attempt <= maxRetries; attempt++) {
        let nonRetriable = false;
        try {
          await streamPrescription(apiKey, body, (type, data) => {
            if (type === 'error') {
              if (!retriable.has(data.code)) {
                nonRetriable = true;
                onEvent(type, data); // surface the error to the UI, then stop
              }
              // Abort this attempt; the catch below decides whether to retry.
              throw new Error(data.code);
            }
            onEvent(type, data);
          });
          return; // success
        } catch (err) {
          if (nonRetriable) throw err; // content problem: retrying will not help
          if (attempt < maxRetries) {
            // Exponential backoff: 1s, 2s, 4s, ...
            await new Promise(r => setTimeout(r, 1000 * 2 ** (attempt - 1)));
          }
        }
      }

      // All streaming attempts failed: fall back to the non-streaming variant.
      const response = await fetch('/v1/prescriptions/stream', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': apiKey,
        },
        body: JSON.stringify({ ...body, stream: false }),
      });
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }
      onEvent('prescription', await response.json());
    }

FAQ
Should I poll the streaming endpoint on every audio chunk?
Yes. The word-delta threshold + Redis cache absorb the overhead. Below 30 new words, polls return from cache in P95 < 50 ms. The wrapper exists specifically to make per-chunk polling cheap.
What’s doctor_input for on the audio-progressive path?
It’s a required field by schema — pass any non-empty placeholder (e.g.,
"placeholder"). The wrapper prefers accumulated_text when present.
This is a legacy artefact of the original text-input contract and is
preserved for backwards-compat.
How do I compute previous_rx_hash?
Use any deterministic hash of the previous prescription payload (e.g.,
SHA-256 of JSON.stringify({items, gates_cross_item})). Pass it back on
the next poll. When it matches the server’s cached hash, you get a cache
hit even past the word-delta threshold.
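As a concrete illustration of a deterministic hash, the sketch below serializes the two fields mentioned above with sorted keys before hashing. The function name is hypothetical. Whether the resulting value matches the hash held by the server depends on the server-side algorithm, which this article does not describe, so treat this as an assumption to verify against your instance.

```python
import hashlib
import json


def compute_client_hash(prescription: dict) -> str:
    """SHA-256 over the items and cross-item gates of a prescription payload."""
    subset = {
        "items": prescription.get("items", []),
        "gates_cross_item": prescription.get("gates_cross_item", {}),
    }
    # sort_keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys matters because two payloads with the same content but different key order would otherwise hash differently.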
Why is gate 6 missing from v1.0 responses?
Gate 6 (Cruzamento de Alergias) is the v1.1 allergy-cross-reference feature and is not implemented in the current v1.0 release. See DELPHOS #964 for the implementation ticket. The canonical 6-gate numbering was ratified by DELPHOS #959 on 2026-05-19, and aligns with the public spec on delphosai.io.
Why are gates advisory and not blocking?
Per Lei 12.842/2013 (Brazilian Physician Act) and the CFM Code of
Ethics (Articles 20-21), the physician has exclusive authority over
clinical decisions. DELPHOS provides decision support, not clinical
gating. Even critical interactions are surfaced as warnings, never
hard blocks.
What happens if the LLM is slow or fails?
The endpoint emits an error event with degraded: true and any
partial results gathered so far. Your client should retry with
exponential backoff, then fall back to the non-streaming variant.
Can I use this without recorded audio?
Yes — the legacy text-input path (doctor_input only, no
accumulated_text) is the original K.I.T.T. chat integration and
remains fully supported. It just doesn’t cache (single-shot calls
don’t benefit from caching).
Is the streamed prescription persisted automatically?
No. The streaming endpoint is diagnostic — requires_confirmation: true
indicates the doctor must explicitly call POST /v1/prescriptions to
persist. This two-phase commit is intentional.
Related Articles
- SOAP Streaming — Real-time clinical note generation via SSE (run in parallel with this endpoint)
- Real-time Transcription — Audio chunk streaming with speaker diarization
- Drug Safety — Post-creation interaction analysis and acknowledgment workflow
- Digital Signing — Finalizing a prescription after the streamed preview
- Creating Prescriptions — POST /v1/prescriptions finalization endpoint
- Medication Search — CMED database queries (referenced by gate 2)
- API Explorer — Browse all DELPHOS endpoints