Real-time Transcription
DELPHOS provides real-time audio transcription during consultations. Audio
is sent in chunks, each transcribed immediately and appended to the session
transcript. Speaker diarization labels segments as DOCTOR or PATIENT.
How It Works
```text
Client                          DELPHOS
  │                                │
  │── POST /chunk (audio 1) ──→│  Transcribe → Append
  │←── transcription text ─────│
  │                                │
  │── POST /chunk (audio 2) ──→│  Transcribe → Append
  │←── transcription text ─────│
  │                                │
  │── POST /chunk (final) ────→│  Transcribe → Generate SOAP
  │←── full result ────────────│
```

Each chunk is processed independently. The client can display progressive transcription results to the physician as they arrive.
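If your source is a single WAV recording rather than a live microphone, the chunking step can be sketched with the standard library alone. This is a minimal sketch, not part of the DELPHOS API: `split_wav` and its `chunk_seconds` parameter are hypothetical names, and it assumes each chunk should be a standalone WAV file with its own header (the `UklGR` prefix in the request examples decodes to `RIFF`, which is consistent with that).

```python
import io
import wave


def split_wav(wav_bytes: bytes, chunk_seconds: float = 5.0) -> list[bytes]:
    """Split one WAV recording into consecutive, self-contained WAV chunks."""
    chunks: list[bytes] = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            # Write the frames into a fresh in-memory WAV so the chunk
            # carries its own header (channels, sample width, rate).
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

The last chunk is usually shorter than the others; it is the one to send with is_final set to true.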
Sending Audio Chunks
```bash
curl -X POST "https://your-instance.delphos.app/v1/consultation/chunk" \
  -H "x-api-key: $DELPHOS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "chunk_sequence": 1,
    "audio_base64": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAIA...",
    "audio_format": "wav",
    "is_final": false
  }'
```

The same request in Python:

```python
import base64
import os

import httpx

DELPHOS_API_KEY = os.environ["DELPHOS_API_KEY"]
session_id = "550e8400-e29b-41d4-a716-446655440000"  # from /consultation/start

# Read audio from microphone or file
with open("chunk_001.wav", "rb") as f:
    audio_bytes = f.read()

response = httpx.post(
    "https://your-instance.delphos.app/v1/consultation/chunk",
    headers={"x-api-key": DELPHOS_API_KEY},
    json={
        "session_id": session_id,
        "chunk_sequence": 1,
        "audio_base64": base64.b64encode(audio_bytes).decode(),
        "audio_format": "wav",
        "is_final": False,
    },
)
response.raise_for_status()  # surface 4xx/5xx errors before parsing

result = response.json()
print(f"Transcription: {result['transcription']['text']}")
print(f"Duration: {result['transcription']['duration_seconds']}s")
```

Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| session_id | uuid | Yes | Active session ID from /consultation/start |
| chunk_sequence | integer | No | Sequence number (auto-incremented if omitted) |
| audio_base64 | string | Yes | Base64-encoded audio data |
| audio_format | string | No | Audio format (default: wav) |
| is_final | boolean | No | Set true on the last chunk to trigger SOAP generation |
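The table above can be turned into a small request-body builder. This is a sketch under assumptions: `build_chunk_payload` is a hypothetical helper, and it assumes that leaving out is_final is treated the same as sending false, which the table implies but does not state.

```python
import base64
from typing import Optional


def build_chunk_payload(
    session_id: str,
    audio_bytes: bytes,
    *,
    chunk_sequence: Optional[int] = None,
    audio_format: Optional[str] = None,
    is_final: bool = False,
) -> dict:
    """Build the JSON body for POST /consultation/chunk."""
    payload: dict = {
        # The two required fields.
        "session_id": session_id,
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
    }
    # Optional fields are only sent when set, so the server-side defaults
    # (auto-incremented sequence, wav format) apply otherwise.
    if chunk_sequence is not None:
        payload["chunk_sequence"] = chunk_sequence
    if audio_format is not None:
        payload["audio_format"] = audio_format
    if is_final:
        payload["is_final"] = True
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of an HTTP client.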
Supported Audio Formats
| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Recommended — lossless, best transcription quality |
| MP3 | .mp3 | Compressed, good quality |
| WebM | .webm | Common in web browsers |
| OGG | .ogg | Open format |
| FLAC | .flac | Lossless compression |
| M4A | .m4a | Apple format |
| AAC | .aac | Advanced Audio Coding |
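When chunks come from files, the audio_format value can be derived from the file extension and checked against the table before uploading. This sketch assumes the accepted audio_format strings are the lowercase extensions without the dot; only "wav" is confirmed by the examples in this guide, and `audio_format_for` is a hypothetical helper.

```python
from pathlib import Path

# Extensions from the Supported Audio Formats table.
# Assumption: the API's audio_format value is the extension without the dot.
SUPPORTED_FORMATS = {"wav", "mp3", "webm", "ogg", "flac", "m4a", "aac"}


def audio_format_for(path: str) -> str:
    """Return the audio_format value for a file, or raise if unsupported."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(no extension)'}")
    return ext
```

Rejecting unsupported files on the client avoids uploading audio that the service would not transcribe.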
Response
```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "chunk_id": "660e8400-e29b-41d4-a716-446655440001",
  "chunk_sequence": 1,
  "transcription": {
    "text": "Médico: Bom dia, como está se sentindo hoje?",
    "segments": [
      {
        "start": 0.0,
        "end": 3.5,
        "text": "Bom dia, como está se sentindo hoje?",
        "speaker": "DOCTOR"
      }
    ],
    "duration_seconds": 3.5
  },
  "session_state": "ACTIVE",
  "total_chunks": 1,
  "total_transcript_length": 47,
  "processing_time_ms": 145.3,
  "message": null,
  "soap_generation": null
}
```

Speaker Diarization
When diarize is enabled at session start (the default), DELPHOS identifies
who is speaking in each segment:
| Speaker | Description |
|---|---|
| DOCTOR | Physician voice |
| PATIENT | Patient voice |
Diarization is performed per chunk. The segments array in each response
includes timestamps and speaker labels, so your UI can display a
conversation-style transcript.
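As an illustration, the segments array can be rendered as one line per speaker turn. The field names (start, end, text, speaker) come from the response example above; `format_segments` and the UNKNOWN fallback label are hypothetical, added for the case where a segment has no speaker (for instance if diarization was disabled). This guide does not say whether timestamps are relative to the chunk or to the session, so the sketch prints them as received.

```python
def format_segments(transcription: dict) -> list[str]:
    """Render the segments of one chunk response as conversation lines."""
    lines = []
    for seg in transcription.get("segments", []):
        # Fall back to a placeholder label when no speaker is present.
        speaker = seg.get("speaker") or "UNKNOWN"
        lines.append(
            f"{seg['start']:.1f}-{seg['end']:.1f}s  {speaker}: {seg['text']}"
        )
    return lines
```

Called with `result["transcription"]` from a chunk response, it returns strings ready to append to a transcript view.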
Progressive Transcript
At any time during an active session, retrieve the full accumulated transcript:
```bash
curl -X GET "https://your-instance.delphos.app/v1/consultation/{session_id}/transcript" \
  -H "x-api-key: $DELPHOS_API_KEY"
```

```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "transcript": "Médico: Bom dia, como está se sentindo hoje?\nPaciente: As dores de cabeça estão menos frequentes..."
}
```

This endpoint is useful for displaying the full conversation in your UI while chunks continue to arrive.
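To show the accumulated transcript as separate turns, the transcript string can be split on newlines and speaker prefixes. This sketch assumes every turn has the form "Label: text" on its own line, as in the example response; the exact format is not specified in this guide, and `parse_transcript` is a hypothetical helper. A line without a label is treated as a continuation of the previous turn.

```python
def parse_transcript(transcript: str) -> list[tuple[str, str]]:
    """Split an accumulated transcript into (speaker_label, text) turns."""
    turns: list[tuple[str, str]] = []
    for raw_line in transcript.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        label, sep, text = line.partition(": ")
        if sep:
            turns.append((label, text))
        elif turns:
            # No "Label: " prefix: attach the line to the previous turn.
            prev_label, prev_text = turns[-1]
            turns[-1] = (prev_label, f"{prev_text} {line}")
        else:
            turns.append(("", line))
    return turns
```

Note that an unlabeled line whose text happens to contain ": " would be misread as a new turn, so treat the result as a display aid rather than a reliable parse.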
Sending the Final Chunk
Set is_final: true on the last audio chunk to signal the end of
recording. This triggers SOAP note generation:
```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio_base64": "UklGR...",
  "audio_format": "wav",
  "is_final": true
}
```

The response will include a soap_generation field with the status:

```json
{
  "transcription": { "..." },
  "soap_generation": {
    "status": "processing",
    "message": "SOAP note generation started"
  }
}
```

Poll the session status endpoint to check when the SOAP note is ready.
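The polling step can be sketched with a deadline so that a stalled generation does not block the client forever. The state values COMPLETED and ERROR and the error field are taken from the Integration Pattern example in this guide; `wait_for_soap`, `fetch_status`, and the timeout and interval defaults are hypothetical choices, not documented limits.

```python
import time
from typing import Callable


def wait_for_soap(
    fetch_status: Callable[[], dict],
    *,
    timeout: float = 120.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the session reports COMPLETED, ERROR, or the deadline passes."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "COMPLETED":
            return status
        if state == "ERROR":
            raise RuntimeError(status.get("error", "SOAP generation failed"))
        if clock() >= deadline:
            raise TimeoutError(f"SOAP note not ready after {timeout} seconds")
        sleep(interval)
```

With an HTTP client, `fetch_status` would be a small function that requests the session status endpoint and returns the decoded JSON body. Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.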
Integration Pattern
A typical integration streams chunks from the client’s microphone:
```python
import base64
import os
import time

import httpx

DELPHOS_API_KEY = os.environ["DELPHOS_API_KEY"]
CHUNK_DURATION_SECONDS = 5


def stream_consultation(session_id: str, audio_chunks: list[bytes]):
    """Stream audio chunks and display progressive transcription."""
    # The context manager closes the connection pool when the function returns.
    with httpx.Client(
        base_url="https://your-instance.delphos.app/v1",
        headers={"x-api-key": DELPHOS_API_KEY},
        timeout=30.0,
    ) as client:
        for i, chunk in enumerate(audio_chunks):
            is_last = i == len(audio_chunks) - 1

            response = client.post(
                "/consultation/chunk",
                json={
                    "session_id": session_id,
                    "chunk_sequence": i + 1,
                    "audio_base64": base64.b64encode(chunk).decode(),
                    "audio_format": "wav",
                    "is_final": is_last,
                },
            )
            response.raise_for_status()
            result = response.json()

            # Display progressive transcription
            text = result["transcription"]["text"]
            print(f"[Chunk {i+1}] {text}")

        # Poll for SOAP completion
        while True:
            status = client.get(f"/consultation/{session_id}/status").json()
            if status["state"] == "COMPLETED":
                return status["soap_note"]
            if status["state"] == "ERROR":
                raise RuntimeError(status["error"])
            time.sleep(1)
```

Error Handling
| Status | Cause |
|---|---|
| 404 Not Found | Session does not exist |
| 409 Conflict | Session is not in ACTIVE state (already ended or errored) |
| 503 Service Unavailable | Transcription service temporarily unavailable (retry with backoff) |
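The "retry with backoff" advice for 503 responses can be sketched as an exponential backoff wrapper. Here `post_with_backoff` and `send` are hypothetical: `send` is any callable that performs one request and returns a (status_code, body) pair, and the attempt count and delays are illustrative defaults. The sketch also assumes that resending a chunk with the same chunk_sequence after a 503 is safe, which this guide does not confirm.

```python
import time
from typing import Callable


def post_with_backoff(
    send: Callable[[], tuple[int, dict]],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Retry a request on 503 with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status_code, body = send()
        # 404 and 409 are not transient, so only 503 is retried;
        # every other status goes straight back to the caller.
        if status_code != 503:
            return status_code, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"service still unavailable after {max_attempts} attempts")
```

A 404 or 409 is returned on the first attempt, so the caller can stop streaming or start a new session instead of retrying.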
Next Steps
- Consultation Lifecycle — Full session management
- Working with Records — Post-consultation editing
- Clinical Summaries — Patient intelligence