Real-time Transcription

DELPHOS provides real-time audio transcription during consultations. Audio is sent in chunks, each transcribed immediately and appended to the session transcript. Speaker diarization labels segments as DOCTOR or PATIENT.

How It Works

Client                           DELPHOS
│                                │
│── POST /chunk (audio 1) ──────→│  Transcribe → Append
│←── transcription text ─────────│
│                                │
│── POST /chunk (audio 2) ──────→│  Transcribe → Append
│←── transcription text ─────────│
│                                │
│── POST /chunk (final) ────────→│  Transcribe → Generate SOAP
│←── full result ────────────────│

Each chunk is processed independently. The client can display progressive transcription results to the physician as they arrive.


Sending Audio Chunks

curl -X POST "https://your-instance.delphos.app/v1/consultation/chunk" \
  -H "x-api-key: $DELPHOS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "chunk_sequence": 1,
    "audio_base64": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAIA...",
    "audio_format": "wav",
    "is_final": false
  }'

Request Parameters

Field           Type     Required  Description
session_id      uuid     Yes       Active session ID from /consultation/start
chunk_sequence  integer  No        Sequence number (auto-incremented if omitted)
audio_base64    string   Yes       Base64-encoded audio data
audio_format    string   No        Audio format (default: wav)
is_final        boolean  No        Set true on the last chunk to trigger SOAP generation
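
The parameter table above can be turned into a small payload builder. The following is a minimal sketch, not part of the DELPHOS SDK: the helper name `build_chunk_payload` is hypothetical, and the only behavior taken from the table is which fields are required and that `chunk_sequence` may be omitted.

```python
import base64
import uuid
from typing import Optional


def build_chunk_payload(
    session_id: str,
    audio: bytes,
    audio_format: str = "wav",
    chunk_sequence: Optional[int] = None,
    is_final: bool = False,
) -> dict:
    """Build the JSON body for POST /consultation/chunk (hypothetical helper)."""
    uuid.UUID(session_id)  # raises ValueError if session_id is not a valid UUID
    if not audio:
        raise ValueError("audio must not be empty")
    payload = {
        "session_id": session_id,
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "audio_format": audio_format,
        "is_final": is_final,
    }
    # chunk_sequence is optional; per the table it is auto-incremented if omitted.
    if chunk_sequence is not None:
        payload["chunk_sequence"] = chunk_sequence
    return payload
```

Validating the UUID and the audio bytes on the client catches malformed requests before they reach the network.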

Supported Audio Formats

Format  Extension  Notes
WAV     .wav       Recommended: lossless, best transcription quality
MP3     .mp3       Compressed, good quality
WebM    .webm      Common in web browsers
OGG     .ogg       Open format
FLAC    .flac      Lossless compression
M4A     .m4a       Apple format
AAC     .aac       Advanced Audio Coding
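
If audio comes from files, the `audio_format` value can be derived from the file extension. This sketch assumes the API expects the extension without its leading dot, as in the `"wav"` example above. The page does not confirm this for the other formats, and `audio_format_for` is a hypothetical helper.

```python
from pathlib import Path

# Extensions from the table above, without the leading dot (assumed API values).
SUPPORTED_FORMATS = {"wav", "mp3", "webm", "ogg", "flac", "m4a", "aac"}


def audio_format_for(filename: str) -> str:
    """Return the assumed audio_format value for a file name, or raise ValueError."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {filename!r}")
    return ext
```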

Response

{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "chunk_id": "660e8400-e29b-41d4-a716-446655440001",
  "chunk_sequence": 1,
  "transcription": {
    "text": "Médico: Bom dia, como está se sentindo hoje?",
    "segments": [
      {
        "start": 0.0,
        "end": 3.5,
        "text": "Bom dia, como está se sentindo hoje?",
        "speaker": "DOCTOR"
      }
    ],
    "duration_seconds": 3.5
  },
  "session_state": "ACTIVE",
  "total_chunks": 1,
  "total_transcript_length": 47,
  "processing_time_ms": 145.3,
  "message": null,
  "soap_generation": null
}

Speaker Diarization

When diarize is enabled at session start (the default), DELPHOS identifies who is speaking in each segment:

Speaker  Description
DOCTOR   Physician voice
PATIENT  Patient voice

Diarization is performed per chunk. The segments array in each response carries start and end timestamps (in seconds) and a speaker label for each segment, so your UI can render a conversation-style transcript.
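
A conversation-style view usually merges consecutive segments from the same speaker into one turn. The sketch below shows one way to do that with the `segments` fields from the response example (`start`, `end`, `text`, `speaker`). The function name `group_turns` and the `"UNKNOWN"` fallback label are assumptions, and the code does not assume whether timestamps are relative to the chunk or to the session.

```python
def group_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments spoken by the same speaker into single turns."""
    turns: list[dict] = []
    for seg in segments:
        # Fallback label is an assumption for segments without a speaker.
        speaker = seg.get("speaker") or "UNKNOWN"
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + seg["text"]
            turns[-1]["end"] = seg["end"]
        else:
            turns.append(
                {
                    "speaker": speaker,
                    "start": seg["start"],
                    "end": seg["end"],
                    "text": seg["text"],
                }
            )
    return turns
```

Each returned turn can then be rendered as one chat bubble, labelled with its speaker.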


Progressive Transcript

At any time during an active session, retrieve the full accumulated transcript:

curl -X GET "https://your-instance.delphos.app/v1/consultation/{session_id}/transcript" \
  -H "x-api-key: $DELPHOS_API_KEY"

Response:

{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "transcript": "Médico: Bom dia, como está se sentindo hoje?\nPaciente: As dores de cabeça estão menos frequentes..."
}

This endpoint is useful for displaying the full conversation in your UI while chunks continue to arrive.
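
To show the accumulated transcript as separate turns, the `transcript` string can be split on newlines and on the first `": "` in each line. This sketch assumes every turn has the `Label: text` shape seen in the example above. The page does not specify the format beyond that example, so `split_transcript` is a best-effort, hypothetical helper.

```python
def split_transcript(transcript: str) -> list[tuple[str, str]]:
    """Split a 'Label: text' transcript into (label, text) pairs, one per turn."""
    turns: list[tuple[str, str]] = []
    for raw_line in transcript.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        label, sep, text = line.partition(": ")
        if sep:
            turns.append((label, text))
        elif turns:
            # No label found: treat the line as a continuation of the previous turn.
            prev_label, prev_text = turns[-1]
            turns[-1] = (prev_label, prev_text + " " + line)
        else:
            turns.append(("", line))
    return turns
```

An unlabelled line that happens to contain `": "` would be split incorrectly, so prefer the per-chunk `segments` array when you need exact speaker labels.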


Sending the Final Chunk

Set is_final: true on the last audio chunk to signal the end of recording. This triggers SOAP note generation:

{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio_base64": "UklGR...",
  "audio_format": "wav",
  "is_final": true
}

The response will include a soap_generation field with the status:

{
  "transcription": { "..." },
  "soap_generation": {
    "status": "processing",
    "message": "SOAP note generation started"
  }
}

Poll the session status endpoint (GET /consultation/{session_id}/status, as used in the Integration Pattern example) until the SOAP note is ready.
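
Polling is safer with a deadline, so that a stalled job does not block the caller forever. The sketch below takes a hypothetical `fetch_status` callable that returns the parsed status JSON. The `state` field and the `COMPLETED` and `ERROR` values are taken from the Integration Pattern example on this page, while the timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_soap(
    fetch_status: Callable[[], dict],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the session completes, errors, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "COMPLETED":
            return status
        if state == "ERROR":
            raise RuntimeError(status.get("error", "SOAP generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"SOAP note not ready after {timeout_s} seconds")
        sleep(interval_s)
```

With httpx, `fetch_status` could be `lambda: client.get(f"/consultation/{session_id}/status").json()`. Passing `sleep` as a parameter keeps the function easy to test.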


Integration Pattern

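A typical integration records audio from the client’s microphone in short chunks and sends each one as soon as it is available. For simplicity, the example below takes the chunks as a pre-recorded list: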

import base64
import os
import time

import httpx

# Read the API key from the environment, as in the curl examples.
DELPHOS_API_KEY = os.environ["DELPHOS_API_KEY"]
# Intended length of each recorded chunk; not referenced by the function below.
CHUNK_DURATION_SECONDS = 5


def stream_consultation(session_id: str, audio_chunks: list[bytes]):
    """Stream audio chunks and display progressive transcription."""
    with httpx.Client(
        base_url="https://your-instance.delphos.app/v1",
        headers={"x-api-key": DELPHOS_API_KEY},
        timeout=30.0,
    ) as client:
        for i, chunk in enumerate(audio_chunks):
            # Mark the last chunk so the server starts SOAP generation.
            is_last = i == len(audio_chunks) - 1
            response = client.post(
                "/consultation/chunk",
                json={
                    "session_id": session_id,
                    "chunk_sequence": i + 1,
                    "audio_base64": base64.b64encode(chunk).decode(),
                    "audio_format": "wav",
                    "is_final": is_last,
                },
            )
            response.raise_for_status()
            result = response.json()

            # Display progressive transcription
            text = result["transcription"]["text"]
            print(f"[Chunk {i + 1}] {text}")

        # Poll for SOAP completion once every chunk has been sent
        while True:
            status_response = client.get(f"/consultation/{session_id}/status")
            status_response.raise_for_status()
            status = status_response.json()
            if status["state"] == "COMPLETED":
                return status["soap_note"]
            if status["state"] == "ERROR":
                raise RuntimeError(status["error"])
            time.sleep(1)
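
The example above expects a list of audio chunks. For testing with a recorded file, one way to produce that list is to split a WAV file into fixed-length pieces with the standard-library `wave` module. This is a sketch under one assumption: each chunk should be a complete, independently decodable WAV file, since chunks are processed independently. `split_wav` is a hypothetical helper.

```python
import io
import wave


def split_wav(wav_bytes: bytes, chunk_seconds: float = 5.0) -> list[bytes]:
    """Split WAV data into consecutive chunks, each a complete WAV file."""
    chunks: list[bytes] = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy the source audio parameters so each chunk has a valid header.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

The result can be passed directly as `audio_chunks`. The last chunk may be shorter than `chunk_seconds`.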

Error Handling

Status                   Cause
404 Not Found            Session does not exist
409 Conflict             Session is not in ACTIVE state (already ended or errored)
503 Service Unavailable  Transcription service temporarily unavailable (retry with backoff)
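
The "retry with backoff" advice for 503 responses can be sketched as follows. The `send` callable is hypothetical: it performs one request and returns `(status_code, body)`. The attempt count, base delay, and jitter are arbitrary assumptions, not documented DELPHOS recommendations. Other status codes, including 404 and 409, are returned to the caller unchanged because retrying them would not help.

```python
import random
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call send() and retry with exponential backoff while it returns 503."""
    for attempt in range(1, max_attempts + 1):
        status_code, body = send()
        if status_code != 503:
            return status_code, body
        if attempt < max_attempts:
            # Delays grow as 0.5 s, 1 s, 2 s, ... plus up to 50% random jitter.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(
        f"transcription service still unavailable after {max_attempts} attempts"
    )
```

With httpx, `send` could wrap `client.post("/consultation/chunk", json=payload)` and return `(response.status_code, response.json())`.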
