Voice-Enabled Scheduling
Patients can book, reschedule, and cancel appointments by speaking naturally, either in WhatsApp audio messages or on phone calls. DELPHOS transcribes the audio, extracts scheduling intent using natural language understanding, and processes the request through the scheduling pipeline, so the patient never has to navigate menus or type messages.
How it works
The voice scheduling pipeline converts a patient’s spoken request into a structured scheduling action in five stages:
- Audio validation — verify file format, duration, and size constraints
- Format conversion — normalize audio to the format expected by the Transcription Engine (performed in-memory, no disk writes)
- Transcription — convert speech to text via the Transcription Engine
- Intent extraction — the AI Engine parses the transcribed text to identify scheduling intent and extract entities
- Scheduling action — execute the resolved action (book, reschedule, cancel, or query availability)
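The validation stage can be sketched as a magic-byte check on the uploaded bytes, which catches files whose extension does not match their content. This is a minimal sketch, not DELPHOS's implementation: the limits `MAX_BYTES` and `MAX_SECONDS` are hypothetical placeholders, and the duration is passed in by the caller rather than decoded from the audio.

```python
from typing import Optional

# Hypothetical limits for illustration only; the real constraints are not documented here.
MAX_BYTES = 16 * 1024 * 1024  # 16 MiB
MAX_SECONDS = 120.0


def detect_audio_format(data: bytes) -> Optional[str]:
    """Identify WAV, OGG, MP3, or WebM from the file's leading bytes."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:3] == b"ID3":  # MP3 file that starts with an ID3v2 tag
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set and layer bits == 01 (Layer III).
    if (
        len(data) >= 2
        and data[0] == 0xFF
        and (data[1] & 0xE0) == 0xE0
        and ((data[1] >> 1) & 0x03) == 0x01
    ):
        return "mp3"
    # EBML header. Matroska (.mkv) shares it; a strict check would read the DocType.
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"
    return None


def validate_audio(data: bytes, duration_seconds: float) -> str:
    """Return the detected format, or raise ValueError naming the first failed check."""
    if not data:
        raise ValueError("empty audio payload")
    if len(data) > MAX_BYTES:
        raise ValueError(f"audio exceeds {MAX_BYTES} bytes")
    fmt = detect_audio_format(data)
    if fmt is None:
        raise ValueError("unsupported or corrupt audio format")
    if not 0 < duration_seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be between 0 and {MAX_SECONDS} seconds")
    return fmt
```

For example, `validate_audio(b"OggS" + b"\x00" * 60, duration_seconds=8.5)` returns `"ogg"`. Checking content rather than the filename matters because WhatsApp voice notes and IVR recordings may arrive with generic names; a rejected file corresponds to the 400 "Invalid audio format" case in the error table below.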
```
Patient audio ──> Validation ──> Conversion ──> Transcription ──> NLU ──> Action
(WhatsApp             |              |                |            |        |
 or phone)       format/size     in-memory       speech-to-     intent   book /
                   checks       only (LGPD)         text      + entities reschedule /
                                                                         cancel
```
Voice scheduling API
Process an audio recording
POST /v1/scheduling/voice
Accepts multipart/form-data containing an audio file and patient context.
Returns structured scheduling data extracted from the spoken request.
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | Yes | Audio recording (WAV, OGG, MP3, WebM) |
| patient_id | UUID | Yes | Patient submitting the request |
| session_id | UUID | No | Conversation session for multi-turn context |
| channel | string | No | Source channel: whatsapp, phone, or web |
```bash
curl -X POST "https://your-instance.delphos.app/v1/scheduling/voice" \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@recording.ogg" \
  -F "patient_id=d4e5f6a7-b8c9-0123-defa-234567890123" \
  -F "channel=whatsapp"
```
```python
import httpx

with open("recording.ogg", "rb") as f:
    response = httpx.post(
        "https://your-instance.delphos.app/v1/scheduling/voice",
        headers={"x-api-key": "YOUR_API_KEY"},
        files={"audio": ("recording.ogg", f, "audio/ogg")},
        data={
            "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
            "channel": "whatsapp",
        },
    )

result = response.json()
```
Response — returns a VoiceSchedulingResponse with the transcribed text, extracted entities, and scheduling suggestions. When the audio is ambiguous, the endpoint returns a VoiceClarificationResponse containing a follow-up question instead.
Natural language understanding
Behind both voice and text scheduling lies the natural language understanding layer. It parses Brazilian Portuguese scheduling requests and resolves entities against the clinic database.
Process a text message
POST /v1/scheduling/natural-language
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | Free-text scheduling request in Portuguese |
| patient_id | UUID | Yes | Patient context for entity resolution |
```bash
curl -X POST "https://your-instance.delphos.app/v1/scheduling/natural-language" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Quero remarcar minha consulta com a Dra. Ana para quinta de tarde",
    "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123"
  }'
```
```python
import httpx

response = httpx.post(
    "https://your-instance.delphos.app/v1/scheduling/natural-language",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "message": "Quero remarcar minha consulta com a Dra. Ana para quinta de tarde",
        "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
    },
)
result = response.json()
```
Supported intents
| Intent | Description | Example utterance |
|---|---|---|
| schedule | Book a new appointment | “Quero marcar uma consulta com o Dr. Silva” |
| reschedule | Move an existing appointment | “Preciso remarcar minha consulta para semana que vem” |
| cancel | Cancel an existing appointment | “Cancela minha consulta de amanha” |
| query | Check available slots | “Tem horario com a Dra. Ana na sexta?” |
Extracted entities
The AI Engine extracts the following entities from the patient’s message:
| Entity | Examples |
|---|---|
| Provider name | “Dr. Silva”, “Dra. Ana Beatriz” |
| Specialty | “cardiologista”, “dermatologia” |
| Date expression | “amanha”, “proxima terca”, “dia 15” |
| Time expression | “as 10 horas”, “14:30” |
| Time period | “de manha”, “a tarde”, “a noite” |
| Appointment type | “consulta”, “retorno”, “exame” |
| Reason | “dor de cabeca”, “acompanhamento” |
Brazilian Portuguese date parsing
The AI Engine understands a wide range of temporal expressions in Brazilian Portuguese, including:
- Relative days — “hoje”, “amanha”, “depois de amanha”, “ontem”
- Named weekdays — “segunda”, “proxima terca”, “quinta que vem”
- Relative weeks — “semana que vem”, “daqui a duas semanas”
- Specific dates — “dia 15”, “10 de abril”, “15/04”
- Time periods — “de manha” (08:00-12:00), “a tarde” (12:00-18:00), “a noite” (18:00-21:00)
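The sketch below shows how a subset of these expressions could be resolved against a reference date. It is a rule-based stand-in written for illustration, not the AI Engine's parser: it covers relative days, named weekdays, and the three documented time periods (plus the colloquial "de tarde"/"de noite" variants, an assumption), and it strips accents so that typed "amanha" and transcribed "amanhã" behave the same. Relative weeks and specific dates are not handled.

```python
import re
import unicodedata
from datetime import date, time, timedelta
from typing import Optional, Tuple

# Windows from the documentation above; "de tarde"/"de noite" are assumed variants.
PERIODS = {
    "de manha": (time(8), time(12)),
    "a tarde": (time(12), time(18)),
    "de tarde": (time(12), time(18)),
    "a noite": (time(18), time(21)),
    "de noite": (time(18), time(21)),
}
RELATIVE_DAYS = {"hoje": 0, "amanha": 1, "depois de amanha": 2, "ontem": -1}
WEEKDAYS = {"segunda": 0, "terca": 1, "quarta": 2, "quinta": 3,
            "sexta": 4, "sabado": 5, "domingo": 6}


def normalize(text: str) -> str:
    """Lowercase and drop diacritics ('Amanhã' -> 'amanha', 'terça' -> 'terca')."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def resolve_day(expr: str, today: date) -> Optional[date]:
    text = normalize(expr)
    # Longest phrase first, so "depois de amanha" wins over "amanha".
    for phrase in sorted(RELATIVE_DAYS, key=len, reverse=True):
        if re.search(rf"\b{phrase}\b", text):
            return today + timedelta(days=RELATIVE_DAYS[phrase])
    for name, weekday in WEEKDAYS.items():
        if re.search(rf"\b{name}\b", text):
            # Next occurrence of that weekday; a same-day name means next week.
            ahead = (weekday - today.weekday()) % 7 or 7
            return today + timedelta(days=ahead)
    return None


def resolve_period(expr: str) -> Optional[Tuple[time, time]]:
    text = normalize(expr)
    for phrase, window in PERIODS.items():
        if phrase in text:
            return window
    return None
```

With `today = date(2025, 4, 7)` (a Monday), "quinta de tarde" resolves to Thursday 2025-04-10 in the 12:00-18:00 window. Treating a bare weekday as "the next one, never today" is one possible policy; a production parser would likely ask for clarification instead.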
Entity resolution
Extracted names and specialties are matched against database records. The resolution respects Row-Level Security (RLS) so a patient can only reference providers visible to their tenant. The response includes both the raw extracted text and the resolved database UUID when a match is found.
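A minimal sketch of provider-name resolution, assuming an in-memory provider list with hypothetical `id`, `tenant_id`, and `name` fields. The tenant filter here only mimics the scoping that RLS enforces in the database, and the fuzzy token match (via `difflib`) is a swapped-in technique chosen for illustration; the actual matching strategy is not documented. Each spoken token must closely match a token of the provider's name, and a result is resolved only when exactly one provider matches, keeping both the raw text and the resolved ID.

```python
import difflib
import unicodedata
from dataclasses import dataclass
from typing import List, Optional

TITLES = {"dr.", "dra.", "dr", "dra", "doutor", "doutora"}


@dataclass
class Provider:
    id: str  # database UUID (hypothetical shape)
    tenant_id: str
    name: str


@dataclass
class ResolvedEntity:
    raw_text: str
    resolved_id: Optional[str]


def _fold(text: str) -> str:
    """Lowercase and strip accents so 'Beatriz' and 'beatriz' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def resolve_provider(raw_text: str, tenant_id: str,
                     providers: List[Provider]) -> ResolvedEntity:
    # Stand-in for RLS: only providers of the caller's tenant are candidates.
    visible = [p for p in providers if p.tenant_id == tenant_id]
    query = [t for t in _fold(raw_text).split() if t not in TITLES]
    if not query:
        return ResolvedEntity(raw_text, None)
    # Every spoken token must closely match some token of the provider's name,
    # which tolerates small transcription errors ("Silvva" -> "Silva").
    hits = [
        p for p in visible
        if all(difflib.get_close_matches(t, _fold(p.name).split(), n=1, cutoff=0.8)
               for t in query)
    ]
    # Resolve only on a unique match; ambiguity is left for a clarification question.
    return ResolvedEntity(raw_text, hits[0].id if len(hits) == 1 else None)
```

For example, "Dra. Ana" resolves to a tenant's "Ana Beatriz Lima" when no other visible provider is named Ana, while a name that exists only in another tenant stays unresolved.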
Response — an NLResponse containing:
| Field | Description |
|---|---|
| intent | Detected scheduling intent |
| confidence | Confidence score (0.0–1.0) |
| entities | Extracted entities with resolved database references |
| suggested_slots | Available time slots matching the expressed criteria |
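A client can parse this response into typed objects before acting on it. In this sketch the intent values come from the supported-intents table, while the slot fields (`date`, `start_time`, `provider_name`) are borrowed from the programmatic voice example at the end of this page; whether the natural-language endpoint uses the same slot shape is an assumption, and `entities` is kept as returned because its structure is not specified here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Intent(str, Enum):
    SCHEDULE = "schedule"
    RESCHEDULE = "reschedule"
    CANCEL = "cancel"
    QUERY = "query"


@dataclass
class Slot:
    date: str
    start_time: str
    provider_name: str


@dataclass
class NLResult:
    intent: Intent
    confidence: float
    entities: Any  # shape not documented here; passed through unchanged
    suggested_slots: List[Slot] = field(default_factory=list)


def parse_nl_response(payload: Dict[str, Any]) -> NLResult:
    """Validate an NLResponse-like dict; raises ValueError on unknown intent or bad score."""
    confidence = float(payload["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    slots = [
        Slot(s["date"], s["start_time"], s["provider_name"])
        for s in payload.get("suggested_slots", [])
    ]
    # Intent(...) raises ValueError for values outside the supported set.
    return NLResult(Intent(payload["intent"]), confidence, payload.get("entities"), slots)
```

Failing fast on an unknown intent keeps a client from silently treating a new server-side intent as one of the four it knows about.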
WhatsApp integration
DELPHOS connects to WhatsApp through a self-hosted Evolution API instance, enabling patients to send voice messages and receive scheduling confirmations directly in their WhatsApp conversations.
Webhook endpoints
Evolution API delivers incoming messages and status updates to DELPHOS via webhooks.
Verification handshake:
GET /v1/whatsapp/webhook
Used by Evolution API during initial webhook registration to verify the endpoint is reachable and authentic.
Receive events:
POST /v1/whatsapp/webhook
Receives incoming messages (text, audio, interactive button replies) and delivery status updates. DELPHOS validates every request using HMAC-SHA256 signature verification before processing.
Webhook security
Every incoming webhook request must include a valid HMAC-SHA256 signature in the request headers. DELPHOS computes the expected signature using the shared secret configured during Evolution API setup and rejects any request where the signatures do not match.
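The check can be sketched with Python's standard `hmac` module. The signature must be computed over the raw request body bytes, before any JSON parsing, and compared in constant time. The header that carries the signature, its hex encoding, and the optional `sha256=` prefix handled below are assumptions for illustration, since the exact wire format is not specified here.

```python
import hashlib
import hmac
from typing import Optional


def compute_signature(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, received: Optional[str]) -> bool:
    """Return True only if the received signature matches the expected one."""
    if not received:
        return False  # missing signature: reject (401)
    candidate = received.strip()
    # Assumed convention: tolerate an optional "sha256=" prefix.
    if candidate.startswith("sha256="):
        candidate = candidate[len("sha256="):]
    expected = compute_signature(secret, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), candidate.lower().encode())
```

Comparing with `==` instead of `hmac.compare_digest` would make the check vulnerable to timing attacks, and re-serializing parsed JSON before hashing would break verification whenever key order or whitespace differs from what the sender signed.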
Outbound messaging
DELPHOS sends messages to patients through the Evolution API using several message formats:
| Method | Description | Use case |
|---|---|---|
| Text message | Plain text content | Confirmations, reminders |
| Interactive buttons | WhatsApp native buttons (max 3) | Slot selection with few options |
| Numbered list | Text message with numbered options | Slot selection with many options |
| Audio note | Base64-encoded voice note | Voice confirmations |
All outbound messages include retry logic with exponential backoff and a maximum of 3 delivery attempts.
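The retry policy can be sketched as a small wrapper around any send function: up to three attempts, with the delay doubling after each failure. The base delay, the 10% jitter, and the `DeliveryError` exception are hypothetical choices for this sketch; the documentation only fixes the exponential backoff and the three-attempt cap.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeliveryError(Exception):
    """Hypothetical error raised by a send function when delivery fails."""


def send_with_retry(
    send: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() up to max_attempts times, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except DeliveryError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Delays of base, 2*base, 4*base... plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

With the defaults, a message that fails twice is retried after roughly 1 s and then 2 s; the jitter keeps many queued messages from retrying in lockstep after a shared outage. The `sleep` parameter exists so tests can record delays instead of waiting.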
Rescheduling flow
The rescheduling flow is the most complete example of voice-driven scheduling, spanning the full lifecycle from initiation to confirmation. It supports both AI-assistant-initiated and API-initiated rescheduling.
Complete flow
```
Initiation ──> Lookup ──> Validation ──> Slot search ──> Message ──> Send
    |                                                                 |
    |     ┌───────────────────────────────────────────────────────────┘
    |     v
    |  Patient responds ──> Selection parsed ──> Confirmation ──> Execute
    |         |                                       |              |
    |     (webhook)                             text + voice     atomic DB
    |                                              message         update
    v
AI assistant or direct API
```
Step by step:
- Initiation — triggered by the AI assistant (via the scheduling bridge) or directly through the scheduling API
- Appointment lookup — fetch the existing appointment details from the database
- Patient phone validation — retrieve the patient’s WhatsApp-enabled phone number from their contact records
- Slot search — query available time slots matching the provider and appointment type
- Message formatting — generate a WhatsApp message presenting the available slots in Brazilian date format (e.g., “quinta-feira, 10/04 às 09:00”)
- Send options — deliver the message to the patient:
  - 3 or fewer slots: interactive WhatsApp buttons for one-tap selection
  - More than 3 slots: numbered text list for typed selection
- State tracking — persist the rescheduling state in Redis with a 24-hour TTL
- Patient response — the patient replies via WhatsApp; the webhook routes the incoming selection to the rescheduling handler
- Confirmation — send a text confirmation message and an optional voice note
- Execution — perform the atomic reschedule in the database
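The formatting, send-option, and selection-parsing steps can be sketched as three small functions. The Brazilian date format and the three-button threshold come from the steps above; the returned dict shape and the Portuguese prompt wording are hypothetical and are not the Evolution API payload format.

```python
from datetime import datetime
from typing import Dict, List, Optional

WEEKDAYS_PT = [
    "segunda-feira", "terça-feira", "quarta-feira", "quinta-feira",
    "sexta-feira", "sábado", "domingo",
]
MAX_BUTTONS = 3  # WhatsApp native buttons are capped at 3 (see the outbound messaging table)


def format_slot_pt(slot: datetime) -> str:
    """Brazilian format, e.g. 'quinta-feira, 10/04 às 09:00'."""
    return f"{WEEKDAYS_PT[slot.weekday()]}, {slot:%d/%m} às {slot:%H:%M}"


def build_slot_message(slots: List[datetime]) -> Dict[str, object]:
    """Buttons for up to 3 slots, a numbered text list otherwise (hypothetical shape)."""
    labels = [format_slot_pt(s) for s in slots]
    if len(labels) <= MAX_BUTTONS:
        return {
            "type": "buttons",
            "options": [{"id": str(i), "label": label}
                        for i, label in enumerate(labels, start=1)],
        }
    lines = [f"{i}. {label}" for i, label in enumerate(labels, start=1)]
    return {"type": "text",
            "text": "Responda com o número do horário desejado:\n" + "\n".join(lines)}


def parse_selection(reply: str, slots: List[datetime]) -> Optional[datetime]:
    """Map a button id or a typed number (e.g. ' 2 ') back to the offered slot."""
    choice = reply.strip()
    if choice.isdigit() and 1 <= int(choice) <= len(slots):
        return slots[int(choice) - 1]
    return None
```

Using the same 1-based numbers as button ids and list positions lets one parser handle both a tapped button and a typed reply. Free-text replies such as "amanhã" return `None` and would be routed back through the NLU layer.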
Two-phase confirmation
Rescheduling uses a two-phase confirmation model to prevent accidental changes:
Phase 1 — Preview (confirmed=false)
The system fetches the existing appointment, validates the request, and returns a comparison of the old appointment versus the proposed new slot. No database changes occur.
| Field | Description |
|---|---|
| current_appointment | Date, time, provider of the existing booking |
| proposed_appointment | Date, time, provider of the new slot |
| requires_confirmation | Always true in Phase 1 |
Phase 2 — Execute (confirmed=true)
After the patient (or clinician) approves the change, the system performs the atomic reschedule: updates the database, releases the old slot, reserves the new slot, and sends a confirmation message to the patient.
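The two phases can be sketched over in-memory data structures. Phase 1 returns the documented preview fields without modifying anything; Phase 2 updates the appointment, reserves the new slot, and releases the old one together. In a real database those three writes would run inside a single transaction. The `Slot` shape, the `SlotUnavailable` exception, and the Phase 2 return value are hypothetical, and sending the confirmation message is omitted.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Slot:
    date: str        # e.g. "2025-04-10"
    start_time: str  # e.g. "09:00"
    provider: str


class SlotUnavailable(Exception):
    """Raised when the proposed slot is no longer free."""


def reschedule(appointments: Dict[str, Slot], free_slots: Set[Slot],
               appointment_id: str, new_slot: Slot, confirmed: bool) -> Dict[str, object]:
    current = appointments[appointment_id]  # KeyError if the appointment does not exist
    if new_slot not in free_slots:
        raise SlotUnavailable(f"{new_slot} is not available")
    if not confirmed:
        # Phase 1: preview only; nothing is modified.
        return {
            "current_appointment": asdict(current),
            "proposed_appointment": asdict(new_slot),
            "requires_confirmation": True,
        }
    # Phase 2: reserve the new slot, release the old one, and update the booking.
    free_slots.remove(new_slot)
    free_slots.add(current)
    appointments[appointment_id] = new_slot
    return {"rescheduled": True, "appointment": asdict(new_slot)}
```

Re-checking slot availability in Phase 2, rather than trusting the preview, matters because another patient may have booked the slot while the confirmation was pending.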
Voice confirmations
After a successful rescheduling (or booking), DELPHOS can send an audio confirmation to the patient as a WhatsApp voice note.
How it works
- The AI Engine generates a natural-language confirmation message using a Receptionist persona — warm tone, natural speaking pace, clear articulation
- The text is written for audio playback (short sentences, no abbreviations, spelled-out times) and then converted to speech
- The audio is encoded as base64 and sent as a WhatsApp voice note through the Evolution API
- The patient hears the confirmation directly in their WhatsApp chat
Message format
Voice confirmations use natural speech patterns optimized for audio playback:
- Full date and time spelled out (“quinta-feira, dez de abril, às nove horas da manhã”)
- Provider name included (“com a Doutora Ana”)
- Friendly closing (“Caso precise de algo mais, estamos à disposição”)
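The text side of this format can be sketched as a function that spells out the date and time before the text-to-speech step (not shown). The sentence template "Sua consulta foi remarcada para ..." is hypothetical wording, the number tables cover only what the sketch needs (days 1 to 31, whole hours), and the caller passes the provider phrase with its article (e.g. "a Doutora Ana") because Portuguese needs "com a"/"com o".

```python
from datetime import datetime

WEEKDAYS = ["segunda-feira", "terça-feira", "quarta-feira", "quinta-feira",
            "sexta-feira", "sábado", "domingo"]
MONTHS = ["janeiro", "fevereiro", "março", "abril", "maio", "junho", "julho",
          "agosto", "setembro", "outubro", "novembro", "dezembro"]
UNITS = ["", "um", "dois", "três", "quatro", "cinco", "seis", "sete", "oito", "nove",
         "dez", "onze", "doze", "treze", "catorze", "quinze", "dezesseis",
         "dezessete", "dezoito", "dezenove"]
# "hora" is feminine in Portuguese: "uma hora", "duas horas".
HOURS = ["", "uma", "duas"] + UNITS[3:12]


def day_words(n: int) -> str:
    if n == 1:
        return "primeiro"  # Brazilian usage: "primeiro de abril"
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    word = {2: "vinte", 3: "trinta"}[tens]
    return f"{word} e {UNITS[unit]}" if unit else word


def spoken_time(hour: int) -> str:
    """Whole hours only: 9 -> 'às nove horas da manhã', 13 -> 'à uma hora da tarde'."""
    if hour == 12:
        return "ao meio-dia"
    if not 1 <= hour <= 23:
        raise ValueError("this sketch covers 01:00-23:00")
    h12 = hour if hour < 12 else hour - 12
    period = "da manhã" if hour < 12 else "da tarde" if hour < 18 else "da noite"
    if h12 == 1:
        return f"à uma hora {period}"
    return f"às {HOURS[h12]} horas {period}"


def confirmation_text(when: datetime, provider_phrase: str) -> str:
    if when.minute:
        raise ValueError("this sketch only spells out whole hours")
    return (f"Sua consulta foi remarcada para {WEEKDAYS[when.weekday()]}, "
            f"{day_words(when.day)} de {MONTHS[when.month - 1]}, {spoken_time(when.hour)}, "
            f"com {provider_phrase}. Caso precise de algo mais, estamos à disposição.")
```

For Thursday, 10 April 2025 at 09:00 with "a Doutora Ana", this produces "Sua consulta foi remarcada para quinta-feira, dez de abril, às nove horas da manhã, com a Doutora Ana. Caso precise de algo mais, estamos à disposição." Spelling numbers out in the text, rather than relying on the speech engine to read "10/04 09:00", keeps the spoken output predictable.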
State management
Voice scheduling conversations are inherently asynchronous — a patient may receive slot options and respond minutes or hours later. DELPHOS uses Redis to maintain conversation state across these interactions.
What is tracked
| Field | Description |
|---|---|
| original_appointment | Details of the appointment being rescheduled |
| proposed_slots | List of time slots presented to the patient |
| patient_selection | The slot the patient selected (once they respond) |
| confirmation_status | Whether the change has been confirmed and executed |
| channel | Source channel (whatsapp, phone, web) |
| created_at | Timestamp for TTL calculation |
TTL and expiration
All rescheduling state entries have a 24-hour TTL. If the patient does not respond within 24 hours, the state expires and the proposed slots are released. The patient can initiate a new rescheduling request at any time.
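The TTL behavior can be sketched with an in-memory stand-in for Redis and an injectable clock, so expiry can be observed without waiting 24 hours. With redis-py the equivalent write would be `client.set(key, value, ex=STATE_TTL_SECONDS)`, and Redis itself deletes the key when the TTL elapses. The key format `reschedule:<patient_id>` is a hypothetical choice; the state fields follow the table above.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

STATE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL described above


class InMemoryStateStore:
    """Redis stand-in for illustration: values are JSON strings with an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, state: Dict[str, Any], ttl: int = STATE_TTL_SECONDS) -> None:
        self._data[key] = (self._clock() + ttl, json.dumps(state))

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, raw = entry
        if self._clock() >= expires_at:
            # Evict lazily on read; Redis likewise expires keys when they are accessed.
            del self._data[key]
            return None
        return json.loads(raw)
```

Serializing to JSON mirrors how the state would be stored as a Redis string value. Once `get` returns `None`, the rescheduling handler would treat a late reply as a new request, matching the expiration behavior described above.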
Security and privacy
Voice scheduling handles sensitive patient data and audio recordings. DELPHOS enforces multiple layers of protection:
Data protection (LGPD)
- In-memory audio processing — audio bytes are never written to disk, logged, or persisted. The buffer is freed immediately after transcription.
- No patient identifiers in URLs — webhook endpoints use opaque tokens; patient IDs are transmitted only in request bodies over TLS.
- Minimal data retention — conversation state expires after 24 hours via Redis TTL.
Access control
- Row-Level Security (RLS) — all database queries during entity resolution and appointment lookup are scoped to the patient’s tenant. The WhatsApp service resolves tenant context before any data access.
- API key authentication — all direct API calls require a valid x-api-key header.
Webhook integrity
- HMAC-SHA256 validation — every incoming webhook request is verified against a shared secret. Requests with invalid or missing signatures are rejected.
- TLS only — webhook URLs must use HTTPS in production.
Error handling
| Scenario | HTTP status | Behavior |
|---|---|---|
| Concurrency limit reached | 429 | All voice processing slots occupied; retry with backoff |
| Invalid audio format | 400 | Unsupported file type or corrupt audio data |
| Transcription failure | 503 | Transcription Engine unavailable; retry later |
| Ambiguous request | 200 | Returns VoiceClarificationResponse with follow-up question |
| No matching slots | 200 | Returns empty slots array with apology message |
| WhatsApp disabled | 503 | WHATSAPP_ENABLED is not set to true |
| Invalid webhook signature | 401 | HMAC verification failed |
| Patient not found | 404 | Patient ID does not exist or is not visible to the tenant |
Integration examples
End-to-end: WhatsApp voice rescheduling
The following sequence shows a complete rescheduling initiated by a patient sending a WhatsApp voice message:
```
Patient                 WhatsApp (Evolution API)                 DELPHOS
   |                                |                               |
   |-- sends voice message -------->|                               |
   |                                |-- webhook payload ----------->|
   |                                |                               |-- validates signature
   |                                |                               |-- transcribes audio
   |                                |                               |-- extracts intent: reschedule
   |                                |                               |-- looks up appointment
   |                                |                               |-- searches available slots
   |                                |<-- sends slot options --------|
   |<-- receives button message ----|                               |
   |                                |                               |
   |-- taps slot button ----------->|                               |
   |                                |-- webhook: button reply ----->|
   |                                |                               |-- executes reschedule (DB)
   |                                |<-- sends text confirmation ---|
   |                                |<-- sends voice confirmation --|
   |<-- receives confirmation ------|                               |
```
Programmatic voice scheduling
For integrations outside WhatsApp (IVR systems, custom apps), use the voice endpoint directly:
```python
import httpx

# Step 1: Submit audio for processing
with open("patient_call.wav", "rb") as f:
    response = httpx.post(
        "https://your-instance.delphos.app/v1/scheduling/voice",
        headers={"x-api-key": "YOUR_API_KEY"},
        files={"audio": ("call.wav", f, "audio/wav")},
        data={
            "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
            "channel": "phone",
        },
    )

# Surface 4xx/5xx errors (e.g. 400, 429, 503) instead of parsing an error body
response.raise_for_status()
result = response.json()

# Step 2: Check the response type
# VoiceSchedulingResponse includes "intent" and "suggested_slots"
# VoiceClarificationResponse includes a "clarification" field with a follow-up question
if "intent" in result:
    # Scheduling response: present suggested slots to the patient
    for slot in result["suggested_slots"]:
        print(f"{slot['date']} - {slot['start_time']} with {slot['provider_name']}")
elif "clarification" in result:
    # Ambiguous request: ask the follow-up question
    print(result["clarification"])
```