Voice-Enabled Scheduling

Patients can book, reschedule, and cancel appointments by speaking naturally, either in WhatsApp audio messages or during phone calls. DELPHOS transcribes the audio, extracts scheduling intent using natural language understanding, and processes the request through the scheduling pipeline — all without the patient needing to navigate menus or type messages.

How it works

The voice scheduling pipeline converts a patient’s spoken request into a structured scheduling action in five stages:

Audio validation — verify file format, duration, and size constraints
Format conversion — normalize audio to the format expected by the Transcription Engine (performed in-memory, no disk writes)
Transcription — convert speech to text via the Transcription Engine
Intent extraction — the AI Engine parses the transcribed text to identify scheduling intent and extract entities
Scheduling action — execute the resolved action (book, reschedule, cancel, or query availability)

Patient audio ──> Validation ──> Conversion ──> Transcription ──> NLU ──> Action
  (WhatsApp           |              |               |              |         |
   or phone)     format/size     in-memory       speech-to-      intent    book /
                  checks         only (LGPD)      text          + entities reschedule /
                                                                           cancel
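The validation stage can be sketched with the Python standard library alone. This is a minimal illustration, not the DELPHOS implementation. The size and duration limits (`MAX_AUDIO_BYTES`, `MAX_DURATION_SECONDS`) and the helper names are hypothetical. The format is detected from each container's leading magic bytes. Duration is checked only for WAV, which the `wave` module can read from an in-memory `io.BytesIO` buffer without writing to disk; OGG, MP3, and WebM durations would need a decoder and are left out.

```python
import io
import wave

# Hypothetical limits; the actual DELPHOS constraints are not documented here.
MAX_AUDIO_BYTES = 16 * 1024 * 1024
MAX_DURATION_SECONDS = 120.0


class AudioValidationError(ValueError):
    """Hypothetical error type; would surface as the 400 "Invalid audio format" case."""


def detect_format(data: bytes) -> str:
    """Identify the container from its leading magic bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"\x1a\x45\xdf\xa3":  # EBML header, used by WebM
        return "webm"
    if data[:3] == b"ID3" or (len(data) > 1 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "mp3"  # ID3 tag or a bare MPEG frame sync
    raise AudioValidationError("unsupported or corrupt audio data")


def validate_audio(data: bytes) -> str:
    """Run size, format, and (for WAV) duration checks without touching disk."""
    if not data or len(data) > MAX_AUDIO_BYTES:
        raise AudioValidationError("empty or oversized upload")
    fmt = detect_format(data)
    if fmt == "wav":
        try:
            # wave reads from any file-like object, so BytesIO keeps it in memory.
            # Note: wave only parses PCM WAV; other encodings are rejected here.
            with wave.open(io.BytesIO(data), "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError, ZeroDivisionError) as exc:
            raise AudioValidationError("corrupt WAV data") from exc
        if duration > MAX_DURATION_SECONDS:
            raise AudioValidationError("recording too long")
    return fmt
```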

Voice scheduling API

Process an audio recording

POST /v1/scheduling/voice

Accepts multipart/form-data containing an audio file and patient context. Returns structured scheduling data extracted from the spoken request.

Field	Type	Required	Description
`audio`	file	Yes	Audio recording (WAV, OGG, MP3, WebM)
`patient_id`	UUID	Yes	Patient submitting the request
`session_id`	UUID	No	Conversation session for multi-turn context
`channel`	string	No	Source channel: `whatsapp`, `phone`, or `web`


curl -X POST "https://your-instance.delphos.app/v1/scheduling/voice" \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@recording.ogg" \
  -F "patient_id=d4e5f6a7-b8c9-0123-defa-234567890123" \
  -F "channel=whatsapp"

import httpx

with open("recording.ogg", "rb") as f:
    response = httpx.post(
        "https://your-instance.delphos.app/v1/scheduling/voice",
        headers={"x-api-key": "YOUR_API_KEY"},
        files={"audio": ("recording.ogg", f, "audio/ogg")},
        data={
            "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
            "channel": "whatsapp",
        },
    )
result = response.json()

Response — returns a VoiceSchedulingResponse with the transcribed text, extracted entities, and scheduling suggestions. When the audio is ambiguous, returns a VoiceClarificationResponse containing a follow-up question.

Natural language understanding

Behind both voice and text scheduling lies the natural language understanding layer. It parses Brazilian Portuguese scheduling requests and resolves entities against the clinic database.

Process a text message

POST /v1/scheduling/natural-language

Field	Type	Required	Description
`message`	string	Yes	Free-text scheduling request in Portuguese
`patient_id`	UUID	Yes	Patient context for entity resolution


curl -X POST "https://your-instance.delphos.app/v1/scheduling/natural-language" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Quero remarcar minha consulta com a Dra. Ana para quinta de tarde",
    "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123"
  }'

import httpx

response = httpx.post(
    "https://your-instance.delphos.app/v1/scheduling/natural-language",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "message": "Quero remarcar minha consulta com a Dra. Ana para quinta de tarde",
        "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
    },
)
result = response.json()

Supported intents

Intent	Description	Example utterance
`schedule`	Book a new appointment	“Quero marcar uma consulta com o Dr. Silva”
`reschedule`	Move an existing appointment	“Preciso remarcar minha consulta para semana que vem”
`cancel`	Cancel an existing appointment	“Cancela minha consulta de amanhã”
`query`	Check available slots	“Tem horário com a Dra. Ana na sexta?”

Extracted entities

The AI Engine extracts the following entities from the patient’s message:

Entity	Examples
Provider name	“Dr. Silva”, “Dra. Ana Beatriz”
Specialty	“cardiologista”, “dermatologia”
Date expression	“amanhã”, “próxima terça”, “dia 15”
Time expression	“às 10 horas”, “14:30”
Time period	“de manhã”, “à tarde”, “à noite”
Appointment type	“consulta”, “retorno”, “exame”
Reason	“dor de cabeça”, “acompanhamento”

Brazilian Portuguese date parsing

The AI Engine understands a wide range of temporal expressions in Brazilian Portuguese, including:

Relative days — “hoje”, “amanhã”, “depois de amanhã”, “ontem”
Named weekdays — “segunda”, “próxima terça”, “quinta que vem”
Relative weeks — “semana que vem”, “daqui a duas semanas”
Specific dates — “dia 15”, “10 de abril”, “15/04”
Time periods — “de manhã” (08:00-12:00), “à tarde” (12:00-18:00), “à noite” (18:00-21:00)
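A simplified resolver for a subset of these expressions could look like the sketch below. It is an illustration only: `normalize`, `resolve_day`, and `resolve_period` are hypothetical names, and the period windows are the ones listed above. It assumes that a weekday, with or without “próxima” or “que vem”, always means its next occurrence strictly after today; the real AI Engine may interpret these phrases differently. Stripping accents lets “amanhã” and “amanha” resolve identically.

```python
from __future__ import annotations

import unicodedata
from datetime import date, time, timedelta

RELATIVE_DAYS = {"hoje": 0, "amanha": 1, "depois de amanha": 2, "ontem": -1}
# Accent-free weekday stems; date.weekday() counts Monday as 0.
WEEKDAYS = {"segunda": 0, "terca": 1, "quarta": 2, "quinta": 3,
            "sexta": 4, "sabado": 5, "domingo": 6}
# Period windows as documented above.
PERIODS = {"de manha": (time(8), time(12)),
           "a tarde": (time(12), time(18)),
           "a noite": (time(18), time(21))}


def normalize(text: str) -> str:
    """Lowercase and strip accents so "amanhã" and "amanha" compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.strip().lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def resolve_day(expression: str, today: date) -> date | None:
    """Resolve a relative-day or weekday expression against `today`."""
    expr = normalize(expression)
    if expr in RELATIVE_DAYS:
        return today + timedelta(days=RELATIVE_DAYS[expr])
    # Reduce "proxima terca", "quinta que vem", "quinta-feira" to the stem.
    stem = expr.removeprefix("proxima ").removesuffix(" que vem").replace("-feira", "")
    if stem in WEEKDAYS:
        # Assumption: always the next occurrence strictly after today (1-7 days).
        days_ahead = (WEEKDAYS[stem] - today.weekday() - 1) % 7 + 1
        return today + timedelta(days=days_ahead)
    return None  # e.g. "dia 15" or "15/04" are not handled in this sketch


def resolve_period(expression: str) -> tuple[time, time] | None:
    """Map "de manhã" / "à tarde" / "à noite" to a (start, end) window."""
    return PERIODS.get(normalize(expression))
```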

Entity resolution

Extracted names and specialties are matched against database records. Matching respects Row-Level Security (RLS), so a patient can only reference providers visible to their tenant. The response includes both the raw extracted text and the resolved database UUID when a match is found.
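One way to picture name resolution is a token-subset match over the providers the patient's tenant can see. This is a stand-in technique, not the documented algorithm. `Provider` and `resolve_provider` are hypothetical, and the provider list is assumed to come from an RLS-scoped query, so tenant filtering has already happened. A reference resolves only when exactly one provider matches; multiple matches are returned as candidates, which is the kind of case that could lead to a clarification question.

```python
import unicodedata
from dataclasses import dataclass

HONORIFICS = {"dr", "dra", "doutor", "doutora"}


@dataclass
class Provider:
    id: str  # a database UUID in practice
    name: str


def name_tokens(text: str) -> set[str]:
    """Lowercase, drop accents and punctuation, and ignore honorifics."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Combining accent marks are not alphanumeric, so they are dropped too.
    plain = "".join(c for c in decomposed if c.isalnum() or c.isspace())
    return {t for t in plain.split() if t not in HONORIFICS}


def resolve_provider(raw: str, visible_providers: list[Provider]) -> dict:
    """Match a spoken reference against providers the tenant can see.

    `visible_providers` is assumed to come from an RLS-scoped query, so
    providers from other tenants are never candidates.
    """
    wanted = name_tokens(raw)
    matches = [p for p in visible_providers if wanted and wanted <= name_tokens(p.name)]
    return {
        "raw": raw,
        # Resolve only an unambiguous match; otherwise the caller can ask.
        "resolved_id": matches[0].id if len(matches) == 1 else None,
        "candidates": [p.id for p in matches],
    }
```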

Response — an NLResponse containing:

Field	Description
`intent`	Detected scheduling intent
`confidence`	Confidence score (0.0–1.0)
`entities`	Extracted entities with resolved database references
`suggested_slots`	Available time slots matching the expressed criteria

WhatsApp integration

DELPHOS connects to WhatsApp through a self-hosted Evolution API instance, enabling patients to send voice messages and receive scheduling confirmations directly in their WhatsApp conversations.

Webhook endpoints

Evolution API delivers incoming messages and status updates to DELPHOS via webhooks.

Verification handshake:

GET /v1/whatsapp/webhook

Used by Evolution API during initial webhook registration to verify the endpoint is reachable and authentic.

Receive events:

POST /v1/whatsapp/webhook

Receives incoming messages (text, audio, interactive button replies) and delivery status updates. DELPHOS validates every request using HMAC-SHA256 signature verification before processing.

Webhook security

Every incoming webhook request must include a valid HMAC-SHA256 signature in the request headers. DELPHOS computes the expected signature using the shared secret configured during Evolution API setup and rejects any request where the signatures do not match.
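The check can be sketched with Python's `hmac` module. The header name `x-webhook-signature` and the hex-encoded digest are assumptions; confirm the actual header and encoding in your Evolution API configuration. The signature must be computed over the raw request body bytes, before any JSON parsing, because re-serialized JSON may not match the original byte for byte.

```python
import hashlib
import hmac

# Assumed header name and hex encoding; check your Evolution API setup.
SIGNATURE_HEADER = "x-webhook-signature"


def compute_signature(secret: bytes, raw_body: bytes) -> str:
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def is_valid_webhook(secret: bytes, raw_body: bytes, headers: dict) -> bool:
    """True only when the request carries a matching HMAC-SHA256 signature."""
    received = headers.get(SIGNATURE_HEADER)
    if not received:
        return False  # missing signature: reject with 401
    expected = compute_signature(secret, raw_body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.strip().lower().encode())
```

Web frameworks often normalize header names, so the lookup may need to be case-insensitive depending on how `headers` is built.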

Outbound messaging

DELPHOS sends messages to patients through the Evolution API using several message formats:

Method	Description	Use case
Text message	Plain text content	Confirmations, reminders
Interactive buttons	WhatsApp native buttons (max 3)	Slot selection with few options
Numbered list	Text message with numbered options	Slot selection with many options
Audio note	Base64-encoded voice note	Voice confirmations

All outbound messages include retry logic with exponential backoff and a maximum of 3 delivery attempts.
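A retry loop matching that policy (at most three attempts, with the delay doubling after each failure) could look like this sketch. `send_with_retry` is a hypothetical helper, the 1-second base delay is an assumption, and treating only `ConnectionError` and `TimeoutError` as retryable is also an assumption. The `send` callable stands in for whatever call posts the message to the Evolution API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
MAX_ATTEMPTS = 3  # documented delivery limit


def send_with_retry(send: Callable[[], T], base_delay: float = 1.0,
                    retry_on: tuple = (ConnectionError, TimeoutError),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send` up to MAX_ATTEMPTS times, doubling the wait after each failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return send()
        except retry_on:
            if attempt == MAX_ATTEMPTS:
                raise  # third failure: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1 s, then 2 s by default
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waiting.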

Rescheduling flow

The rescheduling flow is the most complete example of voice-driven scheduling, spanning the full lifecycle from initiation to confirmation. It supports both AI-assistant-initiated and API-initiated rescheduling.

Complete flow

Initiation ──> Lookup ──> Validation ──> Slot search ──> Message ──> Send
     |                                                                  |
     |            ┌─────────────────────────────────────────────────────┘
     |            v
     |     Patient responds ──> Selection parsed ──> Confirmation ──> Execute
     |            |                                        |              |
     |        (webhook)                              text + voice    atomic DB
     |                                                 message       update
     v
 AI assistant
 or direct API

Step by step:

Initiation — triggered by the AI assistant (via the scheduling bridge) or directly through the scheduling API
Appointment lookup — fetch the existing appointment details from the database
Patient phone validation — retrieve the patient’s WhatsApp-enabled phone number from their contact records
Slot search — query available time slots matching the provider and appointment type
Message formatting — generate a WhatsApp message presenting the available slots in Brazilian date format (e.g., “quinta-feira, 10/04 às 09:00”)
Send options — deliver the message to the patient:
- 3 or fewer slots: interactive WhatsApp buttons for one-tap selection
- More than 3 slots: numbered text list for typed selection
State tracking — persist the rescheduling state in Redis with a 24-hour TTL
Patient response — the patient replies via WhatsApp; the webhook routes the incoming selection to the rescheduling handler
Confirmation — send a text confirmation message and an optional voice note
Execution — perform the atomic reschedule in the database
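The message formatting, option delivery, and reply parsing steps above can be sketched as follows. `Slot`, `format_slot`, `build_options`, and `parse_selection` are hypothetical. The returned dictionaries are illustrative shapes, not the Evolution API payload schema. Weekday names are hard-coded rather than taken from the system locale, so the output does not depend on a `pt_BR` locale being installed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

# Hard-coded so output does not depend on a pt_BR locale being installed.
WEEKDAYS_PT = ["segunda-feira", "terça-feira", "quarta-feira", "quinta-feira",
               "sexta-feira", "sábado", "domingo"]
MAX_BUTTONS = 3  # interactive-button limit from the outbound messaging table


@dataclass
class Slot:
    slot_id: str
    start: datetime


def format_slot(slot: Slot) -> str:
    """Render a slot in Brazilian style, e.g. "quinta-feira, 10/04 às 09:00"."""
    weekday = WEEKDAYS_PT[slot.start.weekday()]
    return f"{weekday}, {slot.start:%d/%m} às {slot.start:%H:%M}"


def build_options(slots: list[Slot]) -> dict:
    """Buttons for up to three slots, otherwise a numbered text list."""
    if len(slots) <= MAX_BUTTONS:
        buttons = [{"id": s.slot_id, "title": format_slot(s)} for s in slots]
        return {"type": "buttons", "buttons": buttons}
    lines = [f"{i}. {format_slot(s)}" for i, s in enumerate(slots, start=1)]
    return {"type": "text", "text": "\n".join(lines)}


def parse_selection(reply: str, slots: list[Slot]) -> Slot | None:
    """Map a button reply (slot id) or a typed number back to a slot."""
    reply = reply.strip()
    for slot in slots:
        if reply == slot.slot_id:
            return slot
    if reply.isdecimal() and 1 <= int(reply) <= len(slots):
        return slots[int(reply) - 1]
    return None  # unrecognized reply: ask the patient again
```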

Two-phase confirmation

Rescheduling uses a two-phase confirmation model to prevent accidental changes:

Phase 1 — Preview (confirmed=false)

The system fetches the existing appointment, validates the request, and returns a comparison of the old appointment versus the proposed new slot. No database changes occur.

Field	Description
`current_appointment`	Date, time, provider of the existing booking
`proposed_appointment`	Date, time, provider of the new slot
`requires_confirmation`	Always `true` in Phase 1

Phase 2 — Execute (confirmed=true)

After the patient (or clinician) approves the change, the system performs the atomic reschedule: updates the database, releases the old slot, reserves the new slot, and sends a confirmation message to the patient.
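The two phases can be illustrated with an in-memory SQLite database standing in for the clinic database. `reschedule`, the `appointments` schema, and the return shapes are hypothetical. Phase 1 only reads. Phase 2 runs the update inside a single transaction (`with conn:` commits on success and rolls back on error), and a `UNIQUE (provider, start)` constraint rejects a slot that another booking took in the meantime. Sending the confirmation message is omitted.

```python
import sqlite3

# Hypothetical schema; the UNIQUE constraint stands in for slot reservation.
SCHEMA = """
CREATE TABLE appointments (
    id INTEGER PRIMARY KEY,
    start TEXT NOT NULL,
    provider TEXT NOT NULL,
    UNIQUE (provider, start)
)
"""


def reschedule(conn: sqlite3.Connection, appointment_id: int,
               new_start: str, confirmed: bool) -> dict:
    row = conn.execute("SELECT start, provider FROM appointments WHERE id = ?",
                       (appointment_id,)).fetchone()
    if row is None:
        raise LookupError("appointment not found")
    current = {"start": row[0], "provider": row[1]}
    proposed = {"start": new_start, "provider": row[1]}
    if not confirmed:
        # Phase 1: preview only, no writes.
        return {"current_appointment": current,
                "proposed_appointment": proposed,
                "requires_confirmation": True}
    # Phase 2: a single transaction. Moving the row releases the old slot,
    # and UNIQUE (provider, start) raises IntegrityError if the new slot
    # was taken in the meantime, which rolls the whole change back.
    with conn:
        conn.execute("UPDATE appointments SET start = ? WHERE id = ?",
                     (new_start, appointment_id))
    return {"appointment": proposed, "requires_confirmation": False}
```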

Voice confirmations

After a successful rescheduling (or booking), DELPHOS can send an audio confirmation to the patient as a WhatsApp voice note.

How it works

The AI Engine generates a natural-language confirmation message using a Receptionist persona — warm tone, natural speaking pace, clear articulation
The text is converted to speech optimized for audio playback (short sentences, no abbreviations, spelled-out times)
The audio is encoded as base64 and sent as a WhatsApp voice note through the Evolution API
The patient hears the confirmation directly in their WhatsApp chat

Message format

Voice confirmations use natural speech patterns optimized for audio playback:

Full date and time spelled out (“quinta-feira, dez de abril, às nove horas da manhã”)
Provider name included (“com a Doutora Ana”)
Friendly closing (“Caso precise de algo mais, estamos à disposição”)
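The spelled-out style can also be produced deterministically, as in the sketch below. This swaps in a fixed template for the AI Engine's generated text. The opening sentence and all function names are hypothetical. The sketch covers the cases the examples above need: cardinal numbers up to 59, the feminine “uma”/“duas” for hours, “à uma” versus “às”, and “ao meio-dia”.

```python
from datetime import datetime

UNITS = ["zero", "um", "dois", "três", "quatro", "cinco", "seis", "sete", "oito",
         "nove", "dez", "onze", "doze", "treze", "catorze", "quinze", "dezesseis",
         "dezessete", "dezoito", "dezenove"]
TENS = {2: "vinte", 3: "trinta", 4: "quarenta", 5: "cinquenta"}
WEEKDAYS = ["segunda-feira", "terça-feira", "quarta-feira", "quinta-feira",
            "sexta-feira", "sábado", "domingo"]
MONTHS = ["janeiro", "fevereiro", "março", "abril", "maio", "junho", "julho",
          "agosto", "setembro", "outubro", "novembro", "dezembro"]


def number_words(n: int) -> str:
    """Spell out 0-59 in Portuguese, e.g. 45 -> "quarenta e cinco"."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] + (f" e {UNITS[unit]}" if unit else "")


def spoken_time(t: datetime) -> str:
    """09:00 -> "às nove horas da manhã"; 13:30 -> "à uma e meia da tarde"."""
    if (t.hour, t.minute) == (12, 0):
        return "ao meio-dia"
    period = "da manhã" if t.hour < 12 else "da tarde" if t.hour < 18 else "da noite"
    hour = t.hour % 12 or 12
    # Hours are feminine in Portuguese: "uma hora", "duas horas".
    hour_word = {1: "uma", 2: "duas"}.get(hour, number_words(hour))
    prefix = "à" if hour == 1 else "às"
    if t.minute == 0:
        return f"{prefix} {hour_word} {'hora' if hour == 1 else 'horas'} {period}"
    minutes = "meia" if t.minute == 30 else number_words(t.minute)
    return f"{prefix} {hour_word} e {minutes} {period}"


def confirmation_text(start: datetime, provider: str) -> str:
    """Build a speech-friendly confirmation with dates and times spelled out."""
    day = "primeiro" if start.day == 1 else number_words(start.day)
    date_words = f"{WEEKDAYS[start.weekday()]}, {day} de {MONTHS[start.month - 1]}"
    return (f"Sua consulta está confirmada para {date_words}, {spoken_time(start)}, "
            f"com {provider}. Caso precise de algo mais, estamos à disposição.")
```

For example, `confirmation_text(datetime(2025, 4, 10, 9, 0), "a Doutora Ana")` produces “Sua consulta está confirmada para quinta-feira, dez de abril, às nove horas da manhã, com a Doutora Ana. Caso precise de algo mais, estamos à disposição.”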

State management

Voice scheduling conversations are inherently asynchronous — a patient may receive slot options and respond minutes or hours later. DELPHOS uses Redis to maintain conversation state across these interactions.

What is tracked

Field	Description
`original_appointment`	Details of the appointment being rescheduled
`proposed_slots`	List of time slots presented to the patient
`patient_selection`	The slot the patient selected (once they respond)
`confirmation_status`	Whether the change has been confirmed and executed
`channel`	Source channel (whatsapp, phone, web)
`created_at`	Timestamp for TTL calculation

TTL and expiration

All rescheduling state entries have a 24-hour TTL. If the patient does not respond within 24 hours, the state expires and the proposed slots are released. The patient can initiate a new rescheduling request at any time.
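A minimal sketch of storing this state, assuming a redis-py client. In redis-py, `set(..., ex=...)` writes the value and its expiry in a single command. `keepttl=True` (Redis 6.0 or later) updates the value without resetting the 24-hour clock. The key layout `reschedule:{tenant_id}:{patient_id}` and the helper names are hypothetical. The read-modify-write in `record_selection` is not atomic; a production version might use `WATCH`/`MULTI` or a Lua script.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone

STATE_TTL_SECONDS = 24 * 60 * 60  # the documented 24-hour TTL


def state_key(tenant_id: str, patient_id: str) -> str:
    """Hypothetical key layout, scoped by tenant."""
    return f"reschedule:{tenant_id}:{patient_id}"


def save_state(client, tenant_id: str, patient_id: str, original_appointment: dict,
               proposed_slots: list, channel: str) -> None:
    """Store fresh rescheduling state; `ex` sets value and expiry together."""
    state = {
        "original_appointment": original_appointment,
        "proposed_slots": proposed_slots,
        "patient_selection": None,
        "confirmation_status": "pending",
        "channel": channel,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    client.set(state_key(tenant_id, patient_id), json.dumps(state), ex=STATE_TTL_SECONDS)


def record_selection(client, tenant_id: str, patient_id: str, slot: dict) -> dict | None:
    """Attach the patient's choice, or return None if the state has expired."""
    key = state_key(tenant_id, patient_id)
    raw = client.get(key)
    if raw is None:
        return None  # expired or never created: the patient starts over
    state = json.loads(raw)
    state["patient_selection"] = slot
    # keepttl=True keeps the original 24-hour deadline instead of resetting it.
    client.set(key, json.dumps(state), keepttl=True)
    return state
```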

Security and privacy

Voice scheduling handles sensitive patient data and audio recordings. DELPHOS enforces multiple layers of protection:

Data protection (LGPD)

In-memory audio processing — audio bytes are never written to disk, logged, or persisted. The buffer is freed immediately after transcription.
No patient identifiers in URLs — webhook endpoints use opaque tokens; patient IDs are transmitted only in request bodies over TLS.
Minimal data retention — conversation state expires after 24 hours via Redis TTL.

Access control

Row-Level Security (RLS) — all database queries during entity resolution and appointment lookup are scoped to the patient’s tenant. The WhatsApp service resolves tenant context before any data access.
API key authentication — all direct API calls require a valid x-api-key header.

Webhook integrity

HMAC-SHA256 validation — every incoming webhook request is verified against a shared secret. Requests with invalid or missing signatures are rejected.
TLS only — webhook URLs must use HTTPS in production.

Error handling

Scenario	HTTP status	Behavior
Concurrency limit reached	`429`	All voice processing slots occupied; retry with backoff
Invalid audio format	`400`	Unsupported file type or corrupt audio data
Transcription failure	`503`	Transcription Engine unavailable; retry later
Ambiguous request	`200`	Returns `VoiceClarificationResponse` with follow-up question
No matching slots	`200`	Returns empty slots array with apology message
WhatsApp disabled	`503`	`WHATSAPP_ENABLED` is not set to `true`
Invalid webhook signature	`401`	HMAC verification failed
Patient not found	`404`	Patient ID does not exist or is not visible to the tenant
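Client code calling the voice endpoint can treat `429` and `503` as retryable and every other non-200 status as final, as in this sketch. `submit_with_backoff`, `VoiceRequestError`, the attempt count, and the delays are assumptions. `send` is any zero-argument callable that performs the request and returns `(status_code, json_body)`.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {429, 503}  # concurrency limit, Transcription Engine down


class VoiceRequestError(RuntimeError):
    """Raised for a non-retryable status or when retries are exhausted."""

    def __init__(self, status: int, body: dict):
        super().__init__(f"voice request failed with HTTP {status}")
        self.status = status
        self.body = body


def submit_with_backoff(send: Callable[[], Tuple[int, dict]], max_attempts: int = 4,
                        base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry 429/503 with exponential backoff; return the 200 body."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            # Scheduling result, clarification, or an empty-slots reply.
            return body
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            raise VoiceRequestError(status, body)  # e.g. 400, 401, 404
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With httpx, `send` could wrap `httpx.post(...)` using audio bytes read into memory once, so every retry re-sends the same payload.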

Integration examples

End-to-end: WhatsApp voice rescheduling

The following sequence shows a complete rescheduling initiated by a patient sending a WhatsApp voice message:

Patient                      DELPHOS                          WhatsApp
   |                            |                                |
   |-- sends voice message ---->|                                |
   |                            |<- webhook payload (audio) -----|
   |                            |-- validates HMAC signature     |
   |                            |                                |
   |                            |-- transcribes audio            |
   |                            |-- extracts intent: reschedule  |
   |                            |-- looks up appointment         |
   |                            |-- searches available slots     |
   |                            |                                |
   |                            |-- sends slot options --------->|
   |<- receives button message -|                                |
   |                            |                                |
   |-- taps slot button ------->|                                |
   |                            |<- webhook: button reply -------|
   |                            |-- validates HMAC signature     |
   |                            |                                |
   |                            |-- executes reschedule (DB)     |
   |                            |-- sends text confirmation ---->|
   |                            |-- sends voice confirmation --->|
   |<- receives confirmation ---|                                |

Programmatic voice scheduling

For integrations outside WhatsApp (IVR systems, custom apps), use the voice endpoint directly:

import httpx

# Step 1: Submit audio for processing
with open("patient_call.wav", "rb") as f:
    response = httpx.post(
        "https://your-instance.delphos.app/v1/scheduling/voice",
        headers={"x-api-key": "YOUR_API_KEY"},
        files={"audio": ("call.wav", f, "audio/wav")},
        data={
            "patient_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
            "channel": "phone",
        },
    )

response.raise_for_status()  # surface 4xx/5xx errors (e.g. 429, 503) before parsing
result = response.json()

# Step 2: Check the response type
# VoiceSchedulingResponse includes "intent" and "suggested_slots"
# VoiceClarificationResponse includes "clarification" field with a follow-up question
if "intent" in result:
    # Scheduling response — present suggested slots to the patient
    for slot in result["suggested_slots"]:
        print(f"{slot['date']} - {slot['start_time']} with {slot['provider_name']}")
elif "clarification" in result:
    # Ambiguous request — ask the follow-up question
    print(result["clarification"])