Overview

Zild Voice is a real-time AI voice automation system built on the Zild Platform. It enables:
  • Inbound AI call handling
  • Outbound AI calls (if configured)
  • Dynamic conversation routing
  • Backend integration during calls
  • Human escalation

Architecture

Zild Voice follows this execution model: Call → Telephony Provider → App → Agent → Workflow → Audio Response

1. Telephony Layer

Calls arrive through:
  • SIP trunk
  • Twilio Voice
The telephony provider sends webhook events to Zild.
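
As an illustration of the webhook step, the sketch below parses a form-encoded call event into a small record. The field names (CallSid, From, To, CallStatus) are the ones Twilio Voice sends with its webhook requests; the `CallEvent` type and `parse_call_webhook` function are hypothetical names chosen for this example and are not part of any documented Zild API. A SIP-based provider would likely need a different field mapping.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass(frozen=True)
class CallEvent:
    """Hypothetical normalized call event (not a documented Zild type)."""
    call_id: str
    caller: str
    callee: str
    status: str


def parse_call_webhook(body: str) -> CallEvent:
    """Parse a form-encoded webhook body into a CallEvent.

    Assumes Twilio Voice style parameters: CallSid, From, To, CallStatus.
    """
    # parse_qs returns a list per key; keep the first value of each.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    missing = [name for name in ("CallSid", "From", "To") if name not in fields]
    if missing:
        raise ValueError(f"webhook is missing fields: {missing}")
    return CallEvent(
        call_id=fields["CallSid"],
        caller=fields["From"],
        callee=fields["To"],
        status=fields.get("CallStatus", "unknown"),
    )
```

A production handler would also verify the provider's request signature before trusting the payload; that step is omitted here.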

2. Speech Processing

During a live call:
  1. Caller speech is converted to text (speech-to-text, STT)
  2. The text is processed by the Agent
  3. The Agent generates a response
  4. The response is converted to speech (text-to-speech, TTS)
  5. The audio is streamed back to the caller
This loop continues in real time.
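
The loop above can be sketched as a generator that takes incoming audio chunks and yields outgoing audio. This is a minimal sketch under stated assumptions: `run_voice_loop` and the `fake_*` functions are hypothetical stand-ins, with the fakes replacing real STT, Agent, and TTS services by treating audio as UTF-8 bytes so the example runs on its own.

```python
from typing import Callable, Iterable, Iterator


def run_voice_loop(
    audio_chunks: Iterable[bytes],
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Yield one audio reply per non-empty caller utterance."""
    for chunk in audio_chunks:
        text = stt(chunk)          # step 1: speech to text
        if not text:
            continue               # skip silence or unrecognized audio
        reply = agent(text)        # steps 2-3: agent processes and responds
        yield tts(reply)           # steps 4-5: text to speech, streamed back


# Stand-ins so the sketch is runnable; real services would replace these.
def fake_stt(chunk: bytes) -> str:
    return chunk.decode("utf-8").strip()


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(reply: str) -> bytes:
    return reply.encode("utf-8")
```

Because the function is a generator, each reply is produced as soon as its utterance is processed rather than after the whole call, which mirrors the streaming behavior described above.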

3. Agent Layer

Voice agents use the same AI engine as Zild Assist. They can:
  • Maintain memory
  • Execute tools
  • Query databases
  • Trigger workflows
  • Transfer calls
Voice-specific instructions keep spoken responses clear and brief.
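
One way to picture tool execution and call transfer is a registry that maps tool names to functions the agent may invoke. The sketch below is an assumption for illustration only: `ToolRegistry`, `lookup_order`, and `transfer_call` are hypothetical names, and the source does not describe how Zild actually registers or dispatches tools.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[..., Any]


class ToolRegistry:
    """Hypothetical name-to-function registry for agent tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str) -> Callable[[ToolFn], ToolFn]:
        # Decorator that stores the function under the given tool name.
        def decorator(fn: ToolFn) -> ToolFn:
            self._tools[name] = fn
            return fn
        return decorator

    def execute(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


tools = ToolRegistry()


@tools.register("lookup_order")
def lookup_order(order_id: str) -> dict:
    # Stand-in for a database query; returns fixed illustrative data.
    return {"order_id": order_id, "status": "shipped"}


@tools.register("transfer_call")
def transfer_call(queue: str) -> dict:
    # Stand-in for a telephony transfer; returns the intended action only.
    return {"action": "transfer", "queue": queue}
```

Rejecting unknown tool names keeps the agent limited to the functions that were explicitly registered.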

Enterprise Capabilities

Zild Voice supports:
  • Tenant isolation
  • Role-based access
  • Escalation routing
  • Call logging
  • Transcript storage
  • Integration auditing
Voice interactions follow the same security boundaries as messaging channels.
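
To make tenant isolation and role-based access concrete, the sketch below checks whether a user may read a stored transcript. The `User` and `Transcript` types, the `can_read_transcript` function, and the `transcript_viewer` role name are all hypothetical; the source does not specify how Zild models tenants or roles.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    user_id: str
    tenant_id: str
    roles: FrozenSet[str]


@dataclass(frozen=True)
class Transcript:
    call_id: str
    tenant_id: str
    text: str


def can_read_transcript(
    user: User,
    transcript: Transcript,
    required_role: str = "transcript_viewer",  # hypothetical role name
) -> bool:
    """Allow access only within the same tenant and with the required role."""
    same_tenant = user.tenant_id == transcript.tenant_id
    has_role = required_role in user.roles
    return same_tenant and has_role
```

The tenant comparison is applied regardless of role, so in this sketch even a highly privileged user in one tenant cannot read another tenant's transcripts.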