ElevenLabs Agents API - Deep Dive
Overview
The ElevenLabs Agents platform provides voice-first AI agents with phone integration. It connects to OpenClaw through the OpenAI chat/completions protocol.
Architecture
┌─────────────────────────────────────────────────┐
│               Phone Call (Twilio)               │
└────────────────────────┬────────────────────────┘
                         │
                         ↓
┌─────────────────────────────────────────────────┐
│           ElevenLabs Agents Platform            │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │ Voice Layer                               │  │
│  │ - Speech synthesis (TTS)                  │  │
│  │ - Speech recognition (ASR)                │  │
│  │ - Turn taking                             │  │
│  └───────────────────────────────────────────┘  │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │ Conversation Layer                        │  │
│  │ - Message history                         │  │
│  │ - Context management                      │  │
│  │ - Agent configuration                     │  │
│  └───────────────────────────────────────────┘  │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │ Integration Layer                         │  │
│  │ - Secrets management                      │  │
│  │ - Custom LLM endpoints                    │  │
│  │ - Twilio phone integration                │  │
│  └───────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────┘
                         │
                         ↓ OpenAI chat/completions
┌─────────────────────────────────────────────────┐
│                OpenClaw Gateway                 │
│ - /v1/chat/completions endpoint                 │
│ - Tools, Memory, Skills                         │
│ - MCP servers (Vision, Code)                    │
└─────────────────────────────────────────────────┘
API Endpoints
1. Secrets Management
Create Secret
Store sensitive values (API keys, tokens) securely.
POST /v1/convai/secrets
Request:
{
  "type": "new",
  "name": "openclaw_gateway_token",
  "value": "YOUR_OPENCLAW_GATEWAY_TOKEN"
}
Response:
{
  "type": "stored",
  "secret_id": "abc123...",
  "name": "openclaw_gateway_token"
}
Why This Matters:
Use Cases:
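The Create Secret request can also be assembled programmatically. The following is a minimal Python sketch, assuming the `https://api.elevenlabs.io` base URL and the `xi-api-key` header that appear in the curl examples of the Setup Workflow; the helper names `build_secret_request` and `create_secret` are hypothetical and not part of any ElevenLabs SDK.

```python
import json
import urllib.request

# Base URL as used in the curl examples of this document.
API_BASE = "https://api.elevenlabs.io"


def build_secret_request(api_key: str, name: str, value: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/convai/secrets request."""
    body = json.dumps({"type": "new", "name": name, "value": value}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/convai/secrets",
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def create_secret(api_key: str, name: str, value: str) -> str:
    """Send the request and return the secret_id field of the response."""
    request = build_secret_request(api_key, name, value)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["secret_id"]
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network. The returned `secret_id` is the value that the agent configuration references in place of the raw token.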
2. Agent Creation
Create Agent
POST /v1/convai/agents/create
Request:
{
  "conversation_config": {
    "agent": {
      "language": "en",
      "prompt": {
        "llm": "custom-llm",
        "prompt": "You are Seneca, a stoic AI builder.",
        "custom_llm": {
          "url": "https://YOUR_NGROK_URL.ngrok-free.app/v1/chat/completions",
          "api_key": {
            "secret_id": "RETURNED_SECRET_ID"
          }
        }
      }
    }
  }
}
Parameters:
- language: Agent language code (e.g., "en")
- prompt.llm: "custom-llm" for OpenClaw
- prompt.prompt: System prompt for agent
- prompt.custom_llm.url: OpenClaw gateway URL (via ngrok)
- prompt.custom_llm.api_key.secret_id: Reference to stored token
Response:
{
  "agent_id": "agent_3701k3ttaq12ewp8b7qv5rfyszkz",
  "status": "created",
  "voice_settings": {...}
}
3. Branches Management
List Branches
GET /v1/convai/agents/:agent_id/branches
Response:
{
  "results": [
    {
      "id": "branch_9f8d7c6b5a4e3d2c1b0a",
      "name": "Development",
      "agent_id": "agent_3701k3ttaq12ewp8b7qv5rfyszkz",
      "description": "Main development branch for new features",
      "created_at": 1688006400,
      "last_committed_at": 1688592000,
      "is_archived": false,
      "protection_status": "writer_perms_required",
      "access_info": {
        "is_creator": true,
        "creator_name": "John Doe",
        "creator_email": "john.doe@example.com",
        "role": "admin"
      },
      "current_live_percentage": 75.5,
      "draft_exists": true
    }
  ],
  "meta": {
    "total": 2,
    "page": 1,
    "page_size": 2
  }
}
Branch Features:
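As an illustration of consuming the List Branches response, the sketch below filters the `results` array down to non-archived branches with a non-zero `current_live_percentage`. It assumes that field expresses the share of live traffic a branch receives, which the response example suggests but does not state; the function name `summarize_live_branches` is hypothetical.

```python
from typing import Any


def summarize_live_branches(payload: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (name, live percentage) for non-archived branches with live traffic."""
    live = []
    for branch in payload.get("results", []):
        percentage = branch.get("current_live_percentage", 0.0)
        if not branch.get("is_archived", False) and percentage > 0:
            live.append((branch["name"], percentage))
    # Highest live percentage first.
    return sorted(live, key=lambda item: item[1], reverse=True)


sample = {
    "results": [
        {"name": "Development", "is_archived": False, "current_live_percentage": 75.5},
        {"name": "Old experiment", "is_archived": True, "current_live_percentage": 0.0},
    ]
}
print(summarize_live_branches(sample))  # [('Development', 75.5)]
```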
4. Phone Integration
Configure Twilio
Add phone numbers via ElevenLabs dashboard (or API).
Required:
Flow:
Incoming Call (Twilio Number)
↓
Twilio Routes to ElevenLabs
↓
ElevenLabs: Voice Recognition
↓
ElevenLabs: Send to Custom LLM (OpenClaw)
↓
OpenClaw: Process (tools, memory, MCP)
↓
OpenClaw: Response
↓
ElevenLabs: Voice Synthesis
↓
Audio: Sent to Caller
OpenAI Chat Completions Protocol
Standard Endpoint
POST /v1/chat/completions
Headers:
Authorization: Bearer YOUR_GATEWAY_TOKEN
Content-Type: application/json
Request Format:
{
  "model": "custom",
  "messages": [
    {
      "role": "system",
      "content": "You are Seneca, a stoic AI builder."
    },
    {
      "role": "user",
      "content": "Take a screenshot of the dashboard and tell me what you see."
    },
    {
      "role": "assistant",
      "content": "I'll capture the screenshot now."
    }
  ],
  "stream": false
}
Response Format:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1738687224,
  "model": "custom",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The dashboard shows three charts: traffic at 23% up, users at 1,234 active, and revenue at $45,678 today."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 42,
    "total_tokens": 192
  }
}
Why This Protocol Matters
Universal Standard:
Full Context:
Simple Integration:
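To make the request and response shapes concrete, here is a small Python sketch of the two operations a gateway performs at this boundary: reading the latest user turn out of the `messages` history, and wrapping a reply in the `chat.completion` envelope. The function names are hypothetical and this is not OpenClaw's actual implementation. The `finish_reason` field is part of the standard OpenAI response format, although the example response in this document omits it.

```python
import time
import uuid
from typing import Any


def last_user_message(request: dict[str, Any]) -> str:
    """Return the content of the most recent user turn, or '' if there is none."""
    for message in reversed(request.get("messages", [])):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def build_chat_completion(request: dict[str, Any], reply: str,
                          prompt_tokens: int = 0,
                          completion_tokens: int = 0) -> dict[str, Any]:
    """Wrap a reply string in the chat.completion response shape."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.get("model", "custom"),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                # Standard OpenAI field; not shown in this document's example.
                "finish_reason": "stop",
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Because every request carries the full `messages` array, the gateway does not need to keep per-call state to answer follow-up questions.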
Voice Capabilities
Text-to-Speech (TTS)
POST /v1/text-to-speech/:voice_id
Parameters:
- text: Text to convert to speech
- model_id: TTS model
- voice_settings: Voice customization
- output_format: mp3, wav, etc.
Latency Optimization:
Continuity Features:
- previous_text: Text before current
- next_text: Text after current
- previous_request_ids: Up to 3 prior requests
- next_request_ids: Up to 3 following requests
Use Case: Seamless speech for long-form content
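One way to use the continuity fields for long-form content is to split the text into chunks and give each request its neighbours as `previous_text` and `next_text`. The sketch below only builds the request bodies; the function name `build_tts_requests` is hypothetical, and `YOUR_TTS_MODEL_ID` is a placeholder rather than a real model identifier.

```python
def build_tts_requests(chunks: list[str], model_id: str) -> list[dict[str, str]]:
    """Build one TTS request body per chunk, linking each chunk to its neighbours."""
    bodies = []
    for index, chunk in enumerate(chunks):
        body = {"text": chunk, "model_id": model_id}
        if index > 0:
            body["previous_text"] = chunks[index - 1]
        if index < len(chunks) - 1:
            body["next_text"] = chunks[index + 1]
        bodies.append(body)
    return bodies


paragraphs = ["First paragraph.", "Second paragraph.", "Third paragraph."]
for body in build_tts_requests(paragraphs, "YOUR_TTS_MODEL_ID"):
    print(body)
```

The first chunk carries no `previous_text` and the last carries no `next_text`, so the fields are simply omitted at the boundaries.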
Speech-to-Text (STT)
POST /v1/speech-to-text
Features:
Integration Patterns
Pattern 1: Simple Voice Commands
User: "Take a screenshot"
ElevenLabs: STT → "take a screenshot"
OpenClaw: vision_mcp.screenshot()
OpenClaw: Returns result
ElevenLabs: TTS → "Screenshot captured"
Pattern 2: Multi-Step Workflows
User: "Navigate to my portfolio and extract traffic numbers"
ElevenLabs: STT → parse request
ElevenLabs: Send to OpenClaw with full history
OpenClaw:
  1. vision_mcp.navigate(url="...")
  2. vision_mcp.verify(text="Portfolio")
  3. vision_mcp.screenshot()
  4. code_mcp.execute(code="extract_traffic()")
OpenClaw: Returns "Traffic: 23% up, 1,234 users"
ElevenLabs: TTS → "Your portfolio shows 23% traffic growth with 1,234 active users"
Pattern 3: Interactive Conversations
User: "What's the status of my desktop?"
ElevenLabs: STT + Send to OpenClaw
OpenClaw: vision_mcp.status()
OpenClaw: Returns "Display running, XFCE running"
ElevenLabs: TTS → "Your desktop is running. XFCE session is active."
User: "Is the browser open?"
ElevenLabs: STT + Send to OpenClaw (with history)
OpenClaw: Checks, responds
ElevenLabs: TTS → "Yes, Chromium is open on display :99"
Security Considerations
Secret Storage
ngrok Exposure
Twilio Security
Cost Considerations
ElevenLabs
OpenClaw
Bandwidth
Setup Workflow
Step 1: Enable OpenClaw Endpoint
// ~/.openclaw/openclaw.json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": {
          "enabled": true
        }
      }
    }
  }
}
Step 2: Start Tunnel
ngrok http 18789
Output: https://abc123.ngrok-free.app
Step 3: Store Gateway Token
curl -X POST https://api.elevenlabs.io/v1/convai/secrets \
  -H "xi-api-key: YOUR_ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "new",
    "name": "openclaw_gateway_token",
    "value": "YOUR_OPENCLAW_GATEWAY_TOKEN"
  }'
Response: {"type":"stored","secret_id":"abc123...","name":"openclaw_gateway_token"}
Step 4: Create Agent
curl -X POST https://api.elevenlabs.io/v1/convai/agents/create \
  -H "xi-api-key: YOUR_ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_config": {
      "agent": {
        "language": "en",
        "prompt": {
          "llm": "custom-llm",
          "prompt": "You are Seneca, a stoic AI builder helping with vision automation and code execution.",
          "custom_llm": {
            "url": "https://YOUR_NGROK_URL.ngrok-free.app/v1/chat/completions",
            "api_key": {
              "secret_id": "RETURNED_SECRET_ID"
            }
          }
        }
      }
    }
  }'
Step 5: Add Phone (Optional)
Build Opportunities
1. Setup Automation Script
Build script to automate setup:
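A possible starting point for such a script is a helper that turns the tunnel URL from Step 2 and the `secret_id` from Step 3 into the agent creation payload from Step 4. This is a sketch under those assumptions; `build_agent_payload` is a hypothetical name, and sending the payload to `/v1/convai/agents/create` is left out.

```python
import json
from typing import Any


def build_agent_payload(tunnel_url: str, secret_id: str, system_prompt: str,
                        language: str = "en") -> dict[str, Any]:
    """Build the JSON body for POST /v1/convai/agents/create."""
    # Accept the tunnel URL with or without a trailing slash.
    completions_url = tunnel_url.rstrip("/") + "/v1/chat/completions"
    return {
        "conversation_config": {
            "agent": {
                "language": language,
                "prompt": {
                    "llm": "custom-llm",
                    "prompt": system_prompt,
                    "custom_llm": {
                        "url": completions_url,
                        "api_key": {"secret_id": secret_id},
                    },
                },
            }
        }
    }


payload = build_agent_payload(
    "https://YOUR_NGROK_URL.ngrok-free.app",
    "RETURNED_SECRET_ID",
    "You are Seneca, a stoic AI builder.",
)
print(json.dumps(payload, indent=2))
```

Deriving the URL from the tunnel address matters in practice because the ngrok hostname in the Step 2 output is pasted by hand into the Step 4 payload, which is an easy place for a stale URL to slip in.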
2. CLI for Management
Commands to:
3. Health Check Tool
Tool to verify:
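One check such a tool could include is whether the gateway's reply has the `chat.completion` shape that ElevenLabs expects from a custom LLM. The sketch below validates an already-parsed response; the function name `validate_chat_completion` and the specific set of checks are assumptions for illustration, not a complete health check.

```python
from typing import Any


def validate_chat_completion(payload: Any) -> list[str]:
    """Return a list of problems; an empty list means the response looks usable."""
    if not isinstance(payload, dict):
        return ["response is not a JSON object"]
    problems = []
    if payload.get("object") != "chat.completion":
        problems.append("object is not 'chat.completion'")
    choices = payload.get("choices")
    if not isinstance(choices, list) or not choices:
        problems.append("choices is missing or empty")
        return problems
    first = choices[0] if isinstance(choices[0], dict) else {}
    message = first.get("message")
    if not isinstance(message, dict):
        problems.append("first choice has no message")
        return problems
    if message.get("role") != "assistant":
        problems.append("message role is not 'assistant'")
    if not isinstance(message.get("content"), str) or not message["content"].strip():
        problems.append("message content is empty")
    return problems
```

Returning a list of problems rather than a single boolean lets the tool report everything that is wrong in one pass.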
4. Conversation Logger
Log all phone conversations:
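A simple way to approach this is an append-only JSON Lines file with one record per conversation turn. The sketch below is one possible layout, not an existing OpenClaw or ElevenLabs feature; the record fields (`ts`, `conversation_id`, `role`, `content`) and both function names are assumptions.

```python
import json
import time
from pathlib import Path
from typing import Any


def log_turn(log_path: Path, conversation_id: str, role: str, content: str) -> None:
    """Append one conversation turn to a JSON Lines log file."""
    record = {
        "ts": time.time(),
        "conversation_id": conversation_id,
        "role": role,
        "content": content,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_conversation(log_path: Path, conversation_id: str) -> list[dict[str, Any]]:
    """Return all logged turns for one conversation, in the order they were written."""
    turns = []
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            record = json.loads(line)
            if record["conversation_id"] == conversation_id:
                turns.append(record)
    return turns
```

Phone transcripts can contain personal data, so where the file lives and who can read it deserves the same care as the secrets discussed earlier.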
5. Phone-to-MCP Bridge
Direct mapping:
- vision_mcp.screenshot()
- vision_mcp.fill_form()
- code_mcp.execute()
- vision_mcp.status()
Limitations & Considerations
Latency Chain
Voice → STT (200-500ms)
→ Network (ngrok: 100-300ms)
→ LLM (GLM: 2-5s, Claude: 5-15s)
→ Network (ngrok: 100-300ms)
→ TTS (200-500ms)
Total: 2.6-6.6s with GLM (5.6-16.6s with Claude)
Impact:
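The latency budget is a plain sum of the per-stage ranges, which the following Python sketch reproduces using the figures from the chain above. It shows that the 2.6-6.6s total corresponds to the GLM range; the same sum with the Claude range gives a noticeably longer round trip. The stage names in the dictionaries are labels chosen for this example.

```python
# Per-stage latency ranges in milliseconds, copied from the latency chain.
STAGES_MS = {
    "stt": (200, 500),
    "network_to_gateway": (100, 300),
    "network_from_gateway": (100, 300),
    "tts": (200, 500),
}
LLM_MS = {"GLM": (2000, 5000), "Claude": (5000, 15000)}


def total_latency_ms(llm: str) -> tuple[int, int]:
    """Return the (best case, worst case) end-to-end latency for one LLM."""
    low = sum(stage[0] for stage in STAGES_MS.values()) + LLM_MS[llm][0]
    high = sum(stage[1] for stage in STAGES_MS.values()) + LLM_MS[llm][1]
    return low, high


for name in LLM_MS:
    low, high = total_latency_ms(name)
    print(f"{name}: {low / 1000:.1f}-{high / 1000:.1f}s")
# GLM: 2.6-6.6s
# Claude: 5.6-16.6s
```

In both cases the LLM stage dominates: the voice and network stages together contribute only 0.6-1.6s.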
ngrok Reliability
Audio Quality
Success Criteria
Integration Works When:
Next: Build setup automation script
Date: 2026-02-04
Topic: ElevenLabs Agents API + OpenClaw Integration