MCP & Protocols • 2026-02-04

ElevenLabs Agents API - Deep Dive

#mcp #rag #security #openclaw #vision

Overview

The ElevenLabs Agents platform hosts voice-first AI agents with phone integration. OpenClaw plugs in as a custom LLM backend through the OpenAI chat/completions protocol.

Architecture

┌─────────────────────────────────────────┐
│           Phone Call (Twilio)           │
└──────────────┬──────────────────────────┘
               │
               ↓
┌─────────────────────────────────────────┐
│       ElevenLabs Agents Platform        │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │    Voice Layer                  │    │
│  │  - Speech synthesis (TTS)       │    │
│  │  - Speech recognition (ASR)     │    │
│  │  - Turn taking                  │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │    Conversation Layer           │    │
│  │  - Message history              │    │
│  │  - Context management           │    │
│  │  - Agent configuration          │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │    Integration Layer            │    │
│  │  - Secrets management           │    │
│  │  - Custom LLM endpoints         │    │
│  │  - Twilio phone integration     │    │
│  └───────────┬─────────────────────┘    │
└──────────────┼──────────────────────────┘
               │
               ↓ OpenAI chat/completions
┌─────────────────────────────────────────┐
│            OpenClaw Gateway             │
│  - /v1/chat/completions endpoint        │
│  - Tools, Memory, Skills                │
│  - MCP servers (Vision, Code)           │
└─────────────────────────────────────────┘

API Endpoints

1. Secrets Management

Create Secret

Store sensitive values (API keys, tokens) securely.

POST /v1/convai/secrets

Request:

{
  "type": "new",
  "name": "openclaw_gateway_token",
  "value": "YOUR_OPENCLAW_GATEWAY_TOKEN"
}

Response:

{
  "type": "stored",
  "secret_id": "abc123...",
  "name": "openclaw_gateway_token"
}

Why This Matters:

Secure credential storage

Reference by ID, not actual value

Can rotate without updating agents

Use Cases:

Store OpenClaw gateway token

Store Twilio credentials

Store other API keys for custom LLM calls
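
The request and response shapes above can be wrapped in two small helpers. This is a minimal sketch, not an official client: the function names `build_secret_request` and `extract_secret_id` are made up for illustration, and the base URL and `xi-api-key` header are taken from the curl commands in the setup steps of this article.

```python
import json
import urllib.request

# Base URL as used by the curl commands in the setup steps of this article.
API_BASE = "https://api.elevenlabs.io"


def build_secret_request(api_key: str, name: str, value: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/convai/secrets request."""
    body = json.dumps({"type": "new", "name": name, "value": value}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/convai/secrets",
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def extract_secret_id(response_body: str) -> str:
    """Pull secret_id out of a response shaped like the example above."""
    payload = json.loads(response_body)
    if payload.get("type") != "stored":
        raise ValueError(f"unexpected response type: {payload.get('type')!r}")
    return payload["secret_id"]
```

Sending is then a matter of passing the request to `urllib.request.urlopen` and feeding the decoded body to `extract_secret_id`. Keeping construction and sending separate makes the payload easy to inspect before any credential leaves the machine.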

2. Agent Creation

Create Agent

POST /v1/convai/agents/create

Request:

{
  "conversation_config": {
    "agent": {
      "language": "en",
      "prompt": {
        "llm": "custom-llm",
        "prompt": "You are Seneca, a stoic AI builder.",
        "custom_llm": {
          "url": "https://YOUR_NGROK_URL.ngrok-free.app/v1/chat/completions",
          "api_key": {
            "secret_id": "RETURNED_SECRET_ID"
          }
        }
      }
    }
  }
}

Parameters:

language: Agent language code (e.g., "en")

prompt.llm: "custom-llm" for OpenClaw

prompt.prompt: System prompt for agent

prompt.custom_llm.url: OpenClaw gateway URL (via ngrok)

prompt.custom_llm.api_key.secret_id: Reference to stored token

Response:

{
  "agent_id": "agent_3701k3ttaq12ewp8b7qv5rfyszkz",
  "status": "created",
  "voice_settings": {...}
}
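
The nested request body is easy to mistype by hand, so a builder function can help. The sketch below assumes only the fields shown in the request above; `build_agent_config` is a hypothetical helper name, and the HTTPS check is an extra safeguard rather than something the API is known to require.

```python
def build_agent_config(gateway_url: str, secret_id: str, system_prompt: str,
                       language: str = "en") -> dict:
    """Assemble the agent-creation payload for a custom LLM backend."""
    if not gateway_url.startswith("https://"):
        raise ValueError("gateway_url should be an https:// URL")
    # Accept either the bare tunnel URL or the full endpoint URL.
    endpoint = gateway_url.rstrip("/")
    if not endpoint.endswith("/v1/chat/completions"):
        endpoint += "/v1/chat/completions"
    return {
        "conversation_config": {
            "agent": {
                "language": language,
                "prompt": {
                    "llm": "custom-llm",
                    "prompt": system_prompt,
                    "custom_llm": {
                        "url": endpoint,
                        # Reference the stored secret instead of embedding the token.
                        "api_key": {"secret_id": secret_id},
                    },
                },
            }
        }
    }
```

Calling `build_agent_config("https://YOUR_NGROK_URL.ngrok-free.app", "RETURNED_SECRET_ID", "You are Seneca, a stoic AI builder.")` reproduces the request body shown above.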

3. Branches Management

List Branches

GET /v1/convai/agents/:agent_id/branches

Response:

{
  "results": [
    {
      "id": "branch_9f8d7c6b5a4e3d2c1b0a",
      "name": "Development",
      "agent_id": "agent_3701k3ttaq12ewp8b7qv5rfyszkz",
      "description": "Main development branch for new features",
      "created_at": 1688006400,
      "last_committed_at": 1688592000,
      "is_archived": false,
      "protection_status": "writer_perms_required",
      "access_info": {
        "is_creator": true,
        "creator_name": "John Doe",
        "creator_email": "john.doe@example.com",
        "role": "admin"
      },
      "current_live_percentage": 75.5,
      "draft_exists": true
    }
  ],
  "meta": {
    "total": 2,
    "page": 1,
    "page_size": 2
  }
}

Branch Features:

Version control for agents

Draft vs live percentages

Protection levels (writer_perms_required, admin_perms_required)

Role-based access (admin, editor, viewer)
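
To see at a glance which branches carry live traffic, the list response can be reduced to a short summary. This sketch relies only on the field names in the sample response above; `summarize_branches` is an illustrative helper, not part of any SDK.

```python
def summarize_branches(response: dict) -> list[dict]:
    """Return active branches, highest live-traffic share first."""
    summary = []
    for branch in response.get("results", []):
        if branch.get("is_archived"):
            continue  # archived branches are not interesting for rollout status
        summary.append({
            "id": branch["id"],
            "name": branch["name"],
            "live_percentage": branch.get("current_live_percentage", 0.0),
            "has_draft": branch.get("draft_exists", False),
            "protection": branch.get("protection_status"),
        })
    return sorted(summary, key=lambda b: b["live_percentage"], reverse=True)
```

Note that `meta.total` in the response can exceed the number of entries on the current page, so a full listing may need more than one request.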

4. Phone Integration

Configure Twilio

Add phone numbers through the ElevenLabs dashboard (or the API).

Required:

Twilio Account SID

Twilio Auth Token

Twilio phone number(s)

Flow:

Incoming Call (Twilio Number)
    ↓
Twilio Routes to ElevenLabs
    ↓
ElevenLabs: Voice Recognition
    ↓
ElevenLabs: Send to Custom LLM (OpenClaw)
    ↓
OpenClaw: Process (tools, memory, MCP)
    ↓
OpenClaw: Response
    ↓
ElevenLabs: Voice Synthesis
    ↓
Audio: Sent to Caller

OpenAI Chat Completions Protocol

Standard Endpoint

POST /v1/chat/completions

Headers:

Authorization: Bearer YOUR_GATEWAY_TOKEN
Content-Type: application/json

Request Format:

{
  "model": "custom",
  "messages": [
    {
      "role": "system",
      "content": "You are Seneca, a stoic AI builder."
    },
    {
      "role": "user",
      "content": "Take a screenshot of the dashboard and tell me what you see."
    },
    {
      "role": "assistant",
      "content": "I'll capture the screenshot now."
    }
  ],
  "stream": false
}

Response Format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1738687224,
  "model": "custom",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The dashboard shows three charts: traffic at 23% up, users at 1,234 active, and revenue at $45,678 today."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 42,
    "total_tokens": 192
  }
}
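
On the gateway side, a handler has to turn a reply string into a response of this shape. The following is a minimal sketch of that step only, with no HTTP server attached. `build_completion` is a hypothetical name, the `usage` block is left out because token counting depends on the backend, and `finish_reason` is a standard OpenAI field that the sample above does not show.

```python
import time
import uuid
from typing import Optional


def build_completion(request: dict, reply: str, created: Optional[int] = None) -> dict:
    """Wrap a reply string in an OpenAI-style chat.completion object."""
    messages = request.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("request must contain a non-empty 'messages' list")
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()) if created is None else created,
        # Echo the requested model name back, defaulting to "custom".
        "model": request.get("model", "custom"),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }
        ],
    }
```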

Why This Protocol Matters

Universal Standard:

OpenAI chat/completions is the de facto standard

Any LLM can implement it

ElevenLabs can route to any backend

Full Context:

ElevenLabs sends complete message history

OpenClaw has full conversation context

Memory continuity across calls

Simple Integration:

One endpoint to implement

Standard request/response format

Streaming support available
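
When "stream" is set to true, an OpenAI-compatible endpoint answers with server-sent events instead of a single JSON body. The sketch below shows the event framing under the usual OpenAI conventions (chat.completion.chunk objects, a final `data: [DONE]` line). `sse_chunks` is an illustrative name, and whether a given gateway streams exactly this way is an assumption to verify.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_chunks(completion_id: str, model: str, created: int,
               pieces: Iterable[str]) -> Iterator[str]:
    """Yield server-sent-event lines for a streamed chat completion."""

    def event(delta: dict, finish_reason: Optional[str] = None) -> str:
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        return f"data: {json.dumps(chunk)}\n\n"

    yield event({"role": "assistant"})      # first chunk announces the role
    for piece in pieces:
        yield event({"content": piece})     # one chunk per text fragment
    yield event({}, finish_reason="stop")   # empty delta closes the message
    yield "data: [DONE]\n\n"                # sentinel that ends the stream
```

Streaming matters for voice because speech synthesis can start on the first fragments instead of waiting for the full reply.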

Voice Capabilities

Text-to-Speech (TTS)

POST /v1/text-to-speech/:voice_id

Parameters:

text: Text to convert to speech

model_id: TTS model

voice_settings: Voice customization

output_format: mp3, wav, etc.

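Latency Optimization (optimize_streaming_latency query parameter):

0: default (no optimization)

1: normal (about 50% of the improvement available at level 3)

2: strong (about 75% of the improvement available at level 3)

3: max (best latency)

4: max, with the text normalizer also disabled (lowest latency, but numbers and dates may be mispronounced)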

Continuity Features:

previous_text: Text before current

next_text: Text after current

previous_request_ids: Up to 3 prior requests

next_request_ids: Up to 3 following requests

Use Case: Seamless speech for long-form content
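
For long-form content split into paragraphs, each request can carry its neighbours as context. This sketch only builds the JSON bodies for POST /v1/text-to-speech/:voice_id using the parameter names listed above; `build_tts_requests` is a hypothetical helper and the model ID is a placeholder.

```python
def build_tts_requests(paragraphs: list[str], model_id: str) -> list[dict]:
    """Create one TTS request body per paragraph, linked by surrounding text."""
    requests = []
    for i, text in enumerate(paragraphs):
        body = {"text": text, "model_id": model_id}
        if i > 0:
            body["previous_text"] = paragraphs[i - 1]   # what was just spoken
        if i + 1 < len(paragraphs):
            body["next_text"] = paragraphs[i + 1]       # what comes next
        requests.append(body)
    return requests
```

The first body has no `previous_text` and the last has no `next_text`, so a single-paragraph input degrades to a plain request.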

Speech-to-Text (STT)

POST /v1/speech-to-text

Features:

90+ languages supported

Keyterm prompting (up to 100 terms)

Entity detection (up to 56 entities)

Word-level timestamps

Speaker diarization (up to 32 speakers)

Smart language detection

Dynamic audio tagging

Integration Patterns

Pattern 1: Simple Voice Commands

User: "Take a screenshot"
ElevenLabs: STT → "take a screenshot"
OpenClaw: vision_mcp.screenshot()
OpenClaw: Returns result
ElevenLabs: TTS → "Screenshot captured"

Pattern 2: Multi-Step Workflows

User: "Navigate to my portfolio and extract traffic numbers"
ElevenLabs: STT → parse request
ElevenLabs: Send to OpenClaw with full history
OpenClaw:
  1. vision_mcp.navigate(url="...")
  2. vision_mcp.verify(text="Portfolio")
  3. vision_mcp.screenshot()
  4. code_mcp.execute(code="extract_traffic()")
OpenClaw: Returns "Traffic: 23% up, 1,234 users"
ElevenLabs: TTS → "Your portfolio shows 23% traffic growth with 1,234 active users"

Pattern 3: Interactive Conversations

User: "What's the status of my desktop?"
ElevenLabs: STT + Send to OpenClaw
OpenClaw: vision_mcp.status()
OpenClaw: Returns "Display running, XFCE running"
ElevenLabs: TTS → "Your desktop is running. XFCE session is active."
User: "Is the browser open?"
ElevenLabs: STT + Send to OpenClaw (with history)
OpenClaw: Checks, responds
ElevenLabs: TTS → "Yes, Chromium is open on display :99"

Security Considerations

Secret Storage

Credentials stored by ID reference

Secrets can be rotated without updating agents

Not exposed in logs or API responses

ngrok Exposure

The public URL exposes the local gateway to the internet

Protect the endpoint with bearer-token authentication

Consider IP allowlisting for production
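
Both checks fit in one small function on the gateway side. The sketch below is an assumption about how such a guard could look, not OpenClaw's actual implementation. `is_authorized` and its parameters are illustrative, and how the caller's address reaches the gateway behind a tunnel depends on the deployment.

```python
import hmac
import ipaddress
from typing import Optional, Sequence


def is_authorized(auth_header: Optional[str], client_ip: str,
                  expected_token: str, allowed_networks: Sequence[str]) -> bool:
    """Check the bearer token first, then an optional IP allowlist."""
    prefix = "Bearer "
    if not auth_header or not auth_header.startswith(prefix):
        return False
    presented = auth_header[len(prefix):]
    # Constant-time comparison avoids leaking token contents through timing.
    if not hmac.compare_digest(presented.encode(), expected_token.encode()):
        return False
    if not allowed_networks:
        return True  # no allowlist configured: the token alone decides
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)
```

A call such as `is_authorized(header, ip, token, ["203.0.113.0/24"])` accepts only callers from that range that also present the right token.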

Twilio Security

Twilio credentials stored in ElevenLabs

No direct API access needed from OpenClaw

Number ownership verified via Twilio

Cost Considerations

ElevenLabs

Speech-to-text: Per minute billing

Text-to-speech: Per character billing

Phone calls: Per minute (Twilio)

Agent hosting: Monthly subscription

OpenClaw

LLM API costs (a locally hosted model may be free)

No additional cost for ElevenLabs integration

Gateway overhead minimal

Bandwidth

ngrok: Free tier (limited)

Consider ngrok paid for production

Or use own domain + reverse proxy

Setup Workflow

Step 1: Enable OpenClaw Endpoint

// ~/.openclaw/openclaw.json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": {
          "enabled": true
        }
      }
    }
  }
}
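
If the config already holds other settings, the flag can be merged in without disturbing them. This sketch works on an already-parsed dictionary; reading and writing the file is left out because the exact on-disk format is not confirmed here. `enable_chat_completions` is an illustrative name.

```python
import copy


def enable_chat_completions(config: dict) -> dict:
    """Return a copy of the config with the chatCompletions endpoint enabled."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    node = updated
    # Walk down the key path, creating missing levels along the way.
    for key in ("gateway", "http", "endpoints", "chatCompletions"):
        node = node.setdefault(key, {})
    node["enabled"] = True
    return updated
```

The helper assumes every existing level on that path is already a dictionary.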

Step 2: Start Tunnel

ngrok http 18789

Output: https://abc123.ngrok-free.app
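
A script can read the public URL instead of copying it from the terminal. The ngrok agent serves a local inspection API, by default at http://127.0.0.1:4040/api/tunnels, that lists active tunnels as JSON. The sketch below parses such a payload; `pick_public_url` is a hypothetical helper, and the default address should be checked against the local ngrok setup.

```python
import json
from typing import Optional


def pick_public_url(tunnels_json: str, local_port: int) -> Optional[str]:
    """Find the HTTPS public URL of the tunnel forwarding to local_port."""
    for tunnel in json.loads(tunnels_json).get("tunnels", []):
        url = tunnel.get("public_url", "")
        addr = tunnel.get("config", {}).get("addr", "")
        # Match on the forwarded local address, e.g. "http://localhost:18789".
        if url.startswith("https://") and addr.endswith(f":{local_port}"):
            return url
    return None
```

The JSON text itself can be fetched with `urllib.request.urlopen` against the local API once the tunnel is up.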

Step 3: Store Gateway Token

curl -X POST https://api.elevenlabs.io/v1/convai/secrets \
-H "xi-api-key: YOUR_ELEVENLABS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "type": "new",
  "name": "openclaw_gateway_token",
  "value": "YOUR_OPENCLAW_GATEWAY_TOKEN"
}'

Response: {"type":"stored","secret_id":"abc123...","name":"openclaw_gateway_token"}

Step 4: Create Agent

curl -X POST https://api.elevenlabs.io/v1/convai/agents/create \
-H "xi-api-key: YOUR_ELEVENLABS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "conversation_config": {
    "agent": {
      "language": "en",
      "prompt": {
        "llm": "custom-llm",
        "prompt": "You are Seneca, a stoic AI builder helping with vision automation and code execution.",
        "custom_llm": {
          "url": "https://YOUR_NGROK_URL.ngrok-free.app/v1/chat/completions",
          "api_key": {
            "secret_id": "RETURNED_SECRET_ID"
          }
        }
      }
    }
  }
}'

Step 5: Add Phone (Optional)

In Twilio: Purchase number

In ElevenLabs dashboard: Add Twilio credentials

Connect number to agent

Build Opportunities

1. Setup Automation Script

Build a script that automates the setup:

Enable chat completions

Start ngrok and get URL

Create secret

Create agent

Report status

2. CLI for Management

Commands to:

Start/stop ngrok

Create/update agents

Manage secrets

Test connection

3. Health Check Tool

Tool to verify:

Gateway is accessible via ngrok

Chat completions endpoint works

Secret is valid

Agent responds correctly
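
One piece of such a tool is checking that whatever comes back from the endpoint actually looks like a chat completion. This is a minimal sketch of that check alone; `check_completion_shape` is an illustrative name and the rules mirror the response format documented earlier in this article.

```python
def check_completion_shape(payload: dict) -> list[str]:
    """Return a list of problems found in a chat-completions response."""
    problems = []
    if payload.get("object") != "chat.completion":
        problems.append("object is not 'chat.completion'")
    choices = payload.get("choices")
    if not isinstance(choices, list) or not choices:
        problems.append("choices is missing or empty")
        return problems  # nothing further to inspect
    message = choices[0].get("message", {})
    if message.get("role") != "assistant":
        problems.append("first choice is not an assistant message")
    content = message.get("content")
    if not isinstance(content, str) or not content.strip():
        problems.append("assistant content is empty")
    return problems
```

An empty list means the response passed. Returning every problem, instead of stopping at the first, keeps a health report useful when several things are wrong at once.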

4. Conversation Logger

Log all phone conversations:

Timestamps

User requests

Agent responses

Tool calls made

Performance metrics
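
The fields listed above map naturally onto one JSON object per conversation turn. This is a sketch under that assumption; `TurnRecord` and `log_turn` are hypothetical names, and JSON Lines is simply one convenient format for append-only logs.

```python
import io
import json
import time
from dataclasses import asdict, dataclass, field
from typing import TextIO


@dataclass
class TurnRecord:
    user_request: str
    agent_response: str
    tool_calls: list = field(default_factory=list)   # e.g. ["vision_mcp.screenshot"]
    latency_ms: int = 0                               # end-to-end time for the turn
    timestamp: float = field(default_factory=time.time)


def log_turn(stream: TextIO, record: TurnRecord) -> None:
    """Append one turn to the stream as a single JSON line."""
    stream.write(json.dumps(asdict(record)) + "\n")


# Usage: an in-memory stream stands in for a log file here.
buffer = io.StringIO()
log_turn(buffer, TurnRecord("Take a screenshot", "Screenshot captured",
                            ["vision_mcp.screenshot"], latency_ms=3100))
```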

5. Phone-to-MCP Bridge

Direct mapping:

"Take screenshot" → vision_mcp.screenshot()

"Fill form" → vision_mcp.fill_form()

"Calculate" → code_mcp.execute()

"Status" → vision_mcp.status()

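A first version of this mapping can be plain keyword lookup, bypassing the LLM for the commands listed above. The sketch below returns tool names as strings and does not call anything. `route_utterance` and the keyword table are illustrative, and the tool names are the ones used in this article.

```python
import re
from typing import Optional

# Keyword → tool name, following the direct mapping listed above.
KEYWORD_TO_TOOL = {
    "screenshot": "vision_mcp.screenshot",
    "form": "vision_mcp.fill_form",
    "calculate": "code_mcp.execute",
    "status": "vision_mcp.status",
}


def route_utterance(utterance: str) -> Optional[str]:
    """Return the tool for the first known keyword, or None to fall back to the LLM."""
    for word in re.findall(r"[a-z]+", utterance.lower()):
        tool = KEYWORD_TO_TOOL.get(word)
        if tool is not None:
            return tool
    return None
```

Speech recognition output varies ("take a screenshot", "grab a screenshot please"), which is why the lookup keys on single words instead of whole phrases.
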
Limitations & Considerations

Latency Chain

Voice → STT (200-500ms)
    → Network (ngrok: 100-300ms)
    → LLM (GLM: 2-5s, Claude: 5-15s)
    → Network (ngrok: 100-300ms)
    → TTS (200-500ms)

Total: 2.6-6.6s with GLM, 5.6-16.6s with Claude

Impact:

Conversations have ~3-7s pauses with GLM, longer with Claude

Natural for many use cases

Streaming could reduce perceived latency
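
The totals follow from adding up the stage ranges in the chain above. The short calculation below makes the arithmetic explicit, working in milliseconds to avoid floating-point rounding. The dictionary names are illustrative and the figures are the estimates from this section, not measurements.

```python
# (low, high) estimates in milliseconds, taken from the latency chain above.
FIXED_STAGES_MS = {
    "stt": (200, 500),
    "network_in": (100, 300),
    "network_out": (100, 300),
    "tts": (200, 500),
}
LLM_MS = {"GLM": (2000, 5000), "Claude": (5000, 15000)}


def total_latency_ms(llm: str) -> tuple[int, int]:
    """Sum the best-case and worst-case latency for one conversational turn."""
    low = sum(lo for lo, _ in FIXED_STAGES_MS.values()) + LLM_MS[llm][0]
    high = sum(hi for _, hi in FIXED_STAGES_MS.values()) + LLM_MS[llm][1]
    return low, high


print(total_latency_ms("GLM"))     # (2600, 6600)
print(total_latency_ms("Claude"))  # (5600, 16600)
```

The 2.6-6.6s range corresponds to the GLM figures. With the Claude range plugged in, the same chain comes to 5.6-16.6s, so the LLM stage dominates the budget in both cases.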