MCP & Protocols · 2026-02-04 · 1,563 words · 7 min read

ElevenLabs Agents API - Deep Dive

#mcp #rag #security #openclaw #vision

Overview

The ElevenLabs Agents platform provides voice-first AI agents with phone integration. OpenClaw plugs in as a custom LLM backend through the OpenAI chat/completions protocol.

Architecture

┌─────────────────────────────────────────┐
│           Phone Call (Twilio)           │
└────────────────────┬────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────┐
│       ElevenLabs Agents Platform        │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Voice Layer                       │  │
│  │  - Speech synthesis (TTS)         │  │
│  │  - Speech recognition (ASR)       │  │
│  │  - Turn taking                    │  │
│  └───────────────────────────────────┘  │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Conversation Layer                │  │
│  │  - Message history                │  │
│  │  - Context management             │  │
│  │  - Agent configuration            │  │
│  └───────────────────────────────────┘  │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Integration Layer                 │  │
│  │  - Secrets management             │  │
│  │  - Custom LLM endpoints           │  │
│  │  - Twilio phone integration       │  │
│  └───────────────────────────────────┘  │
└────────────────────┬────────────────────┘
                     │
                     ↓ OpenAI chat/completions
┌─────────────────────────────────────────┐
│            OpenClaw Gateway             │
│  - /v1/chat/completions endpoint        │
│  - Tools, Memory, Skills                │
│  - MCP servers (Vision, Code)           │
└─────────────────────────────────────────┘

API Endpoints

1. Secrets Management

Create Secret

Store sensitive values (API keys, tokens) securely.

POST /v1/convai/secrets

Request:

{
  "type": "new",
  "name": "openclaw_gateway_token",
  "value": "YOUR_OPENCLAW_GATEWAY_TOKEN"
}

Response:

{
  "type": "stored",
  "secret_id": "abc123...",
  "name": "openclaw_gateway_token"
}

Why This Matters:

  • Secure credential storage
  • Reference by ID, not actual value
  • Can rotate without updating agents

Use Cases:

  • Store OpenClaw gateway token
  • Store Twilio credentials
  • Store other API keys for custom LLM calls
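
The create-secret call can be sketched in Python. This is a minimal sketch rather than official client code: it assumes the `https://api.elevenlabs.io` base URL and the `xi-api-key` header that the curl commands in the Setup Workflow use, and `build_secret_request` and `extract_secret_id` are hypothetical helper names. The request is only built here; sending it is left to the caller.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed base URL, as in the curl examples


def build_secret_request(api_key: str, name: str, value: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/convai/secrets request."""
    body = json.dumps({"type": "new", "name": name, "value": value}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/convai/secrets",
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def extract_secret_id(response_body: bytes) -> str:
    """Read secret_id from a response shaped like the example above."""
    return json.loads(response_body)["secret_id"]


if __name__ == "__main__":
    req = build_secret_request(
        "YOUR_ELEVENLABS_API_KEY", "openclaw_gateway_token", "YOUR_OPENCLAW_GATEWAY_TOKEN"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```

Keeping request construction separate from the network call makes the payload easy to inspect before any real credential leaves the machine.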

    2. Agent Creation

    Create Agent

    POST /v1/convai/agents/create

    Request:

    {
      "conversation_config": {
        "agent": {
          "language": "en",
          "prompt": {
            "llm": "custom-llm",
            "prompt": "You are Seneca, a stoic AI builder.",
            "custom_llm": {
              "url": "https://YOUR_NGROK_URL.ngrok-free.app/v1/chat/completions",
              "api_key": {
                "secret_id": "RETURNED_SECRET_ID"
              }
            }
          }
        }
      }
    }

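    Parameters (paths relative to conversation_config.agent):

  • language: Agent language code (e.g., "en")
  • prompt.llm: Set to "custom-llm" to route requests to a custom endpoint such as OpenClaw
  • prompt.prompt: System prompt for the agent
  • prompt.custom_llm.url: Public URL of the OpenClaw gateway's chat completions endpoint (here exposed via ngrok)
  • prompt.custom_llm.api_key.secret_id: ID of the stored gateway token, as returned by Create Secret

    Response: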

    {
      "agent_id": "agent_3701k3ttaq12ewp8b7qv5rfyszkz",
      "status": "created",
      "voice_settings": {...}
    }
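
The nested request body is easy to get wrong by hand, so a small builder can help. The sketch below assumes only the field layout shown in the request above; `build_agent_payload` is a hypothetical helper, and it appends `/v1/chat/completions` to whatever gateway base URL it is given.

```python
import json


def build_agent_payload(gateway_url: str, secret_id: str, system_prompt: str,
                        language: str = "en") -> dict:
    """Assemble the body for POST /v1/convai/agents/create."""
    return {
        "conversation_config": {
            "agent": {
                "language": language,
                "prompt": {
                    "llm": "custom-llm",
                    "prompt": system_prompt,
                    "custom_llm": {
                        # The endpoint that receives OpenAI-style chat requests.
                        "url": gateway_url.rstrip("/") + "/v1/chat/completions",
                        # Reference the stored secret instead of embedding the token.
                        "api_key": {"secret_id": secret_id},
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    payload = build_agent_payload(
        "https://YOUR_NGROK_URL.ngrok-free.app",
        "RETURNED_SECRET_ID",
        "You are Seneca, a stoic AI builder.",
    )
    print(json.dumps(payload, indent=2))
```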

    3. Branches Management

    List Branches

    GET /v1/convai/agents/:agent_id/branches

    Response:

    {
      "results": [
        {
          "id": "branch_9f8d7c6b5a4e3d2c1b0a",
          "name": "Development",
          "agent_id": "agent_3701k3ttaq12ewp8b7qv5rfyszkz",
          "description": "Main development branch for new features",
          "created_at": 1688006400,
          "last_committed_at": 1688592000,
          "is_archived": false,
          "protection_status": "writer_perms_required",
          "access_info": {
            "is_creator": true,
            "creator_name": "John Doe",
            "creator_email": "john.doe@example.com",
            "role": "admin"
          },
          "current_live_percentage": 75.5,
          "draft_exists": true
        }
      ],
      "meta": {
        "total": 2,
        "page": 1,
        "page_size": 2
      }
    }

    Branch Features:

  • Version control for agents
  • Draft state and live percentage per branch (draft_exists, current_live_percentage)
  • Protection levels (writer_perms_required, admin_perms_required)
  • Role-based access (admin, editor, viewer)
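
As an illustration of working with the branch list, the sketch below filters a response for branches that are not archived and have a non-zero live percentage. It relies only on the field names in the sample response above; `live_branches` is a hypothetical helper, and the second branch in the sample data is made up for the demonstration.

```python
def live_branches(response: dict) -> list[tuple[str, float]]:
    """Return (name, live percentage) for non-archived branches that are live."""
    return [
        (branch["name"], branch["current_live_percentage"])
        for branch in response.get("results", [])
        if not branch.get("is_archived", False)
        and branch.get("current_live_percentage", 0) > 0
    ]


if __name__ == "__main__":
    sample = {
        "results": [
            {"name": "Development", "is_archived": False, "current_live_percentage": 75.5},
            # Hypothetical second entry, added only to exercise the filter.
            {"name": "Old experiment", "is_archived": True, "current_live_percentage": 0},
        ]
    }
    print(live_branches(sample))
```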

    4. Phone Integration

    Configure Twilio

    Add phone numbers through the ElevenLabs dashboard (or the API).

    Required:

  • Twilio Account SID
  • Twilio Auth Token
  • Twilio phone number(s)

    Flow:

    Incoming Call (Twilio Number)
        ↓
    Twilio Routes to ElevenLabs
        ↓
    ElevenLabs: Voice Recognition
        ↓
    ElevenLabs: Send to Custom LLM (OpenClaw)
        ↓
    OpenClaw: Process (tools, memory, MCP)
        ↓
    OpenClaw: Response
        ↓
    ElevenLabs: Voice Synthesis
        ↓
    Audio: Sent to Caller

    OpenAI Chat Completions Protocol

    Standard Endpoint

    POST /v1/chat/completions

    Headers:

    Authorization: Bearer YOUR_GATEWAY_TOKEN
    Content-Type: application/json

    Request Format:

    {
      "model": "custom",
      "messages": [
        {
          "role": "system",
          "content": "You are Seneca, a stoic AI builder."
        },
        {
          "role": "user",
          "content": "Take a screenshot of the dashboard and tell me what you see."
        },
        {
          "role": "assistant",
          "content": "I'll capture the screenshot now."
        }
      ],
      "stream": false
    }

    Response Format:

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1738687224,
      "model": "custom",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "The dashboard shows three charts: traffic at 23% up, users at 1,234 active, and revenue at $45,678 today."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 150,
        "completion_tokens": 42,
        "total_tokens": 192
      }
    }
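
To make the response shape concrete, here is a minimal sketch of how a backend might wrap a reply in the non-streaming format shown above. It is not OpenClaw's implementation; `build_completion` and `last_user_message` are hypothetical helpers, and the `usage` block is left out because computing token counts needs a tokenizer.

```python
import time
import uuid


def last_user_message(request: dict) -> str:
    """Return the content of the most recent user message, or "" if none."""
    for message in reversed(request.get("messages", [])):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def build_completion(request: dict, reply_text: str) -> dict:
    """Wrap reply_text in a chat.completion response object."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.get("model", "custom"),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply_text},
                "finish_reason": "stop",
            }
        ],
    }


if __name__ == "__main__":
    request = {
        "model": "custom",
        "messages": [
            {"role": "system", "content": "You are Seneca, a stoic AI builder."},
            {"role": "user", "content": "What's the status of my desktop?"},
        ],
    }
    print(last_user_message(request))
    print(build_completion(request, "Your desktop is running.")["choices"][0]["message"])
```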

    Why This Protocol Matters

    Universal Standard:

  • OpenAI chat/completions is the de facto standard
  • Any LLM can implement it
  • ElevenLabs can route to any backend

    Full Context:

  • ElevenLabs sends complete message history
  • OpenClaw has full conversation context
  • Memory continuity across calls

    Simple Integration:

  • One endpoint to implement
  • Standard request/response format
  • Streaming support available
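
For the streaming case, OpenAI-compatible servers send Server-Sent Events whose payloads are `chat.completion.chunk` objects, ending with a `[DONE]` sentinel. The generator below is a minimal sketch of that wire format; `sse_chunks` is a hypothetical name, and whether a given gateway emits exactly these fields is an assumption to verify against its own output.

```python
import json
import time
from typing import Iterable, Iterator, Optional


def sse_chunks(pieces: Iterable[str], completion_id: str = "chatcmpl-demo",
               model: str = "custom") -> Iterator[str]:
    """Yield SSE events in the OpenAI streaming chunk format."""
    created = int(time.time())

    def event(delta: dict, finish_reason: Optional[str] = None) -> str:
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        return f"data: {json.dumps(chunk)}\n\n"

    yield event({"role": "assistant"})      # first chunk announces the role
    for piece in pieces:
        yield event({"content": piece})     # one chunk per text fragment
    yield event({}, finish_reason="stop")   # final chunk carries the stop reason
    yield "data: [DONE]\n\n"                # stream terminator


if __name__ == "__main__":
    for line in sse_chunks(["Screenshot ", "captured."]):
        print(line, end="")
```

With streaming, the voice layer can start synthesizing speech from the first fragments instead of waiting for the full reply.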

    Voice Capabilities

    Text-to-Speech (TTS)

    POST /v1/text-to-speech/:voice_id

    Parameters:

  • text: Text to convert to speech
  • model_id: TTS model
  • voice_settings: Voice customization
  • output_format: mp3, wav, etc.
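
    Latency Optimization (optimize_streaming_latency query parameter):

  • 0: default (no latency optimization)
  • 1: normal (about 50% of the improvement available at level 3)
  • 2: strong (about 75% of the improvement available at level 3)
  • 3: max latency optimization
  • 4: max, with the text normalizer also disabled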

    Continuity Features:

  • previous_text: Text before current
  • next_text: Text after current
  • previous_request_ids: Up to 3 prior requests
  • next_request_ids: Up to 3 following requests

    Use Case: Seamless speech for long-form content
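
The continuity fields can be illustrated with long text that has already been split into chunks: each request carries its neighbours as `previous_text` and `next_text`. This is a sketch under that assumption; `continuity_payloads` is a hypothetical helper, the model ID is left as a placeholder, and the request-ID variants are not covered.

```python
def continuity_payloads(chunks: list[str], model_id: str) -> list[dict]:
    """Build one TTS request body per chunk, linking each to its neighbours."""
    payloads = []
    for i, text in enumerate(chunks):
        body = {"text": text, "model_id": model_id}
        if i > 0:
            body["previous_text"] = chunks[i - 1]   # text spoken just before
        if i + 1 < len(chunks):
            body["next_text"] = chunks[i + 1]       # text that will follow
        payloads.append(body)
    return payloads


if __name__ == "__main__":
    parts = ["Chapter one.", "It was a quiet morning.", "Then the phone rang."]
    for body in continuity_payloads(parts, "YOUR_MODEL_ID"):
        print(body)
```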

    Speech-to-Text (STT)

    POST /v1/speech-to-text

    Features:

  • 90+ languages supported
  • Keyterm prompting (up to 100 terms)
  • Entity detection (up to 56 entities)
  • Word-level timestamps
  • Speaker diarization (up to 32 speakers)
  • Smart language detection
  • Dynamic audio tagging

    Integration Patterns

    Pattern 1: Simple Voice Commands

    User: "Take a screenshot"
    ElevenLabs: STT → "take a screenshot"
    OpenClaw: vision_mcp.screenshot()
    OpenClaw: Returns result
    ElevenLabs: TTS → "Screenshot captured"

    Pattern 2: Multi-Step Workflows

    User: "Navigate to my portfolio and extract traffic numbers"
    ElevenLabs: STT → parse request
    ElevenLabs: Send to OpenClaw with full history
    OpenClaw:
      1. vision_mcp.navigate(url="...")
      2. vision_mcp.verify(text="Portfolio")
      3. vision_mcp.screenshot()
      4. code_mcp.execute(code="extract_traffic()")
    OpenClaw: Returns "Traffic: 23% up, 1,234 users"
    ElevenLabs: TTS → "Your portfolio shows 23% traffic growth with 1,234 active users"

    Pattern 3: Interactive Conversations

    User: "What's the status of my desktop?"
    ElevenLabs: STT + Send to OpenClaw
    OpenClaw: vision_mcp.status()
    OpenClaw: Returns "Display running, XFCE running"
    ElevenLabs: TTS → "Your desktop is running. XFCE session is active."
    User: "Is the browser open?"
    ElevenLabs: STT + Send to OpenClaw (with history)
    OpenClaw: Checks, responds
    ElevenLabs: TTS → "Yes, Chromium is open on display :99"

    Security Considerations

    Secret Storage

  • Agents reference credentials by secret ID, never by value
  • Secrets can be rotated without updating agents
  • Not exposed in logs or API responses

    ngrok Exposure

  • Public URL exposes local gateway
  • Should use secure token auth
  • Consider IP whitelisting for production
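
Token auth on a publicly reachable gateway comes down to checking the `Authorization: Bearer ...` header on every request. The sketch below shows one way to do that check with a constant-time comparison; `is_authorized` is a hypothetical helper and says nothing about how OpenClaw itself validates tokens.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only for 'Bearer <token>' headers that match expected_token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())


if __name__ == "__main__":
    print(is_authorized("Bearer YOUR_GATEWAY_TOKEN", "YOUR_GATEWAY_TOKEN"))
    print(is_authorized("Bearer something-else", "YOUR_GATEWAY_TOKEN"))
```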

    Twilio Security

  • Twilio credentials stored in ElevenLabs
  • OpenClaw needs no direct access to the Twilio API
  • Number ownership verified via Twilio

    Cost Considerations

    ElevenLabs

  • Speech-to-text: Per minute billing
  • Text-to-speech: Per character billing
  • Phone calls: Per minute (Twilio)
  • Agent hosting: Monthly subscription

    OpenClaw

  • LLM API costs (may be free if the model runs locally)
  • No additional cost for ElevenLabs integration
  • Gateway overhead minimal

    Bandwidth

  • ngrok: Free tier (limited)
  • Consider ngrok paid for production
  • Or use own domain + reverse proxy

    Setup Workflow

    Step 1: Enable OpenClaw Endpoint

    // ~/.openclaw/openclaw.json
    {
      "gateway": {
        "http": {
          "endpoints": {
            "chatCompletions": {
              "enabled": true
            }
          }
        }
      }
    }
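
If this step is scripted, the flag can be set without disturbing other settings in the file. The sketch below works on an already-parsed config dict and assumes only the key path shown above; `enable_chat_completions` is a hypothetical helper, and reading the file with `json.load` assumes it contains plain JSON without comments.

```python
import copy


def enable_chat_completions(config: dict) -> dict:
    """Return a copy of config with gateway.http.endpoints.chatCompletions.enabled = true."""
    updated = copy.deepcopy(config)          # leave the caller's dict untouched
    node = updated
    for key in ("gateway", "http", "endpoints", "chatCompletions"):
        node = node.setdefault(key, {})      # create missing levels on the way down
    node["enabled"] = True
    return updated


if __name__ == "__main__":
    existing = {"gateway": {"http": {"port": 18789}}}
    print(enable_chat_completions(existing))
```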

    Step 2: Start Tunnel

    ngrok http 18789

    Output: https://abc123.ngrok-free.app
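
Because the URL changes on restart, a script can read it from the ngrok agent's local API instead of copying it by hand. The sketch below assumes the agent's default inspection address `http://127.0.0.1:4040` and its `/api/tunnels` listing; `pick_https_url` and `current_public_url` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

NGROK_API = "http://127.0.0.1:4040/api/tunnels"  # assumed default local agent address


def pick_https_url(tunnels_doc: dict) -> Optional[str]:
    """Return the first https public_url in a tunnels listing, if any."""
    for tunnel in tunnels_doc.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None


def current_public_url(api_url: str = NGROK_API) -> Optional[str]:
    """Ask the running ngrok agent for its tunnels and pick the https one."""
    with urllib.request.urlopen(api_url, timeout=5) as resp:
        return pick_https_url(json.load(resp))


if __name__ == "__main__":
    sample = {"tunnels": [{"public_url": "https://abc123.ngrok-free.app", "proto": "https"}]}
    print(pick_https_url(sample))
```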

    Step 3: Store Gateway Token

    curl -X POST https://api.elevenlabs.io/v1/convai/secrets \
    -H "xi-api-key: YOUR_ELEVENLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "type": "new",
      "name": "openclaw_gateway_token",
      "value": "YOUR_OPENCLAW_GATEWAY_TOKEN"
    }'

    Response: {"type":"stored","secret_id":"abc123...","name":"openclaw_gateway_token"}

    Step 4: Create Agent

    curl -X POST https://api.elevenlabs.io/v1/convai/agents/create \
    -H "xi-api-key: YOUR_ELEVENLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "conversation_config": {
        "agent": {
          "language": "en",
          "prompt": {
            "llm": "custom-llm",
            "prompt": "You are Seneca, a stoic AI builder helping with vision automation and code execution.",
            "custom_llm": {
              "url": "https://YOUR_NGROK_URL.ngrok-free.app/v1/chat/completions",
              "api_key": {
                "secret_id": "RETURNED_SECRET_ID"
              }
            }
          }
        }
      }
    }'

    Step 5: Add Phone (Optional)

  • In Twilio: Purchase number
  • In ElevenLabs dashboard: Add Twilio credentials
  • Connect number to agent

    Build Opportunities

    1. Setup Automation Script

    Build a script that automates the setup:

  • Enable chat completions
  • Start ngrok and get URL
  • Create secret
  • Create agent
  • Report status

    2. CLI for Management

    Commands to:

  • Start/stop ngrok
  • Create/update agents
  • Manage secrets
  • Test connection

    3. Health Check Tool

    Tool to verify:

  • Gateway is accessible via ngrok
  • Chat completions endpoint works
  • Secret is valid
  • Agent responds correctly
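
One piece of such a health check is validating whatever the chat completions endpoint returns. The sketch below checks a parsed response against the shape described in the protocol section; `check_completion` is a hypothetical helper, and the checks are a reasonable minimum rather than a complete conformance test.

```python
def check_completion(response: dict) -> list[str]:
    """Return a list of problems; an empty list means the response looks usable."""
    problems = []
    if response.get("object") != "chat.completion":
        problems.append("object is not 'chat.completion'")
    choices = response.get("choices") or []
    if not choices:
        problems.append("no choices returned")
        return problems
    message = choices[0].get("message") or {}
    if message.get("role") != "assistant":
        problems.append("first choice is not an assistant message")
    if not (message.get("content") or "").strip():
        problems.append("assistant content is empty")
    return problems


if __name__ == "__main__":
    good = {
        "object": "chat.completion",
        "choices": [{"index": 0, "message": {"role": "assistant", "content": "pong"}}],
    }
    print(check_completion(good) or "ok")
    print(check_completion({"object": "error"}))
```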

    4. Conversation Logger

    Log all phone conversations:

  • Timestamps
  • User requests
  • Agent responses
  • Tool calls made
  • Performance metrics
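
A logger covering these fields could write one JSON object per conversation turn (JSON Lines), which keeps the log appendable and easy to grep. This is a sketch of one possible record layout; `TurnRecord` and `log_turn` are hypothetical names, and latency is the only performance metric included.

```python
import io
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TurnRecord:
    user_request: str
    agent_response: str
    tool_calls: list = field(default_factory=list)      # e.g. ["vision_mcp.screenshot"]
    latency_ms: float = 0.0                             # end-to-end time for this turn
    timestamp: float = field(default_factory=time.time)  # Unix time when recorded


def log_turn(stream, record: TurnRecord) -> None:
    """Append one turn to a text stream as a single JSON line."""
    stream.write(json.dumps(asdict(record)) + "\n")


if __name__ == "__main__":
    buffer = io.StringIO()  # swap for open("calls.jsonl", "a") to persist
    log_turn(buffer, TurnRecord("Take a screenshot", "Screenshot captured",
                                tool_calls=["vision_mcp.screenshot"], latency_ms=3100.0))
    print(buffer.getvalue(), end="")
```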

    5. Phone-to-MCP Bridge

    Direct mapping:

  • "Take screenshot" → vision_mcp.screenshot()
  • "Fill form" → vision_mcp.fill_form()
  • "Calculate" → code_mcp.execute()
  • "Status" → vision_mcp.status()
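
A direct mapping like this can be sketched as a lookup table plus a simple word match on the transcribed utterance. The tool names come from the list above; the matching rule (every word of the phrase must appear in the utterance) and the `route` helper are assumptions made for illustration, and a real bridge would more likely let the LLM choose the tool.

```python
import re
from typing import Optional

# Spoken phrase -> MCP tool name, taken from the mapping above.
COMMANDS = {
    "take screenshot": "vision_mcp.screenshot",
    "fill form": "vision_mcp.fill_form",
    "calculate": "code_mcp.execute",
    "status": "vision_mcp.status",
}


def route(utterance: str) -> Optional[str]:
    """Return the first tool whose phrase words all occur in the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for phrase, tool in COMMANDS.items():
        if set(phrase.split()) <= words:
            return tool
    return None


if __name__ == "__main__":
    for spoken in ["Take a screenshot", "What's the status of my desktop?", "Hello there"]:
        print(spoken, "->", route(spoken))
```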

    Limitations & Considerations

    Latency Chain

    Voice → STT (200-500ms)
        → Network (ngrok: 100-300ms)
        → LLM (GLM: 2-5s, Claude: 5-15s)
        → Network (ngrok: 100-300ms)
        → TTS (200-500ms)
    
    Total: 2.6-6.6s with GLM, 5.6-16.6s with Claude

    Impact:

  • Conversations have ~3-7s pauses with GLM (roughly 6-17s with Claude)
  • Natural for many use cases
  • Streaming could reduce perceived latency
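
The end-to-end range is the sum of the per-stage minimums and maximums from the latency chain, computed separately for each LLM. The short script below redoes that arithmetic using only the figures quoted above; the stage names and `total_range_s` are illustrative.

```python
# Per-stage (min, max) latency in milliseconds, from the latency chain above.
FIXED_STAGES_MS = {
    "stt": (200, 500),
    "network_to_gateway": (100, 300),
    "network_from_gateway": (100, 300),
    "tts": (200, 500),
}
LLM_MS = {"GLM": (2000, 5000), "Claude": (5000, 15000)}


def total_range_s(llm: str) -> tuple[float, float]:
    """Sum stage minimums and maximums for one LLM and convert to seconds."""
    low = sum(lo for lo, _ in FIXED_STAGES_MS.values()) + LLM_MS[llm][0]
    high = sum(hi for _, hi in FIXED_STAGES_MS.values()) + LLM_MS[llm][1]
    return low / 1000, high / 1000


if __name__ == "__main__":
    for name in LLM_MS:
        low, high = total_range_s(name)
        print(f"{name}: {low}-{high}s")
    # prints:
    # GLM: 2.6-6.6s
    # Claude: 5.6-16.6s
```

The LLM stage dominates either way, so the choice of model matters more for call latency than anything else in the chain.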

    ngrok Reliability

  • Free tier: Can disconnect
  • URL changes on restart
  • Consider paid tier for production

    Audio Quality

  • TTS: Good quality; voice variety is limited without a paid plan
  • STT: Excellent, 90+ languages
  • Phone: Dependent on Twilio quality

    Success Criteria

    Integration Works When:

  • ✅ Phone call reaches agent
  • ✅ Speech recognized accurately
  • ✅ Request sent to OpenClaw
  • ✅ Tools execute successfully
  • ✅ Response sent back
  • ✅ Speech synthesized
  • ✅ Caller hears result

    Next: Build the setup automation script

    Date: 2026-02-04

    Topic: ElevenLabs Agents API + OpenClaw Integration