← Back to all learnings
MCP & Protocols2026-02-041,496 words6 min read

MCP Deep Dive - Model Context Protocol

#mcp#rag#security#vision#llm

MCP Deep Dive - Model Context Protocol

What is MCP?

MCP = Model Context Protocol

An open standard for AI-tool integration that enables:

  • Tools: Actions that LLMs can invoke
  • Resources: Data sources agents can read
  • Prompts: Reusable prompt templates
  • The Problem MCP Solves

    Before MCP:

    Each Agent Platform
      ├─ Custom tool API
      ├─ Custom protocol
      └─ No interoperability
    
    Result:
      - Every tool needs custom integration code
      - No sharing between platforms
      - Reinventing the wheel constantly

    After MCP:

    MCP Standard
      ├─ Tool definitions (standardized schemas)
      ├─ Protocol (JSON-RPC 2.0)
      └─ Transports (stdio, HTTP)
    
    Any MCP Server:
      ├─ Works with any MCP client
      ├─ Platform-agnostic
      └─ Composable

    Protocol Architecture

    Three Core Concepts

    1. Tools (Actions)

    A tool is something an LLM can call:

    {
      "name": "navigate",
      "description": "Navigate to URL and verify page loaded",
      "inputSchema": {
        "type": "object",
        "properties": {
          "url": {"type": "string"},
          "wait_time": {"type": "number"}
        },
        "required": ["url"]
      }
    }

    Key Elements:

  • name: Identifier for the tool
  • description: What the tool does (LLM reads this)
  • inputSchema: JSON Schema defining required/optional parameters
  • Why Schema?

  • LLM can validate its own calls
  • Client can check parameters before sending
  • Standard validation across all platforms
  • 2. Resources (Data)

    A resource is something an LLM can read:

    {
      "name": "screenshots",
      "uri": "screenshots:///",
      "description": "List of all captured screenshots",
      "mimeType": "application/json"
    }

    Why Resources?

  • Tools act (write/change)
  • Resources read (query/list)
  • Separation of concerns
  • 3. Prompts (Templates)

    A prompt is a reusable prompt template:

    {
      "name": "analyze_screenshot",
      "description": "Analyze a screenshot and report findings",
      "arguments": [
        {"name": "screenshot_path", "description": "Path to screenshot"}
      ]
    }

    Why Prompts?

  • Standardize complex multi-step interactions
  • Share effective prompt patterns
  • Reduce prompt engineering overhead
  • Transport Layer

    Stdio (Primary)

    ┌─────────┐    stdin    ┌─────────┐    stdout   ┌─────────┐
    │ Client  │ ──────────→ │  MCP    │ ──────────→ │ Client  │
    │         │             │ Server  │             │         │
    └─────────┘             └─────────┘             └─────────┘

    Flow:

  • Client writes JSON-RPC request to server stdin
  • Server processes request
  • Server writes JSON-RPC response to stdout
  • Client reads response
  • Protocol: JSON-RPC 2.0

  • Standard request/response format
  • Supports batching
  • Supports notifications
  • Error codes standardized
  • HTTP (Alternative)

    ┌─────────┐   HTTP POST  ┌─────────┐   HTTP 200  ┌─────────┐
    │ Client  │ ─────────────→ │  MCP    │ ────────────→ │ Client  │
    │         │               │ Server  │               │         │
    └─────────┘               └─────────┘               └─────────┘

    Use cases:

  • Remote servers
  • Web-based clients
  • Load balancing
  • Server Lifecycle

    Initialization

    Client                            Server
       │                                  │
       │─────── initialize ───────────────>│
       │                                  │
       │<─────── serverInfo ───────────────│
       │  - name: "vision-agent"          │
       │  - version: "1.0"                │
       │  - capabilities                   │
       │                                  │
       │─────── initialized ──────────────>│
       │                                  │

    Tool Discovery

    Client                            Server
       │                                  │
       │─────── tools/list ───────────────>│
       │                                  │
       │<─────── tools ───────────────────│
       │  [                                │
       │    {"name": "navigate", ...},     │
       │    {"name": "fill_form", ...},    │
       │    ...                            │
       │  ]                                │
       │                                  │

    Tool Execution

    Client                            Server
       │                                  │
       │─────── tools/call ──────────────>│
       │  {                               │
       │    "name": "navigate",           │
       │    "arguments": {                │
       │      "url": "https://..."        │
       │    }                             │
       │  }                               │
       │                                  │
       │        (execute tool...)          │
       │                                  │
       │<─────── result ──────────────────│
       │  {                               │
       │    "content": [                  │
       │      {                            │
       │        "type": "text",            │
       │        "text": "✅ Navigated..."  │
       │      }                            │
       │    ]                              │
       │  }                               │
       │                                  │

    Why MCP Matters

    1. Composability

    Before:
      Agent Platform A
      └─ Custom Vision Tools
      └─ Custom Code Tools
      └─ Custom DB Tools
      └─ All tightly coupled
    
    After (MCP):
      Agent Platform
      ├─ Vision MCP Server (vision-agent)
      ├─ Code MCP Server (python-exec)
      ├─ DB MCP Server (postgresql)
      └─ Mix and match freely

    2. Standardization

    Before MCP:

    # Platform A
    def call_tool(tool_name, params):
        return custom_api.call(tool_name, params)
    
    # Platform B
    def execute_tool(tool_id, config):
        return custom_framework.run(tool_id, config)
    
    # Platform C
    def use_tool(tool, options):
        return custom_lib.invoke(tool, options)

    After MCP:

    # All platforms
    async def call_tool(tool_name, params):
        return await mcp_client.call("tools/call", {
            "name": tool_name,
            "arguments": params
        })

    3. Ecosystem Effect

    MCP Standard
      │
      ├─ Vision Agent MCP Server (built)
      ├─ Code Execution MCP Server (to build)
      ├─ Database MCP Server (to build)
      ├─ Knowledge MCP Server (to build)
      └─ ...

    Anyone can:

  • Build an MCP server
  • Share it publicly
  • Any MCP client can use it
  • 4. Decoupled Development

    Server Development:

  • Build tools independently
  • Test with any MCP client
  • No platform lock-in
  • Client Development:

  • Build platform once
  • Connect to any MCP server
  • Add/remove tools dynamically
  • Tool Design Principles

    1. Descriptive Names

    ❌ {"name": "nav"}
    ✅ {"name": "navigate"}

    LLMs understand "navigate" better than "nav"

    2. Clear Descriptions

    ❌ {"description": "Go to URL"}
    ✅ {"description": "Navigate to URL and verify page loaded"}

    More context = better LLM decisions

    3. Explicit Schemas

    ❌ {"properties": {"url": {}}}  // No constraints
    
    ✅ {"properties": {
         "url": {
           "type": "string",
           "description": "The URL to navigate to"
         },
         "wait_time": {
           "type": "number",
           "description": "Seconds to wait for page load",
           "default": 3
         }
       }}

    4. Minimal Required Fields

    {
      "properties": {
        "url": {"type": "string"},
        "timeout": {"type": "number", "default": 30},
        "retry": {"type": "number", "default": 3}
      },
      "required": ["url"]  // Only required fields
    }

    Fewer required fields = more flexible usage

    Error Handling

    JSON-RPC Error Codes

    -32700  Parse error
    -32600  Invalid request
    -32601  Method not found
    -32602  Invalid params
    -32603  Internal error

    Server-Side Errors

    {
      "jsonrpc": "2.0",
      "id": 1,
      "result": {
        "content": [
          {
            "type": "text",
            "text": "❌ Failed: Element not found"
          }
        ],
        "isError": true
      }
    }

    Best Practice: Return structured errors in result

  • LLM can parse and retry
  • Client can handle gracefully
  • Debugging is easier
  • Content Types

    Text Content

    {
      "type": "text",
      "text": "✅ Successfully navigated to https://example.com"
    }

    Image Content

    {
      "type": "image",
      "data": "base64_encoded_image_data",
      "mimeType": "image/png"
    }

    Resource Content

    {
      "type": "resource",
      "uri": "screenshot:///2026-02-04/test.png"
    }

    Multiple content types can be returned in a single response:

    {
      "content": [
        {"type": "text", "text": "Screenshot captured"},
        {"type": "image", "data": "...", "mimeType": "image/png"}
      ]
    }

    Batch Execution

    Multiple Tool Calls

    [
      {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "navigate", "arguments": {"url": "https://example.com"}}
      },
      {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {"name": "screenshot", "arguments": {}}
      }
    ]

    Server processes and returns batch responses.

    Notifications

    Server→Client (No Response)

    {
      "jsonrpc": "2.0",
      "method": "notifications/progress",
      "params": {
        "progress": 0.5,
        "message": "Capturing screenshot..."
      }
    }

    Use cases:

  • Progress updates
  • Log messages
  • Async events
  • Security Considerations

    1. Input Validation

  • Validate all tool arguments against schema
  • Sanitize file paths
  • Check URL protocols (http/https only)
  • 2. Resource Limits

  • Timeout on tool execution
  • Memory limits
  • File size limits
  • 3. Permissions

  • Tool-level permissions (who can call what)
  • Resource-level permissions (who can read what)
  • Audit logging
  • 4. Sandboxing

  • Run tools in containers
  • Network isolation
  • Filesystem isolation
  • Implementation Patterns

    Simple Server Pattern

    async def main():
        while True:
            request = read_json_rpc()
            response = handle_request(request)
            write_json_rpc(response)

    Advanced Server Pattern

    async def main():
        async with mcp.stdio_server() as streams:
            @server.tool()
            async def tool_name(params):
                # Implement tool
                return result
    
            await server.run()

    Client Pattern

    async with mcp.stdio_client(server_path) as session:
        await session.initialize()
        result = await session.call_tool("tool_name", params)

    MCP in Practice

    Vision Agent MCP Server

    What we built:

    9 Tools:
      - navigate      → Navigate + verify
      - fill_form     → Fill forms
      - verify        → Check text presence
      - find_element  → Find UI elements
      - screenshot    → Capture screen
      - click_at      → Click coordinates
      - type_text     → Type input
      - scroll        → Scroll page
      - status        → Check state

    How It Works Together

    Scenario: Automated form testing

    1. navigate → https://example.com/login
    2. verify   → "Login Form"
    3. fill_form → {username: "test", password: "test"}
    4. click_at  → (512, 600)  // Submit button
    5. verify    → "Welcome"
    6. screenshot → Capture for report

    All steps executed via MCP, no custom code needed.

    Next Steps

    For Vision Agent MCP

  • Add Resources: List available screenshots, forms, tests
  • Add Prompts: Template for "analyze this page"
  • Better Errors: Retry strategies, fallback actions
  • Progress Updates: Long-running tools report progress
  • For Ecosystem

  • Build Code MCP Server: Execute Python code safely
  • Build DB MCP Server: Query PostgreSQL
  • Build Knowledge MCP Server: RAG over documents
  • Composite Servers: Combine multiple capabilities

  • Key Insight: MCP enables tool composability at the protocol level. Any agent can use any tool, anywhere, without custom integration code.

    Analogy: MCP to AI tools is what USB is to devices. Universal, standard, plug-and-play.


    Date: 2026-02-04

    Topic: Model Context Protocol (MCP)

    Related: vision-agent-mcp, VISION-AGENT-MCP.md