MCP & Protocols2026-02-04•1,496 words•6 min read

MCP Deep Dive - Model Context Protocol

#mcp#rag#security#vision#llm

MCP Deep Dive - Model Context Protocol

What is MCP?

MCP = Model Context Protocol

An open standard for AI-tool integration that enables:

Tools: Actions that LLMs can invoke

Resources: Data sources agents can read

Prompts: Reusable prompt templates

The Problem MCP Solves

Before MCP:

Each Agent Platform
  ├─ Custom tool API
  ├─ Custom protocol
  └─ No interoperability

Result:
  - Every tool needs custom integration code
  - No sharing between platforms
  - Reinventing the wheel constantly

After MCP:

MCP Standard
  ├─ Tool definitions (standardized schemas)
  ├─ Protocol (JSON-RPC 2.0)
  └─ Transports (stdio, HTTP)

Any MCP Server:
  ├─ Works with any MCP client
  ├─ Platform-agnostic
  └─ Composable

Protocol Architecture

Three Core Concepts

1. Tools (Actions)

A tool is something an LLM can call:

{
  "name": "navigate",
  "description": "Navigate to URL and verify page loaded",
  "inputSchema": {
    "type": "object",
    "properties": {
      "url": {"type": "string"},
      "wait_time": {"type": "number"}
    },
    "required": ["url"]
  }
}

Key Elements:

name: Identifier for the tool

description: What the tool does (LLM reads this)

inputSchema: JSON Schema defining required/optional parameters

Why Schema?

LLM can validate its own calls

Client can check parameters before sending

Standard validation across all platforms

2. Resources (Data)

A resource is something an LLM can read:

{
  "name": "screenshots",
  "uri": "screenshots:///",
  "description": "List of all captured screenshots",
  "mimeType": "application/json"
}

Why Resources?

Tools act (write/change)

Resources read (query/list)

Separation of concerns

3. Prompts (Templates)

A prompt is a reusable prompt template:

{
  "name": "analyze_screenshot",
  "description": "Analyze a screenshot and report findings",
  "arguments": [
    {"name": "screenshot_path", "description": "Path to screenshot"}
  ]
}

Why Prompts?

Standardize complex multi-step interactions

Share effective prompt patterns

Reduce prompt engineering overhead

Transport Layer

Stdio (Primary)

┌─────────┐    stdin    ┌─────────┐    stdout   ┌─────────┐
│ Client  │ ──────────→ │  MCP    │ ──────────→ │ Client  │
│         │             │ Server  │             │         │
└─────────┘             └─────────┘             └─────────┘

Flow:

Client writes JSON-RPC request to server stdin

Server processes request

Server writes JSON-RPC response to stdout

Client reads response

Protocol: JSON-RPC 2.0

Standard request/response format

Supports batching

Supports notifications

Error codes standardized

HTTP (Alternative)

┌─────────┐   HTTP POST  ┌─────────┐   HTTP 200  ┌─────────┐
│ Client  │ ─────────────→ │  MCP    │ ────────────→ │ Client  │
│         │               │ Server  │               │         │
└─────────┘               └─────────┘               └─────────┘

Use cases:

Remote servers

Web-based clients

Load balancing

Server Lifecycle

Initialization

Client                            Server
   │                                  │
   │─────── initialize ───────────────>│
   │                                  │
   │<─────── serverInfo ───────────────│
   │  - name: "vision-agent"          │
   │  - version: "1.0"                │
   │  - capabilities                   │
   │                                  │
   │─────── initialized ──────────────>│
   │                                  │

Tool Discovery

Client                            Server
   │                                  │
   │─────── tools/list ───────────────>│
   │                                  │
   │<─────── tools ───────────────────│
   │  [                                │
   │    {"name": "navigate", ...},     │
   │    {"name": "fill_form", ...},    │
   │    ...                            │
   │  ]                                │
   │                                  │

Tool Execution

Client                            Server
   │                                  │
   │─────── tools/call ──────────────>│
   │  {                               │
   │    "name": "navigate",           │
   │    "arguments": {                │
   │      "url": "https://..."        │
   │    }                             │
   │  }                               │
   │                                  │
   │        (execute tool...)          │
   │                                  │
   │<─────── result ──────────────────│
   │  {                               │
   │    "content": [                  │
   │      {                            │
   │        "type": "text",            │
   │        "text": "✅ Navigated..."  │
   │      }                            │
   │    ]                              │
   │  }                               │
   │                                  │

Why MCP Matters

1. Composability

Before:
  Agent Platform A
  └─ Custom Vision Tools
  └─ Custom Code Tools
  └─ Custom DB Tools
  └─ All tightly coupled

After (MCP):
  Agent Platform
  ├─ Vision MCP Server (vision-agent)
  ├─ Code MCP Server (python-exec)
  ├─ DB MCP Server (postgresql)
  └─ Mix and match freely

2. Standardization

Before MCP:

# Platform A
def call_tool(tool_name, params):
    return custom_api.call(tool_name, params)

# Platform B
def execute_tool(tool_id, config):
    return custom_framework.run(tool_id, config)

# Platform C
def use_tool(tool, options):
    return custom_lib.invoke(tool, options)

After MCP:

# All platforms
async def call_tool(tool_name, params):
    return await mcp_client.call("tools/call", {
        "name": tool_name,
        "arguments": params
    })

3. Ecosystem Effect

MCP Standard
  │
  ├─ Vision Agent MCP Server (built)
  ├─ Code Execution MCP Server (to build)
  ├─ Database MCP Server (to build)
  ├─ Knowledge MCP Server (to build)
  └─ ...

Anyone can:

Build an MCP server

Share it publicly

Any MCP client can use it

4. Decoupled Development

Server Development:

Build tools independently

Test with any MCP client

No platform lock-in

Client Development:

Build platform once

Connect to any MCP server

Add/remove tools dynamically

Tool Design Principles

1. Descriptive Names

❌ {"name": "nav"}
✅ {"name": "navigate"}

LLMs understand "navigate" better than "nav"

2. Clear Descriptions

❌ {"description": "Go to URL"}
✅ {"description": "Navigate to URL and verify page loaded"}

More context = better LLM decisions

3. Explicit Schemas

❌ {"properties": {"url": {}}}  // No constraints

✅ {"properties": {
     "url": {
       "type": "string",
       "description": "The URL to navigate to"
     },
     "wait_time": {
       "type": "number",
       "description": "Seconds to wait for page load",
       "default": 3
     }
   }}

4. Minimal Required Fields

{
  "properties": {
    "url": {"type": "string"},
    "timeout": {"type": "number", "default": 30},
    "retry": {"type": "number", "default": 3}
  },
  "required": ["url"]  // Only required fields
}

Fewer required fields = more flexible usage

Error Handling

JSON-RPC Error Codes

-32700  Parse error
-32600  Invalid request
-32601  Method not found
-32602  Invalid params
-32603  Internal error

Server-Side Errors

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "❌ Failed: Element not found"
      }
    ],
    "isError": true
  }
}

Best Practice: Return structured errors in result

LLM can parse and retry

Client can handle gracefully

Debugging is easier

Content Types

Text Content

{
  "type": "text",
  "text": "✅ Successfully navigated to https://example.com"
}

Image Content

{
  "type": "image",
  "data": "base64_encoded_image_data",
  "mimeType": "image/png"
}

Resource Content

{
  "type": "resource",
  "uri": "screenshot:///2026-02-04/test.png"
}

Multiple content types can be returned in a single response:

{
  "content": [
    {"type": "text", "text": "Screenshot captured"},
    {"type": "image", "data": "...", "mimeType": "image/png"}
  ]
}

Batch Execution

Multiple Tool Calls

[
  {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "navigate", "arguments": {"url": "https://example.com"}}
  },
  {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "screenshot", "arguments": {}}
  }
]

Server processes and returns batch responses.

Notifications

Server→Client (No Response)

{
  "jsonrpc": "2.0",
  "method": "notifications/progress",
  "params": {
    "progress": 0.5,
    "message": "Capturing screenshot..."
  }
}

Use cases:

Progress updates

Log messages

Async events

Security Considerations

1. Input Validation

Validate all tool arguments against schema

Sanitize file paths

Check URL protocols (http/https only)

2. Resource Limits

Timeout on tool execution

Memory limits

File size limits

3. Permissions

Tool-level permissions (who can call what)

Resource-level permissions (who can read what)

Audit logging

4. Sandboxing

Run tools in containers

Network isolation

Filesystem isolation

Implementation Patterns

Simple Server Pattern

async def main():
    while True:
        request = read_json_rpc()
        response = handle_request(request)
        write_json_rpc(response)

Advanced Server Pattern

async def main():
    async with mcp.stdio_server() as streams:
        @server.tool()
        async def tool_name(params):
            # Implement tool
            return result

        await server.run()

Client Pattern

async with mcp.stdio_client(server_path) as session:
    await session.initialize()
    result = await session.call_tool("tool_name", params)

MCP in Practice

Vision Agent MCP Server

What we built:

9 Tools:
  - navigate      → Navigate + verify
  - fill_form     → Fill forms
  - verify        → Check text presence
  - find_element  → Find UI elements
  - screenshot    → Capture screen
  - click_at      → Click coordinates
  - type_text     → Type input
  - scroll        → Scroll page
  - status        → Check state

How It Works Together

Scenario: Automated form testing

1. navigate → https://example.com/login
2. verify   → "Login Form"
3. fill_form → {username: "test", password: "test"}
4. click_at  → (512, 600)  // Submit button
5. verify    → "Welcome"
6. screenshot → Capture for report

All steps executed via MCP, no custom code needed.