MCP Deep Dive - Model Context Protocol
MCP Deep Dive - Model Context Protocol
What is MCP?
MCP = Model Context Protocol
An open standard for AI-tool integration that enables:
The Problem MCP Solves
Before MCP:
Each Agent Platform
├─ Custom tool API
├─ Custom protocol
└─ No interoperability
Result:
- Every tool needs custom integration code
- No sharing between platforms
- Reinventing the wheel constantlyAfter MCP:
MCP Standard
├─ Tool definitions (standardized schemas)
├─ Protocol (JSON-RPC 2.0)
└─ Transports (stdio, HTTP)
Any MCP Server:
├─ Works with any MCP client
├─ Platform-agnostic
└─ ComposableProtocol Architecture
Three Core Concepts
1. Tools (Actions)
A tool is something an LLM can call:
{
"name": "navigate",
"description": "Navigate to URL and verify page loaded",
"inputSchema": {
"type": "object",
"properties": {
"url": {"type": "string"},
"wait_time": {"type": "number"}
},
"required": ["url"]
}
}Key Elements:
name: Identifier for the tooldescription: What the tool does (LLM reads this)inputSchema: JSON Schema defining required/optional parametersWhy Schema?
2. Resources (Data)
A resource is something an LLM can read:
{
"name": "screenshots",
"uri": "screenshots:///",
"description": "List of all captured screenshots",
"mimeType": "application/json"
}Why Resources?
3. Prompts (Templates)
A prompt is a reusable prompt template:
{
"name": "analyze_screenshot",
"description": "Analyze a screenshot and report findings",
"arguments": [
{"name": "screenshot_path", "description": "Path to screenshot"}
]
}Why Prompts?
Transport Layer
Stdio (Primary)
┌─────────┐ stdin ┌─────────┐ stdout ┌─────────┐
│ Client │ ──────────→ │ MCP │ ──────────→ │ Client │
│ │ │ Server │ │ │
└─────────┘ └─────────┘ └─────────┘Flow:
Protocol: JSON-RPC 2.0
HTTP (Alternative)
┌─────────┐ HTTP POST ┌─────────┐ HTTP 200 ┌─────────┐
│ Client │ ─────────────→ │ MCP │ ────────────→ │ Client │
│ │ │ Server │ │ │
└─────────┘ └─────────┘ └─────────┘Use cases:
Server Lifecycle
Initialization
Client Server
│ │
│─────── initialize ───────────────>│
│ │
│<─────── serverInfo ───────────────│
│ - name: "vision-agent" │
│ - version: "1.0" │
│ - capabilities │
│ │
│─────── initialized ──────────────>│
│ │Tool Discovery
Client Server
│ │
│─────── tools/list ───────────────>│
│ │
│<─────── tools ───────────────────│
│ [ │
│ {"name": "navigate", ...}, │
│ {"name": "fill_form", ...}, │
│ ... │
│ ] │
│ │Tool Execution
Client Server
│ │
│─────── tools/call ──────────────>│
│ { │
│ "name": "navigate", │
│ "arguments": { │
│ "url": "https://..." │
│ } │
│ } │
│ │
│ (execute tool...) │
│ │
│<─────── result ──────────────────│
│ { │
│ "content": [ │
│ { │
│ "type": "text", │
│ "text": "✅ Navigated..." │
│ } │
│ ] │
│ } │
│ │Why MCP Matters
1. Composability
Before:
Agent Platform A
└─ Custom Vision Tools
└─ Custom Code Tools
└─ Custom DB Tools
└─ All tightly coupled
After (MCP):
Agent Platform
├─ Vision MCP Server (vision-agent)
├─ Code MCP Server (python-exec)
├─ DB MCP Server (postgresql)
└─ Mix and match freely2. Standardization
Before MCP:
# Platform A
def call_tool(tool_name, params):
return custom_api.call(tool_name, params)
# Platform B
def execute_tool(tool_id, config):
return custom_framework.run(tool_id, config)
# Platform C
def use_tool(tool, options):
return custom_lib.invoke(tool, options)After MCP:
# All platforms
async def call_tool(tool_name, params):
return await mcp_client.call("tools/call", {
"name": tool_name,
"arguments": params
})3. Ecosystem Effect
MCP Standard
│
├─ Vision Agent MCP Server (built)
├─ Code Execution MCP Server (to build)
├─ Database MCP Server (to build)
├─ Knowledge MCP Server (to build)
└─ ...Anyone can:
4. Decoupled Development
Server Development:
Client Development:
Tool Design Principles
1. Descriptive Names
❌ {"name": "nav"}
✅ {"name": "navigate"}LLMs understand "navigate" better than "nav"
2. Clear Descriptions
❌ {"description": "Go to URL"}
✅ {"description": "Navigate to URL and verify page loaded"}More context = better LLM decisions
3. Explicit Schemas
❌ {"properties": {"url": {}}} // No constraints
✅ {"properties": {
"url": {
"type": "string",
"description": "The URL to navigate to"
},
"wait_time": {
"type": "number",
"description": "Seconds to wait for page load",
"default": 3
}
}}4. Minimal Required Fields
{
"properties": {
"url": {"type": "string"},
"timeout": {"type": "number", "default": 30},
"retry": {"type": "number", "default": 3}
},
"required": ["url"] // Only required fields
}Fewer required fields = more flexible usage
Error Handling
JSON-RPC Error Codes
-32700 Parse error
-32600 Invalid request
-32601 Method not found
-32602 Invalid params
-32603 Internal errorServer-Side Errors
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"content": [
{
"type": "text",
"text": "❌ Failed: Element not found"
}
],
"isError": true
}
}Best Practice: Return structured errors in result
Content Types
Text Content
{
"type": "text",
"text": "✅ Successfully navigated to https://example.com"
}Image Content
{
"type": "image",
"data": "base64_encoded_image_data",
"mimeType": "image/png"
}Resource Content
{
"type": "resource",
"uri": "screenshot:///2026-02-04/test.png"
}Multiple content types can be returned in a single response:
{
"content": [
{"type": "text", "text": "Screenshot captured"},
{"type": "image", "data": "...", "mimeType": "image/png"}
]
}Batch Execution
Multiple Tool Calls
[
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {"name": "navigate", "arguments": {"url": "https://example.com"}}
},
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/call",
"params": {"name": "screenshot", "arguments": {}}
}
]Server processes and returns batch responses.
Notifications
Server→Client (No Response)
{
"jsonrpc": "2.0",
"method": "notifications/progress",
"params": {
"progress": 0.5,
"message": "Capturing screenshot..."
}
}Use cases:
Security Considerations
1. Input Validation
2. Resource Limits
3. Permissions
4. Sandboxing
Implementation Patterns
Simple Server Pattern
async def main():
while True:
request = read_json_rpc()
response = handle_request(request)
write_json_rpc(response)Advanced Server Pattern
async def main():
async with mcp.stdio_server() as streams:
@server.tool()
async def tool_name(params):
# Implement tool
return result
await server.run()Client Pattern
async with mcp.stdio_client(server_path) as session:
await session.initialize()
result = await session.call_tool("tool_name", params)MCP in Practice
Vision Agent MCP Server
What we built:
9 Tools:
- navigate → Navigate + verify
- fill_form → Fill forms
- verify → Check text presence
- find_element → Find UI elements
- screenshot → Capture screen
- click_at → Click coordinates
- type_text → Type input
- scroll → Scroll page
- status → Check stateHow It Works Together
Scenario: Automated form testing
1. navigate → https://example.com/login
2. verify → "Login Form"
3. fill_form → {username: "test", password: "test"}
4. click_at → (512, 600) // Submit button
5. verify → "Welcome"
6. screenshot → Capture for reportAll steps executed via MCP, no custom code needed.
Next Steps
For Vision Agent MCP
For Ecosystem
Key Insight: MCP enables tool composability at the protocol level. Any agent can use any tool, anywhere, without custom integration code.
Analogy: MCP to AI tools is what USB is to devices. Universal, standard, plug-and-play.
Date: 2026-02-04
Topic: Model Context Protocol (MCP)
Related: vision-agent-mcp, VISION-AGENT-MCP.md