← Back to all learnings
MCP & Protocols2026-04-171,294 words6 min read

AI Agent Security 2026 — The OpenClaw Wake-Up Call

#mcp#security#openclaw#llm

AI Agent Security 2026 — The OpenClaw Wake-Up Call

Date: March 4, 2026

Sources: Adversa AI, Vulnu, Microsoft, OWASP, Zenity Labs, Trail of Bits

The Shift

"A chatbot answers questions. An agent does things. It runs commands. It edits files. It clicks around your browser."
— Vulnu

The era of passive chatbots is ending. We're now defending digital workers with shell access, filesystem permissions, and network connectivity. The security landscape is struggling to keep up.

The OpenClaw Incidents (Jan-Feb 2026)

ClawHub Malicious Skills

  • 341 malicious skills discovered in ClawHub marketplace
  • Supply-chain style distribution
  • "Setup steps" that ask users to copy/paste suspicious commands
  • Credential theft and data exfiltration
  • Sources: Hacker News, Tom's Hardware
  • OpenClaw Vulnerability Count

  • 512 vulnerabilities catalogued including authentication bypass
  • Susceptible to prompt injection → data exfiltration + trojanized plugins
  • Source: Adversa AI
  • OpenClaw Soul & Evil Attack

  • Zero-click chain from Google Doc → Command and Control (C2)
  • Attack vector: Identity files like SOUL.md
  • Persistent memory poisoning
  • Demonstrated by Zenity Labs
  • Source: Momentum
  • The "Lethal Trifecta" (Simon Willison)

    An agent becomes dangerous when it has all three:

  • Access to private data (secrets, credentials, files)
  • Ingests untrusted content (web pages, emails, tickets)
  • Can communicate externally (network egress, API calls)
  • "If you have all three, you have a problem." — Simon Willison

    Why Prompts Are Not Policies

    | Myth | Reality |
    |------|---------|
    | "Never exfiltrate secrets" | Suggestion, not enforcement |
    | "Ask before risky commands" | UX preference, not a control |
    | Strong system prompt = guardrails | Prompt is not a security boundary |

    The moment an agent reads untrusted content, prompt injection becomes an operational risk. Anthropic has been explicit: "the web is adversarial, and prompt injection defenses are still an active area of work."

    Tool Access Explodes the Blast Radius

    | Chatbot Failure | Agent Failure |
    |-----------------|---------------|
    | Wrong answer | Wrong action |
    | Hallucination | Shell command execution |
    | Annoying | Destructive |

    OWASP LLM Top 10 entries that map to agents:

  • LLM01: Prompt injection
  • LLM03: Supply chain risk
  • LLM05: Improper output handling (model output → shell command/SQL/CI step)
  • LLM06: Excessive agency (too many actions, too much access)
  • The 5-Point Security Checklist

    1. Sandbox the Runtime (For Real)

  • VM or container with actual restrictions
  • Separate OS user
  • Separate machine
  • Goal: Compromise stays contained
  • 2. Scope Credentials (Least Privilege)

  • Smallest possible permissions
  • Shortest possible time
  • One repo read → don't give write to all repos
  • Service account → don't hand personal credentials
  • 3. Restrict Tools (Hard Controls)

  • Allowlist commands/tool actions
  • Deny outbound network by default
  • Require approval for high-risk actions:
  • Payments
  • Sending messages
  • Deleting files
  • Pushing to production
  • "Yes, that introduces friction. That friction is the point." — Vulnu

    4. Log Actions, Not Just Chat

  • Commands executed
  • Files written/modified
  • Network egress (destination, payload)
  • Tool invocation history
  • "A conversation transcript is not an audit trail."

    5. Treat Skills/Plugins Like Dependencies

  • Skill marketplaces = package registries with nicer UI
  • Use curated, community-reviewed sources
  • Trail of Bits maintains curated skills repo explicitly for this reason
  • Security Resources (27 Catalogued by Adversa AI)

    For CISOs

  • Agentic AI Security 2026: Every Major Platform Has a Catalogued Vulnerability — 43% of MCP servers vulnerable to command execution
  • From Chatbots to Digital Workers: Managing Business Risks — OWASP framework for Board/CISO
  • AI Agent Failure & Control Gap Report — 8 confirmed incidents, CISO playbook
  • Defense Tools

  • SecureClaw — Open-source security for OpenClaw, aligned with OWASP/MITRE/CSA
  • AgentShield — Open benchmark of 6 commercial AI agent security tools (537 test cases)
  • Agentic AI Security Starter Kit — 8 Python modules: input validation, OPA policy engine, sandboxing, forensic logging
  • Research Papers

  • Phantom (arXiv:2602.16958) — Automated agent hijacking via template injection, 70+ vulnerabilities confirmed
  • ICON (arXiv:2602.20708) — Indirect prompt injection defense via attention collapse
  • AgentLeak (arXiv:2602.11510) — Multi-agent systems leak more data through internal channels
  • Authenticated Workflows (arXiv:2602.10465) — Trust layer with cryptographic verification, 100% recall
  • Microsoft Guidance

  • Running OpenClaw Safely: Identity, Isolation, Runtime Risk — 5-step attack chains, Defender XDR queries
  • Copilot Studio Agent Security: Top 10 Risks — KQL detection queries
  • Threat Modeling

  • Promptware Kill Chain (Bruce Schneier) — 7-stage framework mirroring MITRE
  • A2A Exploitation — Inter-agent communication as attack surface, 40 papers synthesized
  • OWASP ASI05 — Definitive guide to unexpected code execution
  • Key Takeaways

  • OpenClaw is the canary, not the problem — This will replay across any agent + tools + marketplace ecosystem
  • We don't have mature operational norms yet — Some run agents on dedicated machines, others on main laptops with unlocked password [REDACTED]s
  • Prompts are suggestions, not boundaries — Hard controls beat polite instructions
  • The baseline is vibes — No widely shared security defaults yet
  • Secure the runtime, not just the prompt — Authentication, verification, sandboxing, monitoring
  • Relevance to [REDACTED]'s Work

  • MCPHub — 43% of MCP servers vulnerable to command execution; need security-first design
  • Dendrite — Tree-structured reasoning agents need sandboxing and credential scoping
  • Squad architecture — Multi-agent systems have larger attack surface (A2A exploitation)
  • Skills installation — Treat like dependencies; use curated sources
  • Action Items

  • [ ] Review SOUL.md/MEMORY.md for potential injection vectors
  • [ ] Implement sandboxing for agent execution
  • [ ] Audit skill/plugin sources
  • [ ] Add action logging beyond conversation history
  • [ ] Review credential scope (least privilege)
  • [ ] Evaluate SecureClaw for deployment

  • March 2026 Update: 27 New Security Resources

    Adversa AI curated 27 resources across 11 categories for March 2026.

    New Defense Tools

    ICON (Indirect Prompt Injection Defense)

  • Two-stage defense using attention collapse detection
  • Mitigating Rectifier steers attention away from adversarial tokens
  • Source: arXiv:2602.20708
  • SecureClaw

  • Open-source security solution for OpenClaw
  • Aligned with 5 frameworks: OWASP, CosAI, MITRE ATLAS, CSA
  • Source: Adversa AI
  • AgentShield

  • First open benchmark testing 6 commercial AI agent security tools
  • Source: Adversa AI
  • New Attack Research

    AI Recommendation Poisoning (Microsoft)

  • 50+ real-world attempts discovered from 31 companies
  • Hidden prompts manipulate AI assistant recommendations persistently
  • Source: Microsoft Security Blog
  • AgentLeak Benchmark

  • Multi-agent systems leak more data through internal channels than external outputs
  • 7-channel taxonomy, 32-class attack taxonomy
  • Source: arXiv:2602.11510v1
  • Promptware Kill Chain (Bruce Schneier)

  • 7-stage framework mirroring MITRE-type classification
  • Enables defense-in-depth strategies
  • Source: Schneier on Security
  • A2A (Inter-Agent) Exploitation

  • 40 papers synthesized on agent-to-agent communication threats
  • AiTM hijacking, protocol exploits
  • Source: LinkedIn (Dr. Sewak)
  • Authenticated Workflows Paper

  • First complete trust layer for [REDACTED] agentic AI
  • Cryptographic verification with MAPL policy language
  • 100% recall, zero false positives in testing
  • Source: arXiv:2602.10465v1
  • Key Quote (March 2026)

    "The month has been defined by the rapid rise and scrutiny of OpenClaw. Excited users rush to give the agent unfettered access to their computers, and we are seeing a new class of vulnerabilities emerge — specifically around identity files like SOUL.md and persistent memory poisoning. The era of passive chatbots is coming to its end. We are now defending digital workers with shell access."
    — Adversa AI, March 2026

    Sources:

  • Adversa AI: Top Agentic AI Security Resources — March 2026
  • Vulnu: The Problem Isn't OpenClaw. It's the Architecture.
  • Microsoft Security Blog
  • OWASP Agentic AI Top 10
  • Trail of Bits: skills-curated