Attack Techniques January 12, 2026 9 min read

MCP Tool Poisoning Turns Descriptions Into Exfiltration Paths

Tool metadata is now prompt content. If it is untrusted, it can override intent and leak data.

MCP tool poisoning risk

Executive Summary

MCP expands the AI attack surface. If tool descriptions or results are untrusted, they can inject instructions that override policies, triggering unintended actions or data exfiltration. Multiple researchers have shown this class of weakness is practical today.

MCP is becoming the backbone for AI tool integration. It standardizes how an agent discovers and calls tools, but it also imports tool descriptions and outputs into the model context. If those strings are untrusted, they can override intent and steer the model toward data exfiltration.

Tool Poisoning in Practice

Researchers demonstrated that a malicious or compromised MCP server can embed prompt injection inside tool descriptions or returned data. In one example, a poisoned tool response urged the model to look for secrets, then reissued them to an attacker-controlled endpoint. The result is a data leak without any overt exploit or permission prompt.

The Supabase MCP Case Study

This trust-boundary failure is not theoretical. A recent analysis showed how a support ticket combined with MCP could result in data exposure. The injected prompt instructed the agent to use tools to locate and extract sensitive data from internal systems. The exploitation did not require code execution, only prompt manipulation across trusted tool interfaces.

Why This Scales

  • Tools have broad permissions across internal systems.
  • Tool metadata is treated as trusted instructions.
  • Indirect prompt injection bypasses content filters.

Microsoft's Guidance: Treat Tool Output as Untrusted

Industry guidance now mirrors what the research shows. Microsoft has emphasized that tool output is untrusted input, and that applications must apply validation, sandboxing, and explicit approvals. Their guidance focuses on isolating tool calls, monitoring for abnormal behavior, and requiring user confirmation for sensitive actions.

HN Commentary: This Is a Trust Boundary Failure

The Hacker News discussion echoed a consistent theme: MCP does not create new capabilities, it exposes old assumptions. If you allow untrusted content to sit in the same context as your tool instructions, you should expect prompt injection to be exploitable.

How AARSM Helps

AARSM treats tool descriptions and outputs as untrusted and enforces policy before the model can act on them. That is how you stop tool poisoning from becoming data loss.


About This Analysis

This analysis is based on Microsoft guidance for MCP prompt injection, Simon Willison's security analysis of MCP tool poisoning, and an independent case study describing data leakage risks in Supabase MCP workflows.

Related Articles