State of MCP Security 2026: We Scanned 15,923 AI Tools. Here's What We Found.
> TL;DR: We scanned every publicly available MCP server and OpenClaw skill — 15,923 in total. 36% of MCP servers scored F (failing). 42 skills confirmed malicious (0.4%), with 552 initially flagged. Token leakage is the #1 vulnerability, found in 757 servers. Only 2% earned a B grade or higher.
---
The Dataset
SpiderRating analyzed 15,923 AI tools across two ecosystems:

- 5,725 MCP servers (Model Context Protocol, the standard for connecting AI agents to external tools)
- 10,198 OpenClaw/ClawHub skills (agent behavior definitions for Claude, Cursor, Windsurf)
Each tool was rated on three dimensions:

- Description Quality (0-10): Can an AI agent understand what the tool does?
- Security (0-10): Does the tool have exploitable vulnerabilities?
- Metadata (0-10): Documentation, licensing, versioning
Combined into a SpiderScore (0-10) and letter grade (A-F).
This is the largest independent security analysis of the MCP/AI tool ecosystem to date.
---
Key Findings
1. Most AI Tools Are Mediocre — Only 2% Score B or Higher
| Grade | MCP Servers | Skills | What It Means |
|---|---|---|---|
| A (≥9.0) | 0 (0%) | 0 (0%) | No tool meets "exemplary" standards |
| B (7.0-8.9) | 116 (2%) | 95 (1%) | Production-ready with good practices |
| C (5.0-6.9) | 1,995 (35%) | 9,050 (89%) | Adequate but room for improvement |
| D (3.0-4.9) | 1,546 (27%) | 1,052 (10%) | Significant quality/security gaps |
| F (<3.0) | 2,068 (36%) | 1 (0%) | Failing — serious issues |
Zero tools scored A. The ceiling for MCP servers is 8.5/10; for skills it's 7.5/10.
MCP servers have a bimodal distribution: you're either decent (C) or terrible (F). Skills cluster in the middle (89% C-grade).
2. Token Leakage Is the #1 Vulnerability
We found 32,691 security findings across the ecosystem.
Top 10 Vulnerabilities in MCP Servers:
| Rank | Vulnerability | Servers Affected | Findings |
|---|---|---|---|
| 1 | Token Leakage | 757 (13%) | 6,632 |
| 2 | Command Injection (child_process) | 269 (5%) | 1,007 |
| 3 | SQL Injection | 105 (2%) | 787 |
| 4 | Path Traversal | 244 (4%) | 761 |
| 5 | Prototype Pollution | 145 (3%) | 489 |
| 6 | Hardcoded Credentials | 163 (3%) | 389 |
| 7 | Secret Leakage (metadata) | 114 (2%) | 376 |
| 8 | Command Injection (os/subprocess) | 112 (2%) | 263 |
| 9 | Path Traversal (TypeScript) | 169 (3%) | 492 |
| 10 | Timing Attack | 4 (0.07%) | 9 |
Token leakage alone accounts for 20% of all findings. API keys, auth tokens, and secrets are being exposed through MCP tool outputs, logged to files, or included in error messages.
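One common fix is to redact likely secrets before anything reaches logs, error messages, or tool output. A minimal sketch (the patterns and function names are illustrative, not exhaustive, and not SpiderRating's scanner rules):

```typescript
// Minimal sketch: scrub likely secrets from text before it is logged or
// returned to the model. Patterns here are illustrative, not exhaustive.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/g,        // OpenAI-style API keys
  /ghp_[A-Za-z0-9]{36}/g,        // GitHub personal access tokens
  /Bearer\s+[A-Za-z0-9._-]+/g,   // Authorization header values
];

function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, "[REDACTED]"),
    text,
  );
}

// Wrap logging so raw errors never carry credentials verbatim.
function logSafely(err: unknown): void {
  const message = err instanceof Error ? err.message : String(err);
  console.error(redactSecrets(message));
}
```

The same filter should sit in front of every exit point: stdout, log files, and the error strings an MCP server returns to the agent.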
3. 36% of MCP Servers Score F
More than a third of MCP servers are fundamentally unsafe:

- Average MCP server score: 4.11/10 (between D and C)
- Average skill score: 5.91/10 (solid C)
Why MCP servers score worse:

- Description quality crisis: an average of 3.13/10; most servers don't tell AI agents what their tools do, when to use them, or what parameters mean
- Many are proof-of-concept or abandoned projects with no documentation
4. 552 Skills Flagged, 42 Confirmed Malicious
We used a two-pass security analysis:

1. Automated Threat Scanner: pattern matching for known malicious behaviors
2. LLM Verification: Claude Haiku reviews each finding to distinguish a "security tool describing attacks" from a "malicious skill executing attacks"
Results:

- 552 skills flagged with critical security issues
- 42 confirmed malicious after LLM verification (0.4% of the ecosystem)
- Common attack patterns: prompt-injection overrides, invisible Unicode characters, credential exfiltration
- 97% of automated "critical" findings were false positives, mostly legitimate security tools whose descriptions triggered keyword-based detection
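The first pass can be sketched as keyword matching over a skill's text. The patterns below are illustrative stand-ins, not SpiderRating's actual rules; they also show why a second pass is needed, since keyword matching alone cannot tell a tool that *describes* prompt injection from a skill that *performs* it:

```typescript
// First-pass scanner sketch: flag skills whose text matches known attack
// phrasing or payload artifacts. Patterns are illustrative only.
const THREAT_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "prompt-injection", pattern: /ignore (all )?previous instructions/i },
  { name: "invisible-unicode", pattern: /[\u200B-\u200F\u2060\uFEFF]/ },
  { name: "credential-exfiltration", pattern: /curl .*(api[_-]?key|token)/i },
];

function firstPassFlags(skillText: string): string[] {
  return THREAT_PATTERNS
    .filter(({ pattern }) => pattern.test(skillText))
    .map(({ name }) => name);
}
```

A skill titled "Prompt Injection Defense Guide" trips the same patterns as a live attack, which is exactly the false-positive class the LLM pass filters out.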
5. The Description Quality Crisis
AI agents can only use tools they understand. Our Description Quality score measures whether a tool's description tells the AI:

- What the tool does (action verb)
- When to use it (scenario trigger)
- What parameters mean (param docs)
- What errors to expect (error guidance)
| Signal | Coverage |
|---|---|
| Has action verb | ~60% |
| Has scenario trigger | ~3% |
| Has param documentation | ~45% |
| Has error guidance | ~8% |
98% of tools lack a scenario trigger — they don't tell the AI *when* to use them. This means AI agents frequently choose the wrong tool, leading to failures users blame on "AI being dumb" when the real problem is tool documentation.
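For contrast, here is a hypothetical MCP tool definition written both ways. The tool name, description text, and schema are invented for illustration; the second version carries all four signals measured above:

```typescript
// Hypothetical tool definitions; names and schema are illustrative.
const poor = {
  name: "query_db",
  description: "Queries the database.", // no trigger, no param docs, no error guidance
};

const good = {
  name: "query_db",
  description:
    "Runs a read-only SQL query against the analytics database. " +       // action verb
    "Use when the user asks for metrics, counts, or historical trends. " + // scenario trigger
    "Returns rows as JSON; raises a descriptive error on invalid SQL.",    // error guidance
  inputSchema: {
    type: "object",
    properties: {
      sql: { type: "string", description: "A single SELECT statement." },  // param docs
    },
    required: ["sql"],
  },
};
```

Given `good`, an agent knows to reach for this tool on "how many signups last week?" and to avoid it for writes; given `poor`, it can only guess.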
---
Scoring Methodology
SpiderRating uses a three-layer scoring model:
Overall = Description × w_D + Security × w_S + Metadata × w_M

The weights differ by ecosystem:

- MCP servers: Description 38% + Security 34% + Metadata 28%
- Skills: Description 45% + Security 35% + Metadata 20%
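The weighted combination is straightforward to sketch (weights taken from the text above; the function name is illustrative):

```typescript
// Per-ecosystem weights from the scoring model described above.
const WEIGHTS = {
  mcp:   { description: 0.38, security: 0.34, metadata: 0.28 },
  skill: { description: 0.45, security: 0.35, metadata: 0.20 },
};

// Combine the three 0-10 dimension scores into an overall 0-10 SpiderScore.
function spiderScore(
  kind: "mcp" | "skill",
  description: number,
  security: number,
  metadata: number,
): number {
  const w = WEIGHTS[kind];
  return description * w.description + security * w.security + metadata * w.metadata;
}
```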
- Description: Tool description quality, parameter docs, error guidance, disambiguation
- Security: Static analysis (Semgrep taint + regex), supply chain checks, runtime exposure
- Metadata: README, license, version history, community signals
Hard constraints apply: certain critical issues (e.g., no tools detected, active malware indicators) force a grade cap regardless of score.
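Using the grade thresholds from the findings table, the cap mechanics can be sketched like this (the exact clamping logic is an assumption; only the thresholds come from the article):

```typescript
// Grade thresholds from the findings table: A >=9.0, B 7.0-8.9, C 5.0-6.9,
// D 3.0-4.9, F <3.0. The hard-cap behavior sketched here is assumed.
type Grade = "A" | "B" | "C" | "D" | "F";

function baseGrade(score: number): Grade {
  if (score >= 9.0) return "A";
  if (score >= 7.0) return "B";
  if (score >= 5.0) return "C";
  if (score >= 3.0) return "D";
  return "F";
}

const GRADE_ORDER: Grade[] = ["A", "B", "C", "D", "F"];

// A hard constraint (e.g. active malware indicators) caps the grade: the
// final grade is whichever of the two is worse.
function applyCap(grade: Grade, cap: Grade | null): Grade {
  if (cap === null) return grade;
  return GRADE_ORDER.indexOf(grade) > GRADE_ORDER.indexOf(cap) ? grade : cap;
}
```

So a server that scores 7.8 (B) but ships active malware indicators still lands at F, no matter how good its descriptions are.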
All scans are fully offline — no code is sent to external services. The scanner (spidershield) is open source under MIT.
---
What This Means for Developers
If you build MCP servers:

1. Write scenario triggers in your tool descriptions: tell AI agents *when* to use each tool
2. Don't log tokens: use structured error handling that strips secrets
3. Use parameterized queries: SQL injection is the #3 vulnerability
4. Add a README and license: metadata is 28% of an MCP server's score
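The parameterized-query point is worth making concrete. In a node-postgres-style API (the table name and function are illustrative), user input travels in the values array, never in the SQL string itself:

```typescript
// Sketch: build a parameterized query (node-postgres-style placeholders).
// The SQL text is a constant; user input is passed as data, never concatenated.
function buildQuery(userId: string) {
  return {
    text: "SELECT * FROM sessions WHERE user_id = $1",
    values: [userId],
  };
}

// Even a malicious value stays inert data instead of becoming SQL:
const q = buildQuery("1; DROP TABLE sessions;--");
```

Here `q.text` never contains the payload; the driver binds `q.values[0]` as a value, so the `DROP TABLE` fragment is just a weird user ID.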
If you install AI tools:

1. Check the SpiderScore before installing: anything below C (5.0) has known issues
2. Be cautious with skills rated critical: 0.4% are confirmed malicious
3. Prefer tools with a B grade: they've demonstrated security best practices
If you're a platform (ClawHub, Smithery, Glama):

1. Integrate trust scores at the point of installation: users need the signal before they install
2. Flag malicious skills: we've identified 42 confirmed and 552 suspected
3. Require scenario triggers: it's the single biggest quality improvement you can drive
---
About This Research
This analysis was conducted by SpiderRating, an MCP ecosystem security rating platform. We maintain a continuously updated database of MCP server and skill security assessments.
- Scanner: spidershield (open source, MIT)
- Data: 15,923 tools, 78,849 tool descriptions, 32,691 security findings
- Precision: 93.6% calibrated accuracy (validated against 12,700+ ground-truth observations)
- Methodology: Three-layer scoring (description × security × metadata) with LLM-verified threat assessment
Data updated daily. Full methodology and raw data available upon request.
---
*Published: March 2026 | SpiderRating Research*
Related reads:

- 98% of tools missing usage guidance: the description quality deep dive
- How We Score MCP Servers: the full scoring model explained
- OpenClaw evaluation, Grade B: a real-world case study
- See the most secure servers in the ecosystem
- Scan your own server for free