# 98% of MCP Tools Don't Tell AI Agents When to Use Them — Deep Dive
> TL;DR: We analyzed 78,849 tool descriptions across 15,923 MCP servers and AI skills. Only 2% tell the AI agent *when* to use them. Only 3% document their parameters. This is why AI agents pick the wrong tool — and it's fixable.
---
## The Hidden Problem
When an AI agent connects to an MCP server, it sees a list of tools. Each tool has a name, a description, and a parameter schema. That description is the only information the agent has to decide:
- Which tool to use
- What arguments to pass
- What to do when things go wrong
We analyzed 78,849 tool descriptions across 15,923 MCP servers and skills. The results explain a lot about why AI agents feel "dumb."
## The Numbers
| What AI Agents Need | What They Get |
|---|---|
| "What does this tool do?" (action verb) | 68% have one |
| "When should I use this tool?" (scenario trigger) | 2% have one |
| "What format should parameters be?" (param docs) | 3% have them |
| "Can you show me an example?" (param examples) | 7% have them |
| "What happens if it fails?" (error guidance) | 2% have it |
98% of tools don't tell the AI agent when to use them.
This means for 98% of tools, the AI agent has to *guess* from the tool name and a vague description. When it guesses wrong, users blame the AI.
## Why This Is a Security Problem
As Reddit user NexusVoid_AI pointed out in response to our State of MCP Security report:
> "The missing usage guidance number is the one that doesn't get enough attention. When a tool doesn't tell the agent when to use it, the agent has to infer from context. That inference step is exactly where a poisoned tool description or injected instruction can redirect behavior."
Missing scenario triggers aren't just a quality problem — they're an attack surface. If the agent doesn't know when to use Tool A vs Tool B, a malicious tool can insert itself by simply having a slightly more relevant-sounding description.
## The Description Score Gap
We score every tool's description on a 0-10 scale. The gap between MCP servers and skills is dramatic:
- MCP servers: average description score 3.13/10
- Skills: average description score 5.67/10
Skills score higher because the ClawHub/OpenClaw skill format (SKILL.md) encourages structured descriptions. MCP servers have no such convention; most developers write a one-line description and call it done.
## Better Descriptions = Better Scores
The data shows a correlation between description quality and overall score, though it is not perfectly monotonic:
| Description Score | Average Overall Score | Count |
|---|---|---|
| Low (0-3) | 4.55 | 3,751 servers |
| Mid (3-5) | 5.39 | 2,665 servers |
| Good (5-7) | 5.32 | 7,976 servers |
| Great (7-10) | 6.47 | 1,531 servers |
Tools with great descriptions score 42% higher overall than tools with poor descriptions. Documentation isn't just nice-to-have — it's a leading indicator of code quality.
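The 42% figure follows directly from the table's first and last rows:

```python
# Average overall scores from the table above.
great, low = 6.47, 4.55

# Relative improvement of "Great (7-10)" descriptions over "Low (0-3)".
improvement_pct = (great / low - 1) * 100
print(round(improvement_pct))  # 42
```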
## What a Good Tool Description Looks Like
Bad (typical MCP tool — 98% look like this):
```
name: "search"
description: "Search for items"
```
Good (what AI agents need):
```
name: "search_products"
description: "Search the product catalog by keyword, category, or price range.
Use this when the user asks to find, browse, or look up products.
Returns up to 20 results sorted by relevance.
Parameters:
  query (string, required): Search keywords
  category (string, optional): Filter by category name
  max_price (number, optional): Maximum price in USD
Errors:
  - Returns empty array if no matches (not an error)
  - Returns 429 if rate limited — wait 60 seconds"
```
The difference: the good description tells the AI when to use it ("user asks to find, browse, or look up"), what each parameter means, and what to do with errors.
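In a real MCP server, that description travels in the tool entry the server returns from `tools/list`, alongside a JSON Schema in `inputSchema`. Here is a sketch of the good example above in that shape; note that parameter documentation can also live in per-property `description` fields, where schema-aware agents will see it:

```python
# Sketch of the "good" example as an MCP tools/list entry
# (name, description, inputSchema). The example values are illustrative.
search_products = {
    "name": "search_products",
    "description": (
        "Search the product catalog by keyword, category, or price range. "
        "Use this when the user asks to find, browse, or look up products. "
        "Returns up to 20 results sorted by relevance. "
        "Errors: an empty array means no matches (not an error); "
        "429 means rate limited — wait 60 seconds."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search keywords, e.g. 'wireless headphones'",
            },
            "category": {
                "type": "string",
                "description": "Optional category name to filter by",
            },
            "max_price": {
                "type": "number",
                "description": "Maximum price in USD",
            },
        },
        "required": ["query"],
    },
}
```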
## The 5 Signals We Measure
SpiderRating's Description Quality score evaluates five signals:
- Action Verb (68% have it) — Does the description start with what the tool *does*?
- Scenario Trigger (2% have it) — Does it say *when* to use this tool?
- Parameter Documentation (3% have it) — Are parameters explained beyond the schema?
- Parameter Examples (7% have it) — Are there example values?
- Error Guidance (2% have it) — What should the agent do when the tool fails?
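To make the five signals concrete, here is a simplified, heuristic detector — this is an assumption-laden sketch, not SpiderRating's actual detection logic, and the keyword lists are illustrative:

```python
import re

# Hypothetical keyword heuristics for each of the five binary signals.
ACTION_VERBS = ("search", "create", "delete", "fetch", "list",
                "update", "get", "send", "query", "return")

def detect_signals(description: str) -> dict:
    """Return a present/absent flag for each of the five description signals."""
    text = description.lower()
    words = text.split()
    first_word = words[0] if words else ""
    return {
        # Does the description open with what the tool does?
        "action_verb": any(first_word.startswith(v) for v in ACTION_VERBS),
        # Does it say when to use the tool?
        "scenario_trigger": bool(re.search(r"\buse (this )?when\b", text)),
        # Are parameters explained beyond the bare schema?
        "param_docs": "parameters:" in text or "required" in text,
        # Are there example values?
        "param_examples": bool(re.search(r"e\.g\.|example", text)),
        # Does it tell the agent what to do on failure?
        "error_guidance": bool(re.search(r"\berror|fails?\b|rate limit", text)),
    }
```

Run against the two examples above, the bad description only trips the action-verb signal, while the good one also carries a scenario trigger, parameter docs, and error guidance.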
The cheapest improvement any MCP developer can make: add one sentence explaining when to use each tool. It takes 30 seconds and immediately improves how AI agents interact with your server.
## The Paradigm Shift
Most developers write tool descriptions for humans: "Search for items" is obvious to a human. But AI agents don't have common sense. They need explicit instructions.
```
Human-to-Human: "Search for items" → human infers the rest

Human-to-Agent: "Search products by keyword. Use when user wants to find,
                browse, or discover products. Not for order lookup —
                use get_order instead." → agent follows instructions
```
We're still learning how to write for non-human intelligence. The 98% gap is a symptom of a paradigm shift that hasn't happened yet.
## What You Can Do Today
If you maintain an MCP server:
- Add scenario triggers to every tool description — "Use this when..."
- Document parameters beyond the JSON schema — types, formats, constraints
- Add error guidance — what should the agent do when things fail?
- Run `spidershield scan` on your server — it scores your descriptions
If you're a platform:
- Require scenario triggers in tool descriptions
- Show description quality scores alongside security scores
- Provide templates for good tool descriptions
## Methodology
- Data: 78,849 tool descriptions across 15,923 MCP servers and skills
- Scanner: spidershield (open source, MIT)
- Signal detection: Binary (present/absent) for each of 5 description signals
- Scoring: Weighted composite with baseline + bonus model
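The baseline + bonus model can be sketched as follows. The weights below are assumptions chosen so the maximum is 10 — the actual SpiderRating weights are not published in this report:

```python
# Illustrative baseline + bonus scoring model (weights are assumptions).
BASELINE = 2.0
BONUSES = {
    "action_verb": 1.5,
    "scenario_trigger": 2.5,  # weighted highest: the rarest, most valuable signal
    "param_docs": 1.5,
    "param_examples": 1.0,
    "error_guidance": 1.5,
}  # BASELINE + all bonuses = 10.0

def description_score(signals: dict) -> float:
    """Score a description on a 0-10 scale from its binary signal flags."""
    score = BASELINE + sum(BONUSES[k] for k, present in signals.items() if present)
    return min(score, 10.0)
```

A description with all five signals present scores 10.0; one with none scores the 2.0 baseline.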
Full data at spiderrating.com. Previous report: State of MCP Security 2026.
---
*This is Part 2 of our MCP ecosystem research series. Part 1: State of MCP Security 2026. Subscribe for Part 3: "When Safe Tools Become Dangerous: Compositional Attacks in MCP."*