98% of MCP Tools Don't Tell AI Agents When to Use Them — Deep Dive

SpiderRating Research · 6 min read
MCP · AI Agents · Documentation · Tool Design · Developer Experience

> TL;DR: We analyzed 78,849 tool descriptions across 15,923 MCP servers and AI skills. Only 2% tell the AI agent *when* to use them. Only 3% document their parameters. This is why AI agents pick the wrong tool — and it's fixable.

---

The Hidden Problem

When an AI agent connects to an MCP server, it sees a list of tools. Each tool has a name, a description, and a parameter schema. That description is the only information the agent has to decide:

  • Which tool to use
  • What arguments to pass
  • What to do when things go wrong

We analyzed 78,849 tool descriptions across 15,923 MCP servers and skills. The results explain a lot about why AI agents feel "dumb."

The Numbers

| What AI Agents Need | What They Get |
| --- | --- |
| "What does this tool do?" (action verb) | 68% have one |
| "When should I use this tool?" (scenario trigger) | 2% have one |
| "What format should parameters be?" (param docs) | 3% have them |
| "Can you show me an example?" (param examples) | 7% have them |
| "What happens if it fails?" (error guidance) | 2% have it |

98% of tools don't tell the AI agent when to use them.

This means for 98% of tools, the AI agent has to *guess* from the tool name and a vague description. When it guesses wrong, users blame the AI.

Why This Is a Security Problem

As Reddit user NexusVoid_AI pointed out in response to our State of MCP Security report:

> "The missing usage guidance number is the one that doesn't get enough attention. When a tool doesn't tell the agent when to use it, the agent has to infer from context. That inference step is exactly where a poisoned tool description or injected instruction can redirect behavior."

Missing scenario triggers aren't just a quality problem — they're an attack surface. If the agent doesn't know when to use Tool A vs Tool B, a malicious tool can insert itself by simply having a slightly more relevant-sounding description.
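A toy selector makes this attack surface concrete. Suppose an agent ranked tools by naive keyword overlap between the user's request and each description — a deliberate simplification (real agents use an LLM to choose), but the failure mode is analogous. A tool whose description merely echoes likely user phrasing outranks a vague legitimate one. The tool names and descriptions below are hypothetical:

```python
def pick_tool(user_request: str, tools: dict[str, str]) -> str:
    """Naive tool selection: choose the tool whose description shares
    the most words with the user's request."""
    request_words = set(user_request.lower().split())

    def overlap(description: str) -> int:
        return len(request_words & set(description.lower().split()))

    return max(tools, key=lambda name: overlap(tools[name]))

tools = {
    # A legitimate tool with a typical one-line description.
    "search": "Search for items",
    # A malicious tool whose description simply echoes likely user phrasing.
    "search_pro": "Find, look up, or search for any items the user asks about",
}

# The vague description loses to the one that merely *sounds* more relevant.
print(pick_tool("look up items for me", tools))  # → search_pro
```

A precise scenario trigger on the legitimate tool narrows the gap an attacker can exploit, because the agent no longer has to infer intent from a one-liner.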

The Description Score Gap

We score every tool's description on a 0-10 scale. The gap between MCP servers and skills is dramatic:

  • MCP servers: average description score 3.13/10
  • Skills: average description score 5.67/10

Skills score higher because the ClawHub/OpenClaw skill format (SKILL.md) encourages structured descriptions. MCP servers have no such convention — most developers write a one-line description and call it done.

Better Descriptions = Better Scores

The data shows a clear correlation between description quality and overall safety:

| Description Score | Average Overall Score | Count |
| --- | --- | --- |
| Low (0-3) | 4.55 | 3,751 servers |
| Mid (3-5) | 5.39 | 2,665 servers |
| Good (5-7) | 5.32 | 7,976 servers |
| Great (7-10) | 6.47 | 1,531 servers |

Tools with great descriptions score 42% higher overall than tools with poor descriptions. Documentation isn't just nice-to-have — it's a leading indicator of code quality.

What a Good Tool Description Looks Like

Bad (typical MCP tool — 98% look like this):

```yaml
name: "search"
description: "Search for items"
```

Good (what AI agents need):

```yaml
name: "search_products"
description: |
  Search the product catalog by keyword, category, or price range.
  Use this when the user asks to find, browse, or look up products.
  Returns up to 20 results sorted by relevance.

  Parameters:
    query (string, required): Search keywords
    category (string, optional): Filter by category name
    max_price (number, optional): Maximum price in USD

  Errors:
  - Returns empty array if no matches (not an error)
  - Returns 429 if rate limited — wait 60 seconds
```

The difference: the good description tells the AI when to use it ("user asks to find, browse, or look up"), what each parameter means, and what to do with errors.
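As one concrete rendering, here is roughly how such a description could be carried in the tool-definition shape MCP servers return from `tools/list` (`name`, `description`, `inputSchema` with a JSON Schema body). The exact wording and field layout are adapted for illustration:

```python
import json

# A tool definition in the shape MCP servers return from tools/list:
# the description carries the scenario trigger, parameter docs, and
# error guidance, while inputSchema stays machine-readable.
search_products = {
    "name": "search_products",
    "description": (
        "Search the product catalog by keyword, category, or price range. "
        "Use this when the user asks to find, browse, or look up products. "
        "Returns up to 20 results sorted by relevance. "
        "Returns an empty array if no matches (not an error); "
        "returns 429 if rate limited, in which case wait 60 seconds."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search keywords"},
            "category": {"type": "string", "description": "Filter by category name"},
            "max_price": {"type": "number", "description": "Maximum price in USD"},
        },
        "required": ["query"],
    },
}

print(json.dumps(search_products, indent=2))
```

Note that the parameter docs live in both places: prose in `description` for the agent's reasoning, and `description` fields inside the schema for strict clients.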

The 5 Signals We Measure

SpiderRating's Description Quality score evaluates five signals:

  1. Action Verb (68% have it) — Does the description start with what the tool *does*?
  2. Scenario Trigger (2% have it) — Does it say *when* to use this tool?
  3. Parameter Documentation (3% have it) — Are parameters explained beyond the schema?
  4. Parameter Examples (7% have it) — Are there example values?
  5. Error Guidance (2% have it) — What should the agent do when the tool fails?
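To make the binary present/absent idea concrete, here is a rough sketch of how the five signals could be detected. The regex patterns are illustrative assumptions, not SpiderRating's actual detectors:

```python
import re

# Rough heuristics for the five description signals (assumed patterns).
SIGNALS = {
    "action_verb": re.compile(
        r"^(search|get|list|create|update|delete|fetch|send|run)\b", re.I),
    "scenario_trigger": re.compile(r"\buse (this|it)? ?when\b", re.I),
    "param_docs": re.compile(r"\bparameters?:\s", re.I),
    "param_examples": re.compile(r"\be\.g\.|\bexample:", re.I),
    "error_guidance": re.compile(r"\berrors?:|\bif (it )?fails\b|\brate limit", re.I),
}

def detect_signals(description: str) -> dict[str, bool]:
    """Binary present/absent check for each of the five signals."""
    return {name: bool(pat.search(description)) for name, pat in SIGNALS.items()}

good = ("Search the product catalog by keyword. Use this when the user asks "
        "to find products. Parameters: query (string, required). "
        "Errors: returns 429 if rate limited.")

print(detect_signals("Search for items"))  # only action_verb fires
print(detect_signals(good))                # everything but param_examples fires
```

Even crude checks like these separate the typical one-liner from a well-documented tool, which is why a binary rubric works at ecosystem scale.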

The cheapest improvement any MCP developer can make: add one sentence explaining when to use each tool. It takes 30 seconds and immediately improves how AI agents interact with your server.

The Paradigm Shift

Most developers write tool descriptions for humans: "Search for items" is obvious to a human. But AI agents don't have common sense. They need explicit instructions.

```
Human-to-Human: "Search for items" → human infers the rest
Human-to-Agent: "Search products by keyword. Use when user wants to find,
                 browse, or discover products. Not for order lookup —
                 use get_order instead." → agent follows instructions
```

We're still learning how to write for non-human intelligence. The 98% gap is a symptom of a paradigm shift that hasn't happened yet.

What You Can Do Today

If you maintain an MCP server:

  1. Add scenario triggers to every tool description — "Use this when..."
  2. Document parameters beyond the JSON schema — types, formats, constraints
  3. Add error guidance — what should the agent do when things fail?
  4. Run `spidershield scan` on your server — it scores your descriptions

If you're a platform:

  1. Require scenario triggers in tool descriptions
  2. Show description quality scores alongside security scores
  3. Provide templates for good tool descriptions

Methodology

  • Data: 78,849 tool descriptions across 15,923 MCP servers and skills
  • Scanner: spidershield (open source, MIT)
  • Signal detection: Binary (present/absent) for each of 5 description signals
  • Scoring: Weighted composite with baseline + bonus model
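As an illustration of a baseline + bonus model, the sketch below maps the five binary signals to a 0-10 score. The baseline and weights are assumptions chosen for the example, not SpiderRating's published parameters:

```python
# Illustrative baseline + bonus model: a floor score for having any
# recognizable signal, plus a weighted bonus per detected signal.
# These numbers are assumptions for the sketch, not SpiderRating's.
BASELINE = 2.0
WEIGHTS = {
    "action_verb": 1.5,
    "scenario_trigger": 2.5,   # the rarest and highest-value signal
    "param_docs": 1.5,
    "param_examples": 1.0,
    "error_guidance": 1.5,
}

def description_score(signals: dict[str, bool]) -> float:
    """Map binary signal detections to a 0-10 description score."""
    if not any(signals.values()):
        return 0.0  # no recognizable signals at all
    bonus = sum(WEIGHTS[name] for name, present in signals.items() if present)
    return min(10.0, BASELINE + bonus)

# A bare "Search for items"-style description vs. a fully documented one.
print(description_score({"action_verb": True, "scenario_trigger": False,
                         "param_docs": False, "param_examples": False,
                         "error_guidance": False}))  # → 3.5
print(description_score(dict.fromkeys(WEIGHTS, True)))  # → 10.0
```

Capping at 10 and weighting the scenario trigger most heavily matches the article's emphasis: the "when to use" sentence is the cheapest, highest-leverage fix.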

Full data at spiderrating.com. Previous report: State of MCP Security 2026.

---

*This is Part 2 of our MCP ecosystem research series. Part 1: State of MCP Security 2026. Subscribe for Part 3: "When Safe Tools Become Dangerous: Compositional Attacks in MCP."*