Methodology

SpiderRating provides deterministic, transparent security ratings for AI tools — including MCP servers, Claude skills, and more. The same input always produces the same output: no AI black boxes, no subjective judgment.

What We Rate


MCP Servers

Model Context Protocol servers that expose tools, prompts, and resources to AI agents. Full 3-layer scoring: description quality, security analysis, and metadata health.


Claude Skills

Custom instructions and capabilities for Claude Code. Scored on description quality, malicious pattern detection (20+ rules), and community signals.


AI Tools

OpenAI plugins, function-calling tools, and other AI integrations. Coming soon — scoring model in development.


Connectors & Plugins

LangChain tools, browser extensions, and other connector types. The framework is designed to be extensible as new AI tool categories emerge.

3-Layer Scoring Model

Every item is scored across three independent layers. The default weights below are adapted per item type (the exact per-type formulas are given after the layer descriptions):

Description Quality

35%

How well tool descriptions communicate intent, scope, and side effects to LLMs.

Security Analysis

35%

Static analysis for 46+ security patterns: reverse shells, credential theft, prompt injection, toxic flows, and more.

Metadata Health

30%

Provenance signals: source availability, maintenance activity, community adoption, and download metrics.

MCP Servers: Description × 0.38 + Security × 0.34 + Metadata × 0.28
Skills:      Description × 0.45 + Security × 0.35 + Metadata × 0.20
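The per-type weighted sum can be sketched in a few lines. This is an illustrative reading of the formulas above, not SpiderRating's actual code; the function and key names are assumptions, and each layer score is assumed to be pre-normalized to 0–10:

```python
# Per-type layer weights, taken from the formulas above.
WEIGHTS = {
    "mcp_server": {"description": 0.38, "security": 0.34, "metadata": 0.28},
    "skill":      {"description": 0.45, "security": 0.35, "metadata": 0.20},
}

def overall_score(item_type: str, description: float,
                  security: float, metadata: float) -> float:
    """Combine three 0-10 layer scores into one 0-10 overall score."""
    w = WEIGHTS[item_type]
    score = (w["description"] * description
             + w["security"] * security
             + w["metadata"] * metadata)
    return round(score, 2)
```

Because the weights in each row sum to 1.0, a perfect 10 in every layer yields an overall 10.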

Scoring by Type

| Dimension   | MCP Servers | Claude Skills |
| ----------- | ----------- | ------------- |
| Description | 5-dimension tool description scoring (intent, scope, side effects, capabilities, boundaries) | Instruction clarity, scope definition, behavioral boundaries |
| Security    | 46 static patterns (TS-E001–TS-P002): reverse shells, C2, credential theft, code exec, exfiltration | 20+ malicious patterns, typosquat detection (Levenshtein), toxic flow analysis, rug pull detection |
| Metadata    | GitHub signals: stars, forks, license, commit recency, contributor count | Download count, author reputation, source availability, version history |

5 Description Dimensions

| Dimension              | Weight | What It Measures |
| ---------------------- | ------ | ---------------- |
| Intent Clarity         | 20%    | Does the description start with an action verb and clearly distinguish this tool from others? |
| Permission Scope       | 25%    | Does it define when to use the tool and what boundaries apply? |
| Side Effects           | 20%    | Does it document error conditions and potential side effects? |
| Capability Disclosure  | 20%    | Are parameters documented with examples and type information? |
| Operational Boundaries | 15%    | Overall description completeness — does it provide enough context for safe tool selection? |
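Combining the five dimensions is a plain weighted sum. A minimal sketch, assuming each dimension has already been scored 0–10 (the dict keys are illustrative names, not SpiderRating's actual identifiers):

```python
# Weights from the dimension table above; they sum to 1.0.
DIMENSION_WEIGHTS = {
    "intent_clarity": 0.20,
    "permission_scope": 0.25,
    "side_effects": 0.20,
    "capability_disclosure": 0.20,
    "operational_boundaries": 0.15,
}

def description_score(dims: dict) -> float:
    """dims maps each dimension name to a 0-10 sub-score."""
    total = sum(DIMENSION_WEIGHTS[name] * dims[name]
                for name in DIMENSION_WEIGHTS)
    return round(total, 2)
```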

Security Scoring

Powered by SpiderShield static analysis engine with 46+ standardized issue codes (TS-E001 through TS-P002).

Security = 10 - (3 × critical + 2 × high + 1 × medium + 0.25 × low) + architecture_bonus
  • Architecture bonus: +0 to +2 based on code quality signals (tests, error handling)
  • Score clamped to [0, 10]
  • Score of 10.0 means "zero issues found", not "proven secure"
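The formula and its clamping behavior translate directly to code. A minimal sketch of the stated formula (the function signature is illustrative):

```python
def security_score(critical: int, high: int, medium: int, low: int,
                   architecture_bonus: float = 0.0) -> float:
    """10 minus severity-weighted penalties, plus a 0-2 architecture
    bonus, clamped to [0, 10] as specified above."""
    penalty = 3 * critical + 2 * high + 1 * medium + 0.25 * low
    return max(0.0, min(10.0, 10.0 - penalty + architecture_bonus))
```

Note how clamping interacts with the bonus: four criticals already drive the raw value to -2, so no architecture bonus can lift a severely flawed item off the floor.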

Skill-Specific Detection

Malicious Pattern Detection

  • 20+ rule patterns for suspicious instructions
  • Typosquat detection (Levenshtein distance ≤ 2)
  • Prompt injection / exfiltration patterns
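The typosquat check is a classic edit-distance comparison against known names. A sketch using the textbook dynamic-programming Levenshtein algorithm with the threshold of 2 stated above; `is_typosquat` and the name list are illustrative, not SpiderRating's actual API:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (row-by-row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_typosquat(name: str, known_names: list, threshold: int = 2) -> bool:
    """Flag names within edit distance <= threshold of a known name,
    excluding exact matches (distance 0)."""
    return any(0 < levenshtein(name, k) <= threshold for k in known_names)
```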

Advanced Analysis

  • Toxic flow: data source + public sink combinations
  • Rug pull detection via SHA-256 content pinning
  • Allowlist mode for approved-only skills
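Rug pull detection via content pinning amounts to hashing a skill's content at approval time and flagging any later divergence. A minimal sketch using SHA-256 as described above (function names are illustrative):

```python
import hashlib

def pin_content(content: bytes) -> str:
    """SHA-256 digest recorded when a skill is approved."""
    return hashlib.sha256(content).hexdigest()

def is_rug_pull(current_content: bytes, pinned_hash: str) -> bool:
    """True if the skill's content no longer matches the pinned hash,
    i.e. it was silently changed after approval."""
    return pin_content(current_content) != pinned_hash
```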

Metadata Signals

Provenance (40%)

  • Has source code
  • Has license
  • Identifiable owner
  • Repo age > 180 days
  • Not archived

Maintenance (35%)

  • Recent commits
  • Has releases
  • Multiple contributors
  • Has description

Popularity (25%)

  • Stars / downloads (log scale)
  • Forks (log scale)
  • Watchers / installs (log scale)

Hard Constraints

Regardless of the calculated score, these rules enforce safety floors:

| Condition                   | Effect            | Applies To |
| --------------------------- | ----------------- | ---------- |
| Any critical security issue | Grade forced to F | All types  |
| Known malicious skill       | Grade forced to F | Skills     |
| Security score < 5.0        | Grade capped at C | All types  |
| No source repository        | Grade capped at D | All types  |
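Applying the safety floors is an ordered sequence of forces and caps on top of the computed grade. A sketch under the table's rules; the flag names are illustrative, and the known-malicious check, which the table scopes to skills, is modeled here as a generic boolean:

```python
GRADE_ORDER = "ABCDF"  # best to worst

def cap(grade: str, ceiling: str) -> str:
    """Return the worse of the two grades (a cap never improves a grade)."""
    return grade if GRADE_ORDER.index(grade) >= GRADE_ORDER.index(ceiling) else ceiling

def apply_hard_constraints(grade: str, *, has_critical: bool,
                           known_malicious: bool, security_score: float,
                           has_source: bool) -> str:
    if has_critical or known_malicious:
        return "F"                       # forced, regardless of score
    if security_score < 5.0:
        grade = cap(grade, "C")
    if not has_source:
        grade = cap(grade, "D")
    return grade
```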

Grade Thresholds

A: 9.0 - 10
B: 7.0 - 8.9
C: 5.0 - 6.9
D: 3.0 - 4.9
F: 0 - 2.9
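Mapping a score to a letter is a straightforward threshold cascade. A sketch of the bands above (the function name is illustrative):

```python
def grade_for(score: float) -> str:
    """Map a 0-10 overall score to a letter grade per the thresholds."""
    if score >= 9.0:
        return "A"
    if score >= 7.0:
        return "B"
    if score >= 5.0:
        return "C"
    if score >= 3.0:
        return "D"
    return "F"
```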

Local vs Registry Scores

SpiderRating provides two complementary score types, similar to how Lighthouse (local) and PageSpeed Insights (registry) work:

SpiderScore (local)

Run spidershield scan locally. It uses the same scoring formula and grade thresholds, but no registry data, and no hard constraints beyond the critical-issue, no-tools, and license checks.

SpiderScore (registry)

The authoritative score on this website. Includes registry verification, tiered hard constraints, and type-specific weights for skills vs MCP servers.

Both use the same open-source scoring specification for grade thresholds, dimension weights, and security formulas. Differences come from platform-level policy applied on top.

Reproducibility Guarantee

SpiderRating is fully deterministic. Given the same source code and metadata, it will always produce the same score. There is no randomness, no LLM-based scoring, and no network-dependent calculations. You can reproduce any rating by running spidershield scan <repo> locally.