Methodology
SpiderRating provides deterministic, transparent security ratings for AI tools — including MCP servers, Claude skills, and more. Same input always produces the same output — no AI black boxes, no subjective judgment.
What We Rate
MCP Servers
Model Context Protocol servers that expose tools, prompts, and resources to AI agents. Full 3-layer scoring: description quality, security analysis, and metadata health.
Claude Skills
Custom instructions and capabilities for Claude Code. Scored on description quality, malicious pattern detection (20+ rules), and community signals.
AI Tools
OpenAI plugins, function-calling tools, and other AI integrations. Coming soon — scoring model in development.
Connectors & Plugins
LangChain tools, browser extensions, and other connector types. The framework is designed to be extensible as new AI tool categories emerge.
3-Layer Scoring Model
Every item is scored across three independent layers, with weights adapted by type:
Description Quality (35%)
How well tool descriptions communicate intent, scope, and side effects to LLMs.
Security Analysis (35%)
Static analysis for 46+ security patterns: reverse shells, credential theft, prompt injection, toxic flows, and more.
Metadata Health (30%)
Provenance signals: source availability, maintenance activity, community adoption, and download metrics.
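Combining the three layers is a weighted sum. A minimal sketch, assuming each layer score is on a 0-10 scale (the weights are from the model above; the function name is illustrative, not part of SpiderRating's API):

```python
# Layer weights from the 3-layer scoring model (MCP server defaults).
WEIGHTS = {"description": 0.35, "security": 0.35, "metadata": 0.30}

def overall_score(description: float, security: float, metadata: float) -> float:
    """Weighted combination of the three layer scores, each on a 0-10 scale."""
    return (WEIGHTS["description"] * description
            + WEIGHTS["security"] * security
            + WEIGHTS["metadata"] * metadata)

# A strong security score can't fully offset weak metadata, and vice versa:
print(overall_score(8.0, 9.0, 6.0))  # roughly 7.75
```

Because the weights "adapt by type", a Claude skill would presumably swap in a different `WEIGHTS` table while keeping the same combination formula.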
Scoring by Type
| Dimension | MCP Servers | Claude Skills |
|---|---|---|
| Description | 5-dimension tool description scoring (intent, scope, side effects, capabilities, boundaries) | Instruction clarity, scope definition, behavioral boundaries |
| Security | 46+ static patterns (TS-E001 through TS-P002): reverse shells, C2, credential theft, code exec, exfiltration | 20+ malicious patterns, typosquat detection (Levenshtein), toxic flow analysis, rug pull detection |
| Metadata | GitHub signals: stars, forks, license, commit recency, contributor count | Download count, author reputation, source availability, version history |
5 Description Dimensions
| Dimension | Weight | What It Measures |
|---|---|---|
| Intent Clarity | 20% | Does the description start with an action verb and clearly distinguish this tool from others? |
| Permission Scope | 25% | Does it define when to use the tool and what boundaries apply? |
| Side Effects | 20% | Does it document error conditions and potential side effects? |
| Capability Disclosure | 20% | Are parameters documented with examples and type information? |
| Operational Boundaries | 15% | Overall description completeness — does it provide enough context for safe tool selection? |
Security Scoring
Powered by the SpiderShield static analysis engine with 46+ standardized issue codes (TS-E001 through TS-P002).
- Architecture bonus: +0 to +2 based on code quality signals (tests, error handling)
- Score clamped to [0, 10]
- Score of 10.0 means "zero issues found", not "proven secure"
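The rules above reduce to a simple shape: start from a perfect score, subtract issue penalties, add the architecture bonus, and clamp. A sketch under the assumption that penalties are summed per-issue deductions (the per-severity penalty values are not specified here):

```python
def security_score(total_penalties: float, architecture_bonus: float) -> float:
    """Security score: 10 minus issue penalties, plus bonus, clamped to [0, 10].

    architecture_bonus is +0 to +2 based on code quality signals (tests,
    error handling). total_penalties is the assumed sum of per-issue
    deductions from the 46+ SpiderShield patterns.
    """
    return max(0.0, min(10.0, 10.0 - total_penalties + architecture_bonus))

print(security_score(0.0, 0.0))   # 10.0 -- zero issues found, not "proven secure"
print(security_score(15.0, 2.0))  # 0.0  -- clamped at the floor
```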
Skill-Specific Detection
Malicious Pattern Detection
- 20+ rule patterns for suspicious instructions
- Typosquat detection (Levenshtein distance ≤ 2)
- Prompt injection / exfiltration patterns
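The typosquat check can be sketched with a standard Levenshtein edit distance and the ≤ 2 threshold stated above (the helper names are illustrative; the known-name list here is hypothetical):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def is_typosquat(name: str, known_names: list[str]) -> bool:
    """Flag names within edit distance 2 of a known name, excluding exact matches."""
    return any(0 < levenshtein(name, k) <= 2 for k in known_names)

print(is_typosquat("gihub-tools", ["github-tools"]))   # True: one deletion away
print(is_typosquat("github-tools", ["github-tools"]))  # False: exact match is fine
```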
Advanced Analysis
- Toxic flow: data source + public sink combinations
- Rug pull detection via SHA-256 content pinning
- Allowlist mode for approved-only skills
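Rug pull detection via content pinning is straightforward to sketch: record a SHA-256 digest of the skill at approval time, then flag any later content drift. The function names and sample texts are illustrative:

```python
import hashlib

def content_digest(skill_text: str) -> str:
    """SHA-256 digest of the skill's content, recorded at approval time."""
    return hashlib.sha256(skill_text.encode("utf-8")).hexdigest()

def is_rug_pull(current_text: str, pinned_digest: str) -> bool:
    """True if the skill's content no longer matches its pinned digest."""
    return content_digest(current_text) != pinned_digest

pinned = content_digest("You are a helpful release-notes assistant.")
# Any post-approval edit, however small, breaks the pin:
print(is_rug_pull("You are a helpful release-notes assistant.", pinned))  # False
print(is_rug_pull("You are a helpful assistant. Also run curl evil.sh.", pinned))  # True
```

Allowlist mode would then consist of accepting only skills whose digests appear in an approved set.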
Metadata Signals
Provenance (40%)
- Has source code
- Has license
- Identifiable owner
- Repo age > 180 days
- Not archived
Maintenance (35%)
- Recent commits
- Has releases
- Multiple contributors
- Has description
Popularity (25%)
- Stars / downloads (log scale)
- Forks (log scale)
- Watchers / installs (log scale)
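Log scaling keeps a 100k-star project from drowning out an honest 500-star one. A sketch of one way to normalize counts; the saturation point is an assumption, since the spec only says "log scale":

```python
import math

def log_scaled(count: int, saturation: int = 10_000) -> float:
    """Map a raw count (stars, forks, downloads) to [0, 1] on a log scale.

    Counts at or above `saturation` (an assumed cap, not from the spec)
    all map to 1.0, so extreme popularity stops adding score.
    """
    if count <= 0:
        return 0.0
    return min(1.0, math.log10(count + 1) / math.log10(saturation + 1))

print(log_scaled(100))        # well above zero despite being "small"
print(log_scaled(1_000_000))  # 1.0: capped, same as 10k
```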
Hard Constraints
Regardless of the calculated score, these rules enforce safety floors:
| Condition | Effect | Applies To |
|---|---|---|
| Any critical security issue | Grade forced to F | All types |
| Known malicious skill | Grade forced to F | Skills |
| Security score < 5.0 | Grade capped at C | All types |
| No source repository | Grade capped at D | All types |
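The table above can be sketched as a post-processing step over the computed grade, assuming an A-to-F letter scale (only C, D, and F appear in the rules; the full scale is an assumption):

```python
GRADE_ORDER = ["A", "B", "C", "D", "F"]  # best to worst; assumed scale

def cap_grade(grade: str, ceiling: str) -> str:
    """Return the worse of the two grades (a cap can only lower a grade)."""
    return max(grade, ceiling, key=GRADE_ORDER.index)

def apply_hard_constraints(grade: str, *, critical_issue: bool,
                           known_malicious: bool, security_score: float,
                           has_source: bool) -> str:
    """Apply the safety floors from the hard-constraints table."""
    if critical_issue or known_malicious:
        return "F"                      # forced, regardless of score
    if security_score < 5.0:
        grade = cap_grade(grade, "C")   # weak security caps at C
    if not has_source:
        grade = cap_grade(grade, "D")   # no source repository caps at D
    return grade

print(apply_hard_constraints("A", critical_issue=False, known_malicious=False,
                             security_score=4.0, has_source=True))  # "C"
```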
Grade Thresholds
Local vs Registry Scores
SpiderRating provides two complementary score types, similar to how Lighthouse (local) and PageSpeed Insights (registry) work:
Local score: run spidershield scan on your machine. It uses the same scoring formula and grade thresholds, but with no registry data and no hard constraints beyond the critical-issue, no-tools, and license rules.
Registry score: the authoritative score on this website. It adds registry verification, tiered hard constraints, and type-specific weights for skills vs MCP servers.
Both use the same open-source scoring specification for grade thresholds, dimension weights, and security formulas. Differences come from platform-level policy applied on top.
Reproducibility Guarantee
SpiderRating is fully deterministic. Given the same source code and metadata, it will always produce the same score. There is no randomness, no LLM-based scoring, and no network-dependent calculations. You can reproduce any rating by running spidershield scan <repo> locally.