Methodology

SpiderRating provides deterministic, transparent security ratings for AI tools — including MCP servers, Claude skills, and more. The same input always produces the same output: no AI black boxes, no subjective judgment.

What We Rate


MCP Servers

Model Context Protocol servers that expose tools, prompts, and resources to AI agents. Full 3-layer scoring: description quality, security analysis, and metadata health.


Claude Skills

Custom instructions and capabilities for Claude Code. Scored on description quality, malicious pattern detection (20+ rules), and community signals.


AI Tools

OpenAI plugins, function-calling tools, and other AI integrations. Coming soon — scoring model in development.


Connectors & Plugins

LangChain tools, browser extensions, and other connector types. The framework is designed to be extensible as new AI tool categories emerge.

3-Layer Scoring Model

Every item is scored across three independent layers. The default weights below are adapted per item type (the exact per-type formulas are given after the layer descriptions):

Description Quality

35%

How well tool descriptions communicate intent, scope, and side effects to LLMs.

Security Analysis

35%

Static analysis for 46+ security patterns: reverse shells, credential theft, prompt injection, toxic flows, and more.

Metadata Health

30%

Provenance signals: source availability, maintenance activity, community adoption, and download metrics.

MCP Servers: Description × 0.38 + Security × 0.34 + Metadata × 0.28
Skills:      Description × 0.45 + Security × 0.35 + Metadata × 0.20
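The per-type weighted sum can be sketched in a few lines. This is an illustrative reading of the formulas above, not SpiderRating's actual code; the function and key names are assumptions, and each layer score is assumed to be pre-normalized to 0–10:

```python
# Per-type layer weights, taken from the formulas above.
WEIGHTS = {
    "mcp_server": {"description": 0.38, "security": 0.34, "metadata": 0.28},
    "skill":      {"description": 0.45, "security": 0.35, "metadata": 0.20},
}

def overall_score(item_type: str, description: float,
                  security: float, metadata: float) -> float:
    """Combine three 0-10 layer scores into one 0-10 overall score."""
    w = WEIGHTS[item_type]
    score = (w["description"] * description
             + w["security"] * security
             + w["metadata"] * metadata)
    return round(score, 2)
```

Because the weights in each row sum to 1.0, a perfect 10 in every layer yields an overall 10.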

Scoring by Type

| Dimension   | MCP Servers | Claude Skills |
| ----------- | ----------- | ------------- |
| Description | 5-dimension tool description scoring (intent, scope, side effects, capabilities, boundaries) | Instruction clarity, scope definition, behavioral boundaries |
| Security    | 46 static patterns (TS-E001–TS-P002): reverse shells, C2, credential theft, code exec, exfiltration | 20+ malicious patterns, typosquat detection (Levenshtein), toxic flow analysis, rug pull detection |
| Metadata    | GitHub signals: stars, forks, license, commit recency, contributor count | Download count, author reputation, source availability, version history |

5 Description Dimensions

| Dimension              | Weight | What It Measures |
| ---------------------- | ------ | ---------------- |
| Intent Clarity         | 20%    | Does the description start with an action verb and clearly distinguish this tool from others? |
| Permission Scope       | 25%    | Does it define when to use the tool and what boundaries apply? |
| Side Effects           | 20%    | Does it document error conditions and potential side effects? |
| Capability Disclosure  | 20%    | Are parameters documented with examples and type information? |
| Operational Boundaries | 15%    | Overall description completeness — does it provide enough context for safe tool selection? |
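Combining the five dimensions is a plain weighted sum. A minimal sketch, assuming each dimension has already been scored 0–10 (the dict keys are illustrative names, not SpiderRating's actual identifiers):

```python
# Weights from the dimension table above; they sum to 1.0.
DIMENSION_WEIGHTS = {
    "intent_clarity": 0.20,
    "permission_scope": 0.25,
    "side_effects": 0.20,
    "capability_disclosure": 0.20,
    "operational_boundaries": 0.15,
}

def description_score(dims: dict) -> float:
    """dims maps each dimension name to a 0-10 sub-score."""
    total = sum(DIMENSION_WEIGHTS[name] * dims[name]
                for name in DIMENSION_WEIGHTS)
    return round(total, 2)
```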

Security Scoring

Powered by SpiderShield static analysis engine with 46+ standardized issue codes (TS-E001 through TS-P002).

Security = 10 - (3 × critical + 2 × high + 1 × medium + 0.25 × low) + architecture_bonus
  • Architecture bonus: +0 to +2 based on code quality signals (tests, error handling)
  • Score clamped to [0, 10]
  • Score of 10.0 means "zero issues found", not "proven secure"
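The formula and its clamping behavior translate directly to code. A minimal sketch of the stated formula (the function signature is illustrative):

```python
def security_score(critical: int, high: int, medium: int, low: int,
                   architecture_bonus: float = 0.0) -> float:
    """10 minus severity-weighted penalties, plus a 0-2 architecture
    bonus, clamped to [0, 10] as specified above."""
    penalty = 3 * critical + 2 * high + 1 * medium + 0.25 * low
    return max(0.0, min(10.0, 10.0 - penalty + architecture_bonus))
```

Note how clamping interacts with the bonus: four criticals already drive the raw value to -2, so no architecture bonus can lift a severely flawed item off the floor.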

Skill-Specific Detection

Malicious Pattern Detection

  • 20+ rule patterns for suspicious instructions
  • Typosquat detection (Levenshtein distance ≤ 2)
  • Prompt injection / exfiltration patterns
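The typosquat check is a classic edit-distance comparison against known names. A sketch using the textbook dynamic-programming Levenshtein algorithm with the threshold of 2 stated above; `is_typosquat` and the name list are illustrative, not SpiderRating's actual API:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (row-by-row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_typosquat(name: str, known_names: list, threshold: int = 2) -> bool:
    """Flag names within edit distance <= threshold of a known name,
    excluding exact matches (distance 0)."""
    return any(0 < levenshtein(name, k) <= threshold for k in known_names)
```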

Advanced Analysis

  • Toxic flow: data source + public sink combinations
  • Rug pull detection via SHA-256 content pinning
  • Allowlist mode for approved-only skills
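Rug pull detection via content pinning amounts to hashing a skill's content at approval time and flagging any later divergence. A minimal sketch using SHA-256 as described above (function names are illustrative):

```python
import hashlib

def pin_content(content: bytes) -> str:
    """SHA-256 digest recorded when a skill is approved."""
    return hashlib.sha256(content).hexdigest()

def is_rug_pull(current_content: bytes, pinned_hash: str) -> bool:
    """True if the skill's content no longer matches the pinned hash,
    i.e. it was silently changed after approval."""
    return pin_content(current_content) != pinned_hash
```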

Metadata Signals

Provenance (40%)

  • Has source code
  • Has license
  • Identifiable owner
  • Repo age > 180 days
  • Not archived

Maintenance (35%)

  • Recent commits
  • Has releases
  • Multiple contributors
  • Has description

Popularity (25%)

  • Stars / downloads (log scale)
  • Forks (log scale)
  • Watchers / installs (log scale)

Hard Constraints

Regardless of the calculated score, these rules enforce safety floors:

| Condition                   | Effect            | Applies To |
| --------------------------- | ----------------- | ---------- |
| Any critical security issue | Grade forced to F | All types  |
| Known malicious skill       | Grade forced to F | Skills     |
| Security score < 5.0        | Grade capped at C | All types  |
| No source repository        | Grade capped at D | All types  |
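Applying the safety floors is an ordered sequence of forces and caps on top of the computed grade. A sketch under the table's rules; the flag names are illustrative, and the known-malicious check, which the table scopes to skills, is modeled here as a generic boolean:

```python
GRADE_ORDER = "ABCDF"  # best to worst

def cap(grade: str, ceiling: str) -> str:
    """Return the worse of the two grades (a cap never improves a grade)."""
    return grade if GRADE_ORDER.index(grade) >= GRADE_ORDER.index(ceiling) else ceiling

def apply_hard_constraints(grade: str, *, has_critical: bool,
                           known_malicious: bool, security_score: float,
                           has_source: bool) -> str:
    if has_critical or known_malicious:
        return "F"                       # forced, regardless of score
    if security_score < 5.0:
        grade = cap(grade, "C")
    if not has_source:
        grade = cap(grade, "D")
    return grade
```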

Grade Thresholds

A: 9.0 - 10
B: 7.0 - 8.9
C: 5.0 - 6.9
D: 3.0 - 4.9
F: 0 - 2.9
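Mapping a score to a letter is a straightforward threshold cascade. A sketch of the bands above (the function name is illustrative):

```python
def grade_for(score: float) -> str:
    """Map a 0-10 overall score to a letter grade per the thresholds."""
    if score >= 9.0:
        return "A"
    if score >= 7.0:
        return "B"
    if score >= 5.0:
        return "C"
    if score >= 3.0:
        return "D"
    return "F"
```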

Local vs Registry Scores

SpiderRating provides two complementary score types, similar to how Lighthouse (local) and PageSpeed Insights (registry) work:

SpiderScore (local)

Run spidershield scan locally. It uses the same scoring formula and grade thresholds, but no registry data, and no hard constraints beyond the critical-issue, no-tools, and license checks.

SpiderScore (registry)

The authoritative score on this website. Includes registry verification, tiered hard constraints, and type-specific weights for skills vs MCP servers.

Both use the same open-source scoring specification for grade thresholds, dimension weights, and security formulas. Differences come from platform-level policy applied on top.

Reproducibility Guarantee

SpiderRating is fully deterministic. Given the same source code and metadata, it will always produce the same score. There is no randomness, no LLM-based scoring, and no network-dependent calculations. You can reproduce any rating by running spidershield scan <repo> locally.