The rubric we use to score CLAUDE.md files.
Six axes. One weighted overall. Calibrated against six real-world cases — scores range from 14 (generic-template floor) to 78 (mature production file). The rubric, the cases, and the per-axis numbers are all public below.
The 6-axis rubric
Each axis is scored 0–10. Overall is a weighted sum out of 100. Specificity and Quirks captured are the highest-weighted axes — they're what actually make a CLAUDE.md useful to a fresh agent.
How much of the file is about this repo vs. generic AI-writing advice that could apply anywhere. Real paths, real commands, real conventions.
Hard cap at 4 if it includes "Claude is an AI assistant created by Anthropic" or similar boilerplate.
The non-obvious repo-specific gotchas a fresh agent would trip on. Build order traps, flaky tests with known workarounds, the test-runner flag for the macOS edge case.
Does it cover the categories a fresh agent needs? Build/install/test, layout, conventions, do/don't, ownership, links to deeper docs, testing strategy. Repo type matters — content repos aren't penalised for "no build commands."
Does it sound like the team that wrote it, or like an AI tool wrote it? Authentic, opinionated, specific — or templated benefit-bullets and "🚀 Getting Started" headers.
Sized for being loaded into every session. Target: 200–1500 tokens. Bloat is reference material that belongs in linked docs, history that belongs in CHANGELOG.md, or restating framework docs.
References to current tooling, not stale model names or retired SDK patterns. Hard cap at 5 if the file hardcodes a retired Claude model name as the one to use.
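The arithmetic behind the overall score can be sketched in a few lines of Python. This is an illustration, not the auditor's actual code. The weights are assumptions: the page says only that Specificity and Quirks captured count most, so the values below are hypothetical and chosen to sum to 10, which maps six 0–10 scores onto a 0–100 overall. The keys `coverage`, `voice`, and `length` are placeholder labels for the three axes not named on this page.

```python
# Hypothetical weights (not published on this page). They sum to 10 so that
# six axis scores of 0-10 produce an overall score of 0-100.
WEIGHTS = {
    "specificity": 2.5,
    "quirks": 2.5,
    "coverage": 1.5,  # placeholder axis label
    "voice": 1.5,     # placeholder axis label
    "length": 1.0,    # placeholder axis label
    "currency": 1.0,
}


def overall_score(scores, caps=None):
    """Return the weighted overall (0-100) from per-axis 0-10 scores.

    caps maps an axis to a hard ceiling, e.g. {"specificity": 4}.
    A capped axis contributes min(raw score, cap) to the total.
    """
    caps = caps or {}
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six axes")
    total = 0.0
    for axis, weight in WEIGHTS.items():
        raw = scores[axis]
        if not 0 <= raw <= 10:
            raise ValueError(f"{axis} score {raw} is outside 0-10")
        total += weight * min(raw, caps.get(axis, 10))
    return round(total, 1)
```

With these assumed weights, a file scoring 8 on every axis lands at 80, and the same file drops to 70 once the Specificity cap of 4 applies.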
Calibration data
Six CLAUDE.md files (five from real GitHub repos, one synthetic floor case), scored by hand and then re-scored by the auditor. The skill-form auditor matched the hand scores on all six (Δ=0). Full per-axis breakdown + justifications →
| Case | Overall | One-liner |
|---|---|---|
| humanlayer/humanlayer | 70 | Solid monorepo map with real commands; weakest on captured gotchas and a generic "Technical Guidelines" section. |
| midudev/miduconf-website | 74 | High-specificity walkthrough but reads like a wiki; misses operational gotchas. |
| shanraisshan/claude-code-best-practice | 78 | Well-organised reference, slightly over-budget on length, rich in real gotchas. |
| zircote/.claude (personal global) | 67 | Sophisticated and full of real lessons, but bloated and version-locked to a retired model. |
| VoltAgent/awesome-claude-code-subagents | 59 | Tight and on-topic for a thin meta-repo, but missing the quality-bar guidance maintainers need. |
| Synthetic floor (generic / stale model) | 14 | Generic marketing copy with a stale model hardcode and zero repo-specific signal: full rewrite, not edits. |
Floor 14, ceiling 78. The rubric produces a real distribution, not the clustered 6–8 you see from unanchored "score this file" prompts.
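The spread is easy to check from the table. The sketch below copies the six overall scores from the calibration table and computes the floor, ceiling, range, and mean. The function name `spread` is made up for this example.

```python
from statistics import mean

# Overall scores copied from the calibration table on this page.
CALIBRATION = {
    "humanlayer/humanlayer": 70,
    "midudev/miduconf-website": 74,
    "shanraisshan/claude-code-best-practice": 78,
    "zircote/.claude": 67,
    "VoltAgent/awesome-claude-code-subagents": 59,
    "synthetic floor": 14,
}


def spread(scores):
    """Summarise a {case: overall} mapping as floor, ceiling, range, and mean."""
    values = list(scores.values())
    return {
        "floor": min(values),
        "ceiling": max(values),
        "range": max(values) - min(values),
        "mean": round(mean(values), 1),
    }


print(spread(CALIBRATION))
# {'floor': 14, 'ceiling': 78, 'range': 64, 'mean': 60.3}
```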
Want to fix it, not just score it?
This page scores. The Claude Code Power Pack fixes it — and four more things. Five skills, eight hooks, one opinionated status line. Calibration data ships with the pack so you can reproduce it.
- CLAUDE.md Generator (v1.0) — produces a strong, repo-grounded file. Calibration scores 80–85 on six real cases.
- CLAUDE.md Auditor (skill form) (v1.0) — same rubric as this page, runs locally. No API key, no round-trip.
- Hook Pack (v1.0) — eight curated hooks plus a status line. Off by default, kill-switched, silent on success.
- PR Reviewer (v1.0) — four-pass review with a hard silence rule. Calibrated against 10 real PRs; the silence rule is the load-bearing acceptance test.
- CI Babysitter (v1.0) — triages red CI runs into root causes. Refuses to bluff a fix; bails honestly when confidence is low. Calibrated against 9 real runs including the green-run silent gate.
14-day refund, no questions. Lifetime updates. No telemetry.
Methodology
- Each axis is scored 0–10 with justification. Overall is a weighted score out of 100 (Specificity and Quirks count most).
- Hard caps enforce floor quality: hardcoding a retired Claude model name caps Currency at 5. The phrase "Claude is an AI assistant created by Anthropic" caps Specificity at 4.
- You get three concrete fixes — not "consider adding more detail," but specific edits with example replacement text. Plus a one-sentence verdict.
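A rough sketch of how the two hard-cap triggers could be detected in a file's text, assuming a plain string search is good enough. It is not the auditor's implementation. The function name `find_hard_caps` is hypothetical, and the list of retired model names is supplied by the caller because this page does not publish one. Two known gaps: a literal match will not catch "similar boilerplate", and it flags any mention of a retired model name, not only one hardcoded as the model to use.

```python
import re

# The exact phrase quoted in the rubric.
BOILERPLATE = "Claude is an AI assistant created by Anthropic"


def find_hard_caps(text, retired_models):
    """Return {axis: cap} for the hard caps that the text triggers.

    retired_models: caller-supplied names to look for (no list ships here).
    """
    caps = {}

    # Collapse whitespace and ignore case so line wraps do not hide the phrase.
    normalized = " ".join(text.split()).lower()
    if BOILERPLATE.lower() in normalized:
        caps["specificity"] = 4

    for name in retired_models:
        # Match the name as a whole token, so a longer identifier that merely
        # starts with the same characters does not count.
        pattern = rf"(?<![\w-]){re.escape(name)}(?![\w-])"
        if re.search(pattern, text, re.IGNORECASE):
            caps["currency"] = 5
            break

    return caps
```

A human or model pass is still needed to judge whether a flagged model name is actually presented as the one to use.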
The rubric is calibrated against the six cases shown above; baselines range from generic-template junk (14) to mature production files (78). The same rubric ships as a skill in the Power Pack — run it locally on your own file, no round-trip needed.