46+ checks, one grade: how we score AI readiness
A transparent look at the 8 categories, 224 base points (plus a variable product-page weight), and why a B doesn't mean you're ready and an F doesn't mean you're invisible.
Hidden Layer runs 46+ checks across 8 categories and returns a letter grade from A to F. The methodology is transparent — here's the full rationale for what we check and why we weight it the way we do.
The 8 categories
1. Discoverability (39 points max)
Can AI systems find you at all? This covers robots.txt presence and parseability, sitemap.xml structure (including sub-sitemap fan-out), HTTPS enforcement, and response headers. A domain whose robots.txt returns a 403 gets a hard fail on bot access: we treat CDN blocks as equivalent to Disallow: *.
Why 39 points? Discovery is the precondition for everything else. A site that blocks crawlers doesn't get to score on agent integration.
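As a concrete illustration, here is a minimal sketch of the reachability check, assuming a fetch-based runtime such as Cloudflare Workers. The function and result names are ours, not the production implementation:

```ts
// Minimal sketch of the robots.txt reachability check. Names are
// illustrative, not Hidden Layer's actual code.
type RobotsResult =
  | { status: "ok"; body: string }
  | { status: "blocked" }  // 401/403 or CDN block: scored as Disallow: *
  | { status: "missing" }; // 404: no policy; crawlers assume allow-all

async function fetchRobots(domain: string): Promise<RobotsResult> {
  const res = await fetch(`https://${domain}/robots.txt`, { redirect: "follow" });
  if (res.status === 401 || res.status === 403) return { status: "blocked" };
  if (res.status === 404) return { status: "missing" };
  if (!res.ok) return { status: "blocked" }; // 5xx: unreachable, score conservatively
  return { status: "ok", body: await res.text() };
}
```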
2. Bot Access (75 points max)
The heaviest category. We check 12 canonical AI bot UAs against robots.txt rules: GPTBot, ClaudeBot, OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Google-Extended, Applebot-Extended, CCBot, Meta-ExternalAgent, and Bytespider. We also score the training_search_mismatch signal — a site that blocks all training bots while allowing search bots takes a penalty because the combination signals an inconsistent AI policy.
We apply RFC 9309 inheritance: bots with no explicit section inherit the User-agent: * rule. A permissive wildcard rule is a pass for all uncovered bots. We weight retrieval bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot) higher than training bots — the immediate commercial impact of blocking search-type bots is larger.
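A minimal sketch of that inheritance logic, assuming robots.txt is already parsed into groups. Real RFC 9309 matching also handles Allow precedence and longest-path rules, which we omit here:

```ts
// Sketch of RFC 9309 group resolution with wildcard inheritance.
interface Group { agents: string[]; disallow: string[] }

function isBotAllowed(groups: Group[], bot: string, path = "/"): boolean {
  // Prefer a group that names the bot explicitly...
  const specific = groups.find(g =>
    g.agents.some(a => a.toLowerCase() === bot.toLowerCase()));
  // ...otherwise the bot inherits the User-agent: * group.
  const group = specific ?? groups.find(g => g.agents.includes("*"));
  if (!group) return true; // no applicable rules means allowed
  return !group.disallow.some(rule => rule !== "" && path.startsWith(rule));
}
```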
3. AI Discovery (15 points max)
Is llms.txt present and parseable? We check for the file at /llms.txt, validate it has a top-level H1 and at least one section, and check for a /llms-full.txt companion. 15 points reflects that llms.txt is a genuine differentiator — present on under 20% of major domains — but not yet universal infrastructure.
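A sketch of that validation, under the assumption that the check reduces to simple line heuristics (the production rules may be stricter):

```ts
// Sketch of the llms.txt validity check: fetch the file, require a
// top-level H1 and at least one "## " section.
async function validateLlmsTxt(domain: string): Promise<boolean> {
  const res = await fetch(`https://${domain}/llms.txt`);
  if (!res.ok) return false;
  const lines = (await res.text()).split("\n").map(l => l.trim());
  const hasH1 = lines.some(l => l.startsWith("# "));
  const hasSection = lines.some(l => l.startsWith("## "));
  return hasH1 && hasSection;
}
```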
4. Agent Integration (13 points max)
The emerging-standards category. These specs have <1% adoption across the web but signal forward-looking infrastructure. We weight each proportionally to its maturity (a probe sketch follows the list):
- OpenAPI spec at standard well-known paths — 3pts (OpenAPI is established; well-known path discovery is new)
- /.well-known/agent-card.json (A2A protocol) — 2pts
- /.well-known/oauth-protected-resource (RFC 9728) — 2pts
- Listed in Smithery MCP registry — 2pts
- /.well-known/mcp.json (informal MCP card) — 1pt
- /.well-known/api-catalog — 1pt
- /.well-known/http-message-signatures-directory (Web Bot Auth draft) — 1pt
- /.well-known/agent-skills/index.json — 1pt
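The probe loop for these endpoints can be as simple as the sketch below. The paths and weights mirror the checklist; the soft-404 guard is our heuristic, and the OpenAPI and Smithery checks need separate logic:

```ts
// Sketch of the well-known probe loop for the Agent Integration checks.
const AGENT_CHECKS: Array<[path: string, points: number]> = [
  ["/.well-known/agent-card.json", 2],
  ["/.well-known/oauth-protected-resource", 2],
  ["/.well-known/mcp.json", 1],
  ["/.well-known/api-catalog", 1],
  ["/.well-known/http-message-signatures-directory", 1],
  ["/.well-known/agent-skills/index.json", 1],
];

async function scoreAgentIntegration(domain: string): Promise<number> {
  let score = 0;
  for (const [path, points] of AGENT_CHECKS) {
    const res = await fetch(`https://${domain}${path}`);
    const type = res.headers.get("content-type") ?? "";
    // Count only real responses, not HTML error pages served with 200.
    if (res.ok && !type.includes("text/html")) score += points;
  }
  return score; // OpenAPI (3pts) and the registry lookup bring the max to 13
}
```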
5. AI Visibility (34 points max)
Content legibility and identity signals. This category measures how well AI systems can extract meaning from your site; two of the signals below are sketched in code after the list:
- JSON-LD structured data (Schema.org) — 8pts: the primary machine-readable identity signal
- Open Graph meta tags — 5pts: used for content preview and entity extraction
- sameAs entity linking — 5pts: links your domain to Wikidata, Wikipedia, LinkedIn in JSON-LD
- Content efficiency (text-to-HTML ratio) — 5pts: JS-heavy sites penalised for AI legibility
- Content-Signal directive (robots.txt or X-Robots-Tag) — 3pts
- Speakable Schema.org markup — 2pts
- /pricing.md machine-readable pricing — 2pts
- Agent-mode view (non-HTML response to AI UA) — 2pts
- Markdown content negotiation (Accept: text/markdown) — 2pts
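Here are the JSON-LD and content-efficiency signals as code. The regexes and the example threshold are simplifications of the production parser:

```ts
// JSON-LD extraction: collect every parseable ld+json script block.
function extractJsonLd(html: string): object[] {
  const scripts = html.matchAll(
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi);
  return [...scripts].flatMap(m => {
    try { return [JSON.parse(m[1])]; } catch { return []; } // skip invalid blocks
  });
}

// Text-to-HTML ratio: visible text length over total payload length.
function contentEfficiency(html: string): number {
  const visible = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return visible.length / Math.max(html.length, 1); // e.g. pass at >= 0.10
}
```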
6. GEO Presence (46 points max)
The GEO-specific category: does AI actually know and recommend your brand? This is what distinguishes Hidden Layer from infrastructure-only audits. The recall probe is sketched in code after the list.
- LLM cold recall — 15pts: we probe a language model with no tools or context and ask it to describe your domain. Pass = model recognises and accurately describes you.
- LLM category share of voice — 10pts: we ask the model to list the top 10 brands in your industry. Pass = your brand appears.
- Wikipedia / Wikidata presence — 8pts: the strongest predictor of LLM citation accuracy in our dataset.
- HN mentions (Algolia HN API) — 5pts: HN is a high-weight LLM training corpus source.
- Brand-name search (DuckDuckGo instant answers) — 5pts: brand search returns your domain.
- Reddit mentions — 3pts: community corpus presence.
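A sketch of the cold-recall probe, assuming a Cloudflare Workers AI binding (env.AI) and the llama-3.1-8b model named in the misreads section below. The prompt wording and the UNKNOWN convention are illustrative choices:

```ts
// Cold-recall probe sketch: no tools, no context, fixed model.
interface Env {
  AI: { run(model: string, input: object): Promise<{ response?: string }> };
}

async function coldRecall(env: Env, domain: string): Promise<boolean> {
  const out = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "Answer from your own knowledge. No tools, no browsing." },
      { role: "user", content: `In one sentence, what is ${domain}? If you are not sure, answer UNKNOWN.` },
    ],
    temperature: 0, // keep the probe as deterministic as the stack allows
  });
  // Pass requires recognition; accuracy is checked by structured parsing downstream.
  return !!out.response && !out.response.includes("UNKNOWN");
}
```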
7. Commerce (2 points max)
Payment-pointer and x402 protocol presence. Very early-stage — most sites score 0 here. This will expand as AI-native payment protocols mature.
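Both probes can be sketched loosely. We assume here that payment pointers surface as a monetization meta tag and that x402 support surfaces as an HTTP 402 status on a probed endpoint; both are simplifications of the underlying protocols:

```ts
// Loose sketch of the Commerce probes (2-point category max).
async function scoreCommerce(domain: string, homepageHtml: string): Promise<number> {
  let score = 0;
  // Payment pointer, e.g. <meta name="monetization" content="$wallet.example">
  if (/<meta[^>]+name=["']monetization["']/i.test(homepageHtml)) score += 1;
  // x402: a paywalled resource answers 402 with machine-readable payment terms
  const res = await fetch(`https://${domain}/`, { method: "HEAD" });
  if (res.status === 402) score += 1;
  return score;
}
```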
8. Product Pages (variable)
For domains with e-commerce product pages, we auto-discover product URLs from the sitemap and audit up to 10 pages for Schema.org product markup (Product, ProductGroup, IndividualProduct), completeness of offers/price/availability, image and brand presence, and aggregate ratings. Scored as a percentage of observed product-page completeness; the weight adjusts to domain size.
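A sketch of the per-page completeness score, taking the page's parsed JSON-LD nodes as input (extraction as in the AI Visibility sketch). The field list mirrors the prose above:

```ts
// Per-page product audit sketch: find a Product-type JSON-LD node and
// score field completeness as a 0..1 fraction.
const PRODUCT_TYPES = new Set(["Product", "ProductGroup", "IndividualProduct"]);

function auditProductPage(nodes: any[]): number {
  const product = nodes.find(n => {
    const types = Array.isArray(n?.["@type"]) ? n["@type"] : [n?.["@type"]];
    return types.some((t: unknown) => PRODUCT_TYPES.has(String(t)));
  });
  if (!product) return 0; // no product schema at all
  const offer = Array.isArray(product.offers) ? product.offers[0] : product.offers;
  const fields = [
    offer?.price,
    offer?.availability,
    product.image,
    product.brand,
    product.aggregateRating,
  ];
  return fields.filter(f => f != null).length / fields.length;
}
```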
The grade scale
| Grade | Score | What it means |
|---|---|---|
| A | 90–100% | AI-optimised: llms.txt, strong discoverability, explicit bot policies, schema complete |
| B | 75–89% | AI-friendly: bots allowed, llms.txt present, some GEO signals in place |
| C | 60–74% | AI-accessible: core discoverability works, higher-order signals missing |
| D | 45–59% | AI-limited: significant blocks or gaps, agents struggle to get accurate information |
| F | 0–44% | AI-inaccessible: CDN blocks, critical failures, or deliberate AI exclusion |
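The table maps to a straightforward threshold function. Here, percent is the earned share of the maximum applicable to the domain, product-page weight included when present:

```ts
// The grade thresholds from the table, as a function.
function toGrade(percent: number): "A" | "B" | "C" | "D" | "F" {
  if (percent >= 90) return "A";
  if (percent >= 75) return "B";
  if (percent >= 60) return "C";
  if (percent >= 45) return "D";
  return "F";
}
```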
Common misreads
"A D score means AI can't find us." Not necessarily. A D often means you're accessible but haven't implemented bot-access rules or llms.txt explicitly. Agents can still reach you; they have less policy certainty and fewer curated signals to work from.
"An A means we're done." The spec is evolving. An A today is passing the current 46+ checks. Weights will shift as adoption rises — A2A and OAuth resource metadata are currently at 1–2pts because <1% of sites implement them. When 10% do, the weight goes up.
"Low score is about content quality." We don't score content. We score discoverability, access signals, and LLM-observable presence. A site with brilliant content behind a CDN firewall scores the same as a site with no content behind the same firewall.
"The LLM check is subjective." The GEO Presence checks use a deterministic probe with a fixed model (llama-3.1-8b via CF Workers AI) and structured parsing. The same domain at the same time should return the same result. The model is updated by CF — results may shift when the model checkpoint updates.
What we don't check (yet)
Render-gap analysis: measuring what JS-rendered content looks like to AI crawlers versus what the raw HTTP response contains. That requires a headless browser, which isn't free at scale, so it isn't scored. It's the biggest single gap in the current methodology.
See how your domain scores against these checks →
Run a free audit