Cold recall: the 15-point GEO check that tells you if AI knows your brand
We ask a language model — no tools, no search, just training knowledge — to describe your domain. 15 points. The highest-weight single check in the audit. Here's why it matters and how to move the needle.
Most GEO signals are infrastructure: robots.txt entries, llms.txt files, JSON-LD blocks. You add them, you deploy, you verify. Cold recall is different.
Cold recall measures whether a language model — queried with no tools, no search, and no retrieval context — can accurately describe your brand from its training data alone. It's the 15-point check at the heart of the GEO Presence category and the highest-weight single signal in the Hidden Layer audit.
What the test actually does
When Hidden Layer audits your domain, it issues a prompt to a language model (llama-3.1-8b via Cloudflare Workers AI). The prompt is: 'What is [your domain]? Describe what it does in 2-3 sentences. If you have no training data about this specific website, reply with exactly: Unknown - not in training data.'
The model runs with no tools enabled, no web search, no retrieval-augmented context. It answers entirely from knowledge absorbed during training. Pass: the model names your brand and describes it in terms consistent with your industry. Fail: the model says 'Unknown', hallucinates a different company entirely, or confuses you with a competitor.
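If you want to reproduce the probe yourself, here's a minimal sketch against the Cloudflare Workers AI REST API. The environment variable names, the exact model slug, and the sentinel-string check at the end are illustrative assumptions; Hidden Layer's grader also judges whether the description actually matches your industry, which a string check can't do.

```python
# Minimal sketch of a cold recall probe via the Cloudflare Workers AI REST API.
# Assumptions: the env var names, the model slug, and the sentinel check below
# are illustrative, not Hidden Layer's own code.
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]  # your Cloudflare account ID
API_TOKEN = os.environ["CF_API_TOKEN"]    # a token with Workers AI access
MODEL = "@cf/meta/llama-3.1-8b-instruct"

def cold_recall(domain: str) -> str:
    """Ask the model to describe a domain from training knowledge alone."""
    prompt = (
        f"What is {domain}? Describe what it does in 2-3 sentences. "
        "If you have no training data about this specific website, "
        "reply with exactly: Unknown - not in training data."
    )
    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["response"]

answer = cold_recall("example.com")
print("FAIL: not in training data" if answer.strip().startswith("Unknown") else answer)
```

Because the request carries no tools or retrieval context, whatever comes back can only have come from the model's weights.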
This is not an arbitrary test. It's the closest proxy available for the question 'does this brand exist in the AI consciousness?'
Why cold recall is worth 15 points
The 15 points reflect a simple reality: cold recall is the GEO signal that can't be faked. You can ship an llms.txt in an afternoon. You can add JSON-LD schema to your homepage in an hour. You cannot retroactively change what's in a model's training weights.
Infrastructure signals tell crawlers how to access your content. Cold recall measures whether the content was worth crawling. A brand with strong cold recall is already present in AI conversations even without optimal infrastructure. A brand that fails cold recall may be fully instrumented — perfect llms.txt, explicit bot permissions, complete schema — but the model has nothing to recall.
In terms of user impact: when someone asks Claude 'recommend me a [your category] company for [your use case]', the answer comes from training data. Not from your llms.txt. Not from your robots.txt. From the model's weights. Cold recall is the signal that measures whether you're in those weights at all.
The feedback loop: how brands end up in training data
LLM training data isn't something you submit to — it's scraped. Open models like Llama are trained on web-scale crawls (Common Crawl, C4, the Pile) plus proprietary variants, and closed models like GPT-4 and Claude are widely understood to draw on similar sources. The model's weights reflect the frequency and quality of references to your brand across that crawlable web.
The key datasets feeding most public LLMs:
- Common Crawl — a monthly crawl of ~3 billion pages. The backbone of most LLM pretraining sets. Getting crawled here reliably is table stakes (a quick way to sample your footprint here and on HN is sketched after this list).
- Wikipedia — the highest-density authoritative signal in most training sets. A Wikipedia article about your company is the single highest-value action you can take for cold recall.
- Hacker News (via the Algolia HN dataset) — engineering-adjacent brands with HN posts and active comment threads appear disproportionately in coding assistants and developer-focused models.
- Reddit — r/[your category] discussions are massively over-indexed in training data relative to their raw word count. Being mentioned across 100 Reddit threads outweighs a single press release.
- Academic papers and tech reports — essential for AI/ML brands. Being cited in a paper that describes a dataset or benchmark creates a uniquely durable presence in model weights.
- GitHub READMEs — repositories that mention your brand as a tool, dependency, or comparison are read by code-focused LLMs. Integrations, plugins, and client libraries all contribute.
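Before investing in any of these channels, it's worth sampling your current footprint. The sketch below hits two public, unauthenticated APIs: the Common Crawl index and Algolia's Hacker News search. The domain is a placeholder, and the crawl ID is one example; each monthly crawl gets its own identifier.

```python
# Rough footprint check against two of the corpora above. Both APIs are
# public and unauthenticated. The crawl ID is an example; each monthly
# crawl publishes a new CC-MAIN-YYYY-WW index.
import requests

DOMAIN = "example.com"  # placeholder: your domain

# Common Crawl index: were any pages from your domain captured in this crawl?
cc = requests.get(
    "https://index.commoncrawl.org/CC-MAIN-2024-33-index",
    params={"url": f"{DOMAIN}/*", "output": "json", "limit": "50"},
    timeout=60,
)
# The index returns one JSON record per line, or a 404 when nothing matched.
captures = cc.text.splitlines() if cc.ok else []
print(f"Common Crawl captures (sample): {len(captures)}")

# Hacker News via Algolia's public search API: stories mentioning the domain.
hn = requests.get(
    "https://hn.algolia.com/api/v1/search",
    params={"query": DOMAIN, "tags": "story"},
    timeout=30,
).json()
print(f"HN stories matching {DOMAIN}: {hn['nbHits']}")
```

Counts like these are a rough proxy, not a corpus audit: they tell you whether the crawlers that feed training sets have seen you at all.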
Models typically cut off training data 6–18 months before release, then stay deployed for 12–24 months before being superseded. You can't get into a current model's training data (its weights are already frozen), but getting into the *next* model's training data is the bet you're making today.
What fails the test (and why)
Several patterns cause brands to fail cold recall despite being legitimate, established companies:
- Too new — the training cutoff predates the company's launch or public presence.
- Geography-specific — a regional brand strong in one country but absent from the English-language web that dominates most training sets.
- B2B only — brands with no consumer presence, no public documentation, and no coverage in general-interest publications that feed training corpora.
- Brand name collision — a name shared with a much larger entity. The model knows the bigger brand and either ignores or confuses the smaller one.
- Content behind login — meaningful coverage exists but is paywalled, behind a corporate intranet, or on platforms not crawled for training.
The common thread: absence from the open, crawlable, attributable web. The model knows only what the training crawler could read.
How to improve cold recall: a priority order
- Build a Wikipedia article (or claim your stub). Not just a mention — a standalone article about the company meeting Wikipedia's notability guidelines. This is the highest-signal single action you can take, and it typically takes 2–6 weeks from submission to approval if independent coverage already exists (a quick existence check is sketched after this list).
- Get cited in at least 3 independent tech or industry publications. Journalist coverage in TechCrunch, Wired, The Verge, or relevant vertical press gets crawled by Common Crawl and surfaces in nearly every training corpus.
- Ship a Show HN post and participate in the comments. Aim for >100 points. An HN post with engaged discussion will appear in model weights for years.
- Publish a technical blog post that other developers reference or link to. Inbound links in READMEs and tutorials dramatically amplify corpus presence for developer tools.
- Get mentioned in Reddit threads in relevant subreddits. These don't need to be your own posts — third-party comparisons and recommendations carry more weight.
- Open-source something useful. A library or tool with GitHub stars generates organic README mentions, dependency declarations, and tutorial content across the entire developer web.
- Get an analyst or research mention. Gartner Magic Quadrant, Forrester Wave, or a university study citing your product is a high-authority signal the models weight heavily.
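For the Wikipedia step specifically, the public MediaWiki API tells you where you stand today. A minimal sketch, assuming English Wikipedia and a placeholder title:

```python
# Check whether a standalone Wikipedia article (or a claimable stub) exists.
# "YourBrand" is a placeholder; pass your actual article title.
import requests

def wikipedia_status(title: str) -> str:
    r = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "titles": title,
            "prop": "extracts",   # plain-text intro via the TextExtracts extension
            "exintro": 1,
            "explaintext": 1,
            "redirects": 1,       # follow redirects to the canonical title
            "format": "json",
        },
        timeout=30,
    ).json()
    page = next(iter(r["query"]["pages"].values()))
    if "missing" in page:
        return "no article: build independent coverage, then go via Articles for Creation"
    intro_words = len(page.get("extract", "").split())
    return f"article exists: ~{intro_words} words in the intro (very short = likely a stub)"

print(wikipedia_status("YourBrand"))
```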
The timing problem
Training a 70B+ parameter model takes months, and models ship with a knowledge cutoff that predates launch by 6–18 months. You're building for the next checkpoint, not the current one.
The practical implication: cold recall improvements take 12–24 months to fully materialise across deployed models. The companies that will have strong cold recall in 2027 are doing the corpus-building work today. Wikipedia article written in early 2026 → crawled by Common Crawl → in the 2026 Q3 training run → deployed in a model in early 2027.
This is the opposite of infrastructure signals, which take effect on the next crawl cycle (days to weeks). Cold recall is a long-game investment. The brands winning AI recommendations in 2027 started building training corpus presence in 2025.
Run the test on your domain
Hidden Layer's GEO audit includes the cold recall check as the first signal in the GEO Presence category. The result shows the model's raw response — not just pass/fail — so you can read exactly what the model 'knows' about your brand.
A pass shows the model recognising your brand accurately. A fail shows either 'Unknown' or, more informatively, whatever the model believes about you — which may be outdated, incomplete, or confused with a competitor. Both outcomes are actionable: you know exactly what the current model state is, and you have the corpus-building checklist above to improve the next one.
See how your domain scores against these checks →
Run a free audit