llms.txt: a year in, what actually works
Shopify has one. Samsung has one. Anthropic and OpenAI don't. A practical look at what llms.txt does, what format AI models actually process, and whether it's worth your time.
In September 2024, Jeremy Howard proposed llms.txt: a markdown file served at /llms.txt that gives AI systems a curated overview of your site's content, purpose, and key pages. The idea was simple: the same way robots.txt tells crawlers what to avoid, llms.txt tells LLMs what to read.
A year in, adoption is uneven, formatting is inconsistent, and the spec has evolved. Here's what we know.
What the spec actually says
The canonical spec at llmstxt.org defines a minimal structure:
- A top-level heading (# Site Name) followed by a brief description in a blockquote
- Optional free-form markdown with further context (the spec allows any markdown here except additional headings)
- Optional H2 sections containing markdown link lists pointing to key pages
- An optional /llms-full.txt companion with extended content
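Put together, a minimal file matching this structure might look like the following. This is a hypothetical example; the site name, section names, and URLs are invented:

```markdown
# Acme Analytics

> Acme Analytics is a self-serve product analytics platform for small SaaS teams.

## Products

- [Dashboards](https://acme.example/dashboards): drag-and-drop charts over event data
- [Funnels](https://acme.example/funnels): conversion tracking across user journeys

## Developer docs

- [API reference](https://acme.example/docs/api): REST endpoints for event ingestion
```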
The file should be plain markdown, UTF-8, served at /llms.txt with Content-Type: text/plain. If browser-based AI tools fetch it cross-origin, it also needs a CORS header: Access-Control-Allow-Origin: *.
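On nginx, both serving requirements can be covered with a small location block. This is a sketch, not a drop-in config; adapt the paths to your setup:

```nginx
# Hypothetical nginx config: serve /llms.txt as plain text
# with a permissive CORS header for cross-origin AI tools.
location = /llms.txt {
    default_type text/plain;
    add_header Access-Control-Allow-Origin "*";
}
```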
Who has adopted it (as of May 2026)
Shopify — Yes. 860 bytes. Clean structure with product/partner/developer sections. One of the earliest enterprise adopters.
Samsung — Yes. 4KB. Explicit AI bot rules in robots.txt plus a structured llms.txt with product family sections.
Anthropic — No. Despite being the company behind Claude, anthropic.com has no llms.txt as of this writing.
OpenAI — No. openai.com has no llms.txt either. Neither company dogfoods the signal their crawlers respect.
Does it actually affect LLM outputs?
This is the honest question. The answer is: sometimes, for retrieval-based queries, yes.
When a user asks an LLM tool like Perplexity or Claude's web search to "explain what [brand] does," the tool fetches the site. If llms.txt is present and well-structured, it's often what gets retrieved and parsed — it's small, machine-readable, and explicitly curated. The resulting answer tends to be more accurate than one assembled from the homepage's marketing copy.
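That retrieval preference can be sketched as a simple fallback: try the curated file first, then the homepage. The function and parameter names below are my own, and the fetcher is injected so the logic can be exercised without network access:

```python
from typing import Callable, Optional


def fetch_site_context(
    domain: str,
    fetch: Callable[[str], Optional[str]],
) -> tuple[str, str]:
    """Return (source, text) for a tool summarizing a site.

    Tries the curated /llms.txt first; falls back to the homepage.
    `fetch` returns the body for a URL, or None on a 404/error.
    """
    curated = fetch(f"https://{domain}/llms.txt")
    if curated:
        # Small, markdown, explicitly curated: use it directly.
        return ("llms.txt", curated)
    homepage = fetch(f"https://{domain}/")
    return ("homepage", homepage or "")
```

Passing a dict's `.get` as the fetcher makes the fallback easy to test offline.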
For training-based knowledge (what the model knows without looking anything up), llms.txt has no effect. The model was trained on whatever content was crawled before cutoff.
What format works best
Based on testing with Claude, GPT-4o, and Perplexity:
- Lead with a one-sentence description as the first paragraph after the H1. This is what gets extracted as a summary.
- Use H2 sections with meaningful names — "Products", "Pricing", "Developer docs" — not "Section 1".
- Link to your most important pages with descriptive anchor text. The link text matters more than the URL.
- Keep it under 5KB. The point is curation, not documentation. If you need more, use llms-full.txt.
- Update it quarterly. A stale llms.txt pointing to deprecated pages is worse than none.
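The checklist above can be turned into a quick sanity check. This is a sketch: the thresholds mirror the advice in this post, and the function name is my own:

```python
import re

MAX_BYTES = 5 * 1024  # "keep it under 5KB"


def lint_llms_txt(text: str) -> list[str]:
    """Return warnings for an llms.txt body, per the checklist above."""
    warnings = []
    if len(text.encode("utf-8")) > MAX_BYTES:
        warnings.append("over 5KB: move extended content to llms-full.txt")
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        warnings.append("missing H1 site name on the first line")
    # A one-sentence description should follow the H1.
    if not any(line.strip() for line in lines[1:3]):
        warnings.append("no description immediately after the H1")
    h2s = [line[3:].strip() for line in lines if line.startswith("## ")]
    if any(re.fullmatch(r"Section \d+", h) for h in h2s):
        warnings.append("H2 names should be meaningful, not 'Section N'")
    if not re.findall(r"\[([^\]]+)\]\(([^)]+)\)", text):
        warnings.append("no markdown links to key pages")
    return warnings
```

Run it against your file before publishing; an empty list means it clears the basics.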
Is it worth it?
For most sites: yes, because it takes 30 minutes once and the upside is durable. The bar for AI discoverability is low enough that a well-structured llms.txt is genuinely differentiating.
For B2B SaaS: critical. Your buyers are using LLMs to research vendors. A good llms.txt that explains your ICP, pricing model, and integration story directly influences how Claude or GPT describes you to a buyer who asks "what's the best tool for X?"