
llms.txt and robots.txt: Your Store’s First Conversation with AI

Published on March 5, 2026 · 9 min read

One fashion retailer discovered they had been blocking GPTBot for months without knowing it. A single line in their robots.txt – Disallow: / for GPTBot – was making their entire catalog invisible to ChatGPT Shopping. After removing that line, AI referrals went from zero to driving 8% of total revenue within weeks. The fix took less than thirty seconds.

This story is not unusual. As AI-powered shopping assistants become a real channel for product discovery, two small text files at the root of your website have become surprisingly high-stakes: robots.txt and the newer llms.txt. Together with your sitemap.xml, they form a three-part system that determines whether AI agents can find your store, understand what you sell, and recommend your products to shoppers.

Think of it this way. Your robots.txt is the security guard at the front door – it decides who gets in and what areas they can access. Your sitemap.xml is the building directory in the lobby – it lists every room and floor. And llms.txt is the concierge who greets visitors, explains what the business does, and guides them to exactly what they need.

Without the security guard, nobody knows the rules. Without the directory, visitors wander aimlessly. And without the concierge, AI agents have to figure everything out on their own – which they often do poorly. You need all three working together.

The Security Guard: How robots.txt Controls AI Crawler Access

The robots.txt file has been around since 1994, and its job has always been simple: tell web crawlers which parts of your site they can and cannot visit. For decades, most store owners only thought about Googlebot. But in 2025 and 2026, a new generation of AI crawlers started visiting e-commerce sites, and each of them obeys robots.txt rules under its own user-agent name, independently of the rules you set for traditional search bots.

Here are the AI crawlers that matter most for e-commerce right now. GPTBot is OpenAI's main crawler – it indexes content for ChatGPT Shopping and product recommendations. ChatGPT-User is what ChatGPT uses when a user asks it to browse a specific website in real time. Google-Extended controls whether Google can use your content for AI features like AI Overviews and AI Mode in search results. ClaudeBot and Anthropic-ai are Anthropic's crawlers for Claude. And PerplexityBot powers Perplexity's AI search engine, which is rapidly gaining market share.

The problem many retailers face is that their robots.txt was written years ago, before these crawlers existed. Some default configurations from e-commerce platforms even include blanket blocks on unknown user agents. If your file contains a broad Disallow: / for any of these bots, your products simply do not exist in that AI's world.

But the solution is not to throw the doors wide open, either. You want AI crawlers to see your product pages, collection pages, and informational content – the things that help them recommend your products. You do not want them crawling your cart, checkout flow, customer accounts, or admin panels. Those pages add noise, waste crawl budget, and can even expose information you would rather keep private.

Here is a robots.txt configuration that strikes the right balance for a typical e-commerce store. It explicitly welcomes the major AI crawlers to your product and collection pages while keeping sensitive areas off-limits:

# AI Shopping Crawlers - explicitly allowed
# Note: a crawler that matches its own group ignores the "User-agent: *"
# rules, so each group must repeat its own Disallow lines.
User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /pages/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: ChatGPT-User
Allow: /
Disallow: /checkout/
Disallow: /account/

User-agent: Google-Extended
Allow: /products/
Allow: /collections/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: ClaudeBot
Allow: /products/
Allow: /collections/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: Anthropic-ai
Allow: /products/
Allow: /collections/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

# All other crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

Sitemap: https://your-store.com/sitemap.xml

Notice the Sitemap: directive at the bottom. This is the bridge between your security guard and your building directory – it tells every crawler where to find the complete map of your site. Many store owners forget this line, and it matters more than you might think.

The Building Directory: Why sitemap.xml Matters for AI Discovery

Your sitemap.xml is the comprehensive list of every page on your site that you want crawlers to know about. While traditional search engines have gotten quite good at discovering pages through internal links, AI crawlers often work differently. They tend to be more targeted in their visits – they want to find product pages quickly, grab the structured data, and move on. A well-maintained sitemap makes this process dramatically more efficient.

For AI commerce specifically, make sure your sitemap includes every product page (even if you have tens of thousands), all collection and category pages, and important informational content like your shipping policy, returns page, and size guides. Use accurate lastmod dates so crawlers know which products are new or recently updated – AI shopping features tend to favor fresh content. And set meaningful priority values: your best-selling product pages should be higher priority than your privacy policy.
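As a reference point, here is what a healthy sitemap entry looks like in the standard sitemaps.org format. The product URL, dates, and priority values are illustrative, not pulled from a real store:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- A best-selling product page: fresh lastmod, high priority -->
  <url>
    <loc>https://your-store.com/products/merino-crewneck</loc>
    <lastmod>2026-02-28</lastmod>
    <priority>0.9</priority>
  </url>
  <!-- Informational content: still listed, but lower priority -->
  <url>
    <loc>https://your-store.com/pages/shipping</loc>
    <lastmod>2026-01-15</lastmod>
    <priority>0.4</priority>
  </url>
</urlset>
```

Most platforms generate this file for you; the point of the example is knowing what to look for when you audit it – real lastmod dates and sensible priorities rather than defaults.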

The sitemap handles breadth – it ensures no product page is missed. But it does not help an AI agent understand what your store actually is, what makes you different, or where to start when a customer asks a question. That is where the concierge comes in.

The Concierge: How llms.txt Helps AI Understand Your Store

The llms.txt specification is relatively new, but it is gaining adoption fast. It was created to solve a specific problem: when an AI agent lands on your site, it has a limited context window and limited time to figure out what you sell and how your site is organized. Crawling dozens of pages to piece this together is slow and error-prone. The llms.txt file gives the AI everything it needs in a single, structured document.

If robots.txt says "you may enter," and sitemap.xml says "here is everything we have," then llms.txt says "welcome, let me explain who we are and help you find what you need." It is the difference between being allowed into a department store and having someone greet you at the door, learn what you are looking for, and walk you to the right aisle.

A good llms.txt file is written in Markdown and placed at the root of your domain (e.g., https://your-store.com/llms.txt). It starts with your store name and a brief description, then links to the most important sections of your site with context about what each section contains. Here is an example for a mid-sized fashion retailer:

# Nordic Style Co.

> Scandinavian-inspired fashion for men and women. We specialize in
> minimalist wardrobe essentials made from sustainable materials.
> Free shipping over $100. 30-day returns on all items.

## Product Categories
- [Women's Collection](/collections/women): Dresses, tops, knitwear, outerwear
- [Men's Collection](/collections/men): Shirts, trousers, jackets, accessories
- [New Arrivals](/collections/new): Updated weekly with the latest drops
- [Sale](/collections/sale): Current markdowns across all categories

## Shopping Information
- [Size Guide](/pages/size-guide): Detailed measurements for all garments
- [Shipping & Delivery](/pages/shipping): Free over $100, 3-5 business days
- [Returns & Exchanges](/pages/returns): 30-day hassle-free returns
- [Gift Cards](/products/gift-card): Available from $25 to $500

## About
- [Our Story](/pages/about): Founded in Copenhagen in 2018
- [Sustainability](/pages/sustainability): Our commitment to ethical fashion
- [Contact](/pages/contact): [email protected]

Notice how this is not just a list of links. Each entry includes a short description that tells the AI what it will find on that page. When a customer asks ChatGPT "find me a sustainable winter jacket under $200," the AI can read this file and immediately know to look at the Women's or Men's Collection, that the brand focuses on sustainable materials, and where to check sizing and shipping information. Without llms.txt, the AI would need to crawl multiple pages to piece this together – and might never find the sustainability angle that makes this store a perfect match for the customer's query.

The Three Files as a System

What makes this powerful is not any single file in isolation – it is how all three work together. Consider what happens when a customer asks ChatGPT Shopping to find a product. First, GPTBot or ChatGPT-User checks your robots.txt. If they are allowed in, they check your sitemap.xml to discover all available pages. But before crawling everything, they look for llms.txt to get an overview of what you sell and which pages are most relevant to the customer's query. Then they visit only the pages they need, read the structured data (Product schema, pricing, availability), and present the results.

If any piece is missing, the system degrades. Block the crawler in robots.txt and nothing else matters – you are invisible. Allow the crawler but have no sitemap, and it might miss half your catalog. Have both but no llms.txt, and the AI wastes time crawling irrelevant pages or misunderstands what your store specializes in.

This is especially critical for smaller stores competing against marketplaces. Amazon does not need llms.txt because every AI already knows what Amazon is. But if you run a specialty ceramics shop or a niche athletic wear brand, these files are your chance to explain to AI agents exactly why your store is the right answer for specific customer queries.

Common Mistakes That Cost You AI Visibility

After auditing hundreds of e-commerce sites, certain patterns keep appearing. The most damaging mistake is an overly broad Disallow rule that blocks AI crawlers from product pages. This often happens when store owners copy a robots.txt template from a blog post written in 2019, before AI crawlers existed. Those templates sometimes include Disallow: / for all unknown bots as a safety measure – a reasonable idea at the time, but devastating for AI visibility today.

Another common issue is having no llms.txt at all. As of early 2026, fewer than 5% of e-commerce sites have one. This means creating one puts you ahead of nearly all your competitors in terms of AI readability. It is also one of the easiest files to create – it takes perhaps fifteen minutes to write well.

Stale sitemaps are another frequent problem. If your sitemap has not been updated since last season, new products will not show up in AI crawl results. Most e-commerce platforms generate sitemaps automatically, but it is worth verifying that the generation is actually running and including all product pages.
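One quick way to spot a stale sitemap is to parse it and flag URLs whose lastmod is old or missing. This sketch uses Python's standard library and assumes the standard sitemaps.org namespace; the 90-day threshold is an arbitrary starting point, not a rule:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def stale_urls(sitemap_xml: str, max_age_days: int = 90):
    """Return URLs whose <lastmod> is missing or older than max_age_days."""
    root = ET.fromstring(sitemap_xml)
    cutoff = datetime.now() - timedelta(days=max_age_days)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            stale.append(loc)
            continue
        # lastmod may be a plain date or a full W3C datetime; the
        # first ten characters are the YYYY-MM-DD date in either case
        if datetime.strptime(lastmod[:10], "%Y-%m-%d") < cutoff:
            stale.append(loc)
    return stale
```

Run this against your downloaded sitemap.xml; if whole product categories come back stale, your platform's sitemap generation is probably not picking up updates.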

Finally, some stores forget to include the Sitemap: https://... directive in their robots.txt. Without this pointer, crawlers have to guess where your sitemap lives. Most will try /sitemap.xml by convention, but relying on that convention is unnecessary when a single line eliminates the ambiguity.

Getting Started: Your Action Plan

  1. Open your robots.txt right now (visit your-store.com/robots.txt in a browser) and check whether GPTBot, ChatGPT-User, Google-Extended, ClaudeBot, and PerplexityBot are blocked or not mentioned at all. If they are blocked, update the file using the template above.
  2. Verify that your robots.txt includes a Sitemap: directive pointing to your sitemap.xml.
  3. Check your sitemap by visiting your-store.com/sitemap.xml. Make sure it includes all product pages and has recent lastmod dates.
  4. Create an llms.txt file at your domain root. Spend fifteen minutes writing a clear description of your store and linking to your most important sections with helpful context.
  5. Run a free AI commerce readiness audit to verify everything is configured correctly and catch any issues you might have missed.
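Step 1 can also be scripted. The sketch below uses Python's standard urllib.robotparser to check a robots.txt body against the AI crawlers discussed above; the sample robots.txt and the product URL are hypothetical stand-ins for your own:

```python
from urllib import robotparser

AI_BOTS = ["GPTBot", "ChatGPT-User", "Google-Extended",
           "ClaudeBot", "PerplexityBot"]

def check_ai_access(robots_txt: str,
                    url: str = "https://your-store.com/products/example"):
    """Map each AI crawler name to True (allowed) or False (blocked)."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_BOTS}

# A hypothetical robots.txt with the exact mistake from the opening story:
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
for bot, allowed in check_ai_access(sample).items():
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

To audit your live file, fetch your-store.com/robots.txt and pass its text to check_ai_access with one of your real product URLs.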

See Where You Stand

Not sure whether your robots.txt is blocking AI crawlers, or whether your llms.txt is set up correctly? Our free AI commerce readiness audit checks both files automatically, along with your sitemap, structured data, and product feed. You will get a clear report showing exactly what is working, what is broken, and what to fix first. It takes less than a minute to run – just enter your store URL and let the scanner do the rest.
