
What Is LLMS.txt? Why It Matters, How It Works, and Examples

LLMS.txt is a new file format that lets websites specify how their content can be used by large language models (LLMs). Similar to robots.txt, it gives site owners control over AI training access and transparency. In this article, we explain what LLMS.txt is, how it works, its significance in the AI age, and show real-world examples of implementation by major platforms.


As artificial intelligence (AI) continues to evolve, it is changing how websites and language models interact. One notable development is the LLMS.txt file, an emerging standard that aims to give website owners more control over how AI tools access and use their material. Like robots.txt, LLMS.txt acts as a communication bridge between content creators and AI companies, supporting more transparent and ethical use of data.

Some companies are experimenting with llms.txt files or similar declarations, but verifiable examples, such as published files, remain limited. For example, Cloudflare and Anthropic have publicly discussed ways to protect data from unauthorized AI scraping, but there is no central registry to verify actual llms.txt usage yet.

What is LLMS.txt?

The llms.txt file is a plain text file that websites can use to state how their information should be handled by large language model (LLM) companies.

The file is usually stored at the root of the site, for example at:

https://example.com/LLMS.txt

Its main goal is to tell LLM developers (such as OpenAI, Anthropic, Perplexity, and others) whether they may crawl, index, or use that site's content to improve their AI models. It follows the same basic principle as the well-established robots.txt file, which tells search engines how to crawl a site.

The name LLMS.txt stands for ‘large language models text,’ indicating that the file is meant specifically for companies developing and training AI language models.
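To make the root location concrete, here is a minimal Python sketch that derives the LLMS.txt address from any page URL on a site. The helper name `llms_txt_url` is hypothetical, and the file name and casing are an assumption taken from the example URL above rather than from a formal specification.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the file name and casing follow the example URL in this article.
LLMS_TXT_NAME = "LLMS.txt"


def llms_txt_url(page_url: str) -> str:
    """Return the root-level LLMS.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/" + LLMS_TXT_NAME, "", ""))


root_url = llms_txt_url("https://example.com/blog/post?id=7#top")
print(root_url)
```

Whatever page the input points to, the result points at the top level of the same host, which is where a crawler would be expected to look.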

Why LLMS.txt Matters

The rapid growth of AI models that scrape, summarize, and paraphrase web content has raised significant ethical and legal issues. Authors, journalists, musicians, teachers, and companies are increasingly concerned that AI companies are using their original work without consent.

The key problems include:

  • Lack of transparency: Many websites don’t know if or how their data is being used to train AI models.
  • Unauthorized use: Some AI tools may ingest data from websites without explicit consent, even if the content is copyrighted.
  • Revenue loss: Publishers fear that AI-generated summaries might replace clicks to their sites, affecting ad revenue and subscriptions.
  • Content integrity: AI scraping can misinterpret or misrepresent nuanced content.

The LLMS.txt file provides a simple, centralized way for websites to declare their preferences. It allows site owners to say “yes,” “no,” or “limited access” to various LLM companies. This is an important step toward a more ethical, consent-based ecosystem for AI development.

How LLMS.txt Works

A typical LLMS.txt file contains directives that either allow or disallow certain AI companies or tools from accessing a website’s content. These directives are written in plain text, and LLM developers are encouraged to check for and respect them—though compliance is still voluntary and not legally enforceable in most regions.

Example Format:

# Allow Anthropic access
allow: anthropic

# Disallow OpenAI
disallow: openai

# Allow Perplexity under conditions
allow: perplexity
Comment: Access permitted for indexing only; no model training allowed.

The file may also include metadata, contact emails, and notes for context. More advanced use cases may involve conditional access, like allowing data usage only for certain parts of a site or for non-commercial purposes.
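As a rough illustration of how a tool might read the format shown above, the following Python sketch parses `allow:`, `disallow:`, and `Comment:` lines into a small policy object. The names `LlmsPolicy` and `parse_llms_txt` are hypothetical, and the grammar is assumed from the example in this article; no official parser or schema is implied.

```python
from dataclasses import dataclass, field


@dataclass
class LlmsPolicy:
    """Hypothetical container for the directives found in an LLMS.txt file."""
    allowed: set[str] = field(default_factory=set)
    disallowed: set[str] = field(default_factory=set)
    notes: list[str] = field(default_factory=list)


def parse_llms_txt(text: str) -> LlmsPolicy:
    """Parse 'allow:', 'disallow:' and 'Comment:' lines; skip everything else."""
    policy = LlmsPolicy()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments carry no directive
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        key, value = key.strip().lower(), value.strip()
        if key == "allow":
            policy.allowed.add(value.lower())
        elif key == "disallow":
            policy.disallowed.add(value.lower())
        elif key == "comment":
            policy.notes.append(value)
    return policy


EXAMPLE = """\
# Allow Anthropic access
allow: anthropic

# Disallow OpenAI
disallow: openai

# Allow Perplexity under conditions
allow: perplexity
Comment: Access permitted for indexing only; no model training allowed.
"""

policy = parse_llms_txt(EXAMPLE)
print(sorted(policy.allowed), sorted(policy.disallowed))
```

Company names are lowercased so that `OpenAI` and `openai` compare equal. Lines the sketch does not recognise are skipped rather than rejected, which keeps it tolerant of the metadata and notes a real file might contain.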

Getting Started with LLMS.txt

If you manage a website and want to protect your content from unauthorized AI access, adding an LLMS.txt file is a simple step you can take.

Steps to Create and Publish LLMS.txt:

  • Open a plain text editor (like Notepad or VS Code).
  • Write your directives following the format shown above.
  • Save the file as LLMS.txt.
  • Upload it to your website’s root directory.

For a complete guide, visit LLMStxtHub’s Getting Started Page, which includes formatting tips, directive syntax, and best practices.
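The steps above can also be scripted. The sketch below is one possible approach, assuming the `allow:` / `disallow:` format used in this article: it renders a set of rules to text and saves them as LLMS.txt in a directory that stands in for the web root. The function names and the temporary directory are illustrative only; uploading to a real server depends on your hosting setup and is not shown.

```python
import tempfile
from pathlib import Path

VALID_ACTIONS = ("allow", "disallow")


def render_llms_txt(rules: dict[str, str]) -> str:
    """Turn {company: action} pairs into 'action: company' lines."""
    lines = []
    for company, action in rules.items():
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action {action!r} for {company!r}")
        lines.append(f"{action}: {company}")
    return "\n".join(lines) + "\n"


def publish_llms_txt(web_root: Path, rules: dict[str, str]) -> Path:
    """Write LLMS.txt into web_root and return the path that was written."""
    target = web_root / "LLMS.txt"
    target.write_text(render_llms_txt(rules), encoding="utf-8")
    return target


# A temporary directory stands in for the site's root directory here.
with tempfile.TemporaryDirectory() as tmp:
    written = publish_llms_txt(Path(tmp), {"anthropic": "allow", "openai": "disallow"})
    file_name = written.name
    content = written.read_text(encoding="utf-8")

print(content)
```

Validating the action before writing means a typo such as `dissallow` fails loudly instead of producing a file that crawlers would silently ignore.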

Real-World Examples of LLMS.txt in Action

Several major tech companies are reported to have published LLMS.txt files to clarify their positions on AI access, although, as noted earlier, there is no central registry to verify them. Here are a few notable examples:

1. Cloudflare

Cloudflare’s stated policy can be found at the LLMS.txt Hub. It explicitly restricts AI training companies from using its content, reflecting its stance on protecting customer data and infrastructure transparency.

Example Entry:

disallow: openai
disallow: anthropic
disallow: perplexity

This clearly communicates a no-access policy for leading LLM developers.

2. Anthropic

Interestingly, even AI companies like Anthropic use the LLMS.txt file to set boundaries. This practice helps build trust with users and signals a commitment to ethical AI practices.

3. Perplexity AI

Perplexity’s file is publicly visible. While Perplexity is an AI-powered search engine, it uses its LLMS.txt to outline clear access protocols. This helps demonstrate that the company aims to operate responsibly and with user consent.

4. ElevenLabs

The audio AI company ElevenLabs also has a public LLMS.txt. While it is not primarily a language model company, its use of LLMS.txt shows the standard’s versatility across AI domains, including speech synthesis, voice cloning, and more.
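From the crawler's side, honoring entries like the three-line Cloudflare-style example could look like the Python sketch below. It is a hypothetical check, not any company's actual implementation. Two behaviours are assumptions made for this sketch: a `disallow` line takes precedence over an `allow` line for the same company, and a company that is not listed gets `None`, meaning the file expresses no preference.

```python
from typing import Optional

# Sample input in the style of the entry shown for Cloudflare above.
SAMPLE_ENTRY = """\
disallow: openai
disallow: anthropic
disallow: perplexity
"""


def access_decision(llms_txt: str, company: str) -> Optional[bool]:
    """Return False if disallowed, True if allowed, None if not mentioned."""
    wanted = company.strip().lower()
    decision: Optional[bool] = None
    for raw in llms_txt.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep or value.strip().lower() != wanted:
            continue
        directive = key.strip().lower()
        if directive == "disallow":
            return False  # assumption: an explicit disallow always wins
        if directive == "allow":
            decision = True
    return decision


print(access_decision(SAMPLE_ENTRY, "OpenAI"))
print(access_decision(SAMPLE_ENTRY, "mistral"))
```

Returning `None` for unlisted companies leaves the final call to the crawler's own policy, which matches the voluntary nature of the file.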

LLMS.txt vs. Robots.txt: What's the Difference?

While both files are designed to regulate data access, they serve different purposes:

[Image: comparison of LLMS.txt and robots.txt]

Some industry observers recommend using both files together for layered protection.
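One way to picture this layered approach is a check that consults both files before fetching a page. The sketch below uses Python's standard `urllib.robotparser` for the robots.txt side and a simple line scan for the LLMS.txt side. The sample file contents, the function names, and the pairing of crawler user agents with company names (GPTBot with openai, ClaudeBot with anthropic) are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

LLMS_TXT = """\
disallow: openai
allow: anthropic
"""


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask the standard-library robots.txt parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def llms_disallows(llms_txt: str, company: str) -> bool:
    """True if the LLMS.txt text contains 'disallow: <company>'."""
    for raw in llms_txt.splitlines():
        key, sep, value = raw.partition(":")
        if (sep and key.strip().lower() == "disallow"
                and value.strip().lower() == company.lower()):
            return True
    return False


def layered_allows(robots_txt: str, llms_txt: str,
                   user_agent: str, company: str, url: str) -> bool:
    """Access is granted only when neither file objects."""
    return (robots_allows(robots_txt, user_agent, url)
            and not llms_disallows(llms_txt, company))


URL = "https://example.com/article"
openai_ok = layered_allows(ROBOTS_TXT, LLMS_TXT, "GPTBot", "openai", URL)
anthropic_ok = layered_allows(ROBOTS_TXT, LLMS_TXT, "ClaudeBot", "anthropic", URL)
print(openai_ok, anthropic_ok)
```

Requiring both files to agree is the conservative choice: if either one objects, the page is not fetched.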

Legal and Ethical Considerations

It’s important to note that llms.txt is not legally binding. Its effectiveness depends on the voluntary compliance of AI companies. Nevertheless, its increasing popularity may affect the current norms of the industry, perhaps preparing the way for future regulation.

Where AI companies operate in regions with privacy rules (such as GDPR in the European Union), their use of scraped data may eventually come into conflict with those rules. Thus, although LLMS.txt is not legally binding today, it promotes broader AI accountability and supports user consent.

Looking to the future, llms.txt could help shape clearer ethical standards for AI

The rise of LLMS.txt represents a positive shift toward transparency in the age of AI. As AI tools become more advanced, society must adapt to ensure that technology respects ownership, consent, and intellectual property.

Looking ahead:

  • Expect more companies to adopt LLMS.txt as a best practice.
  • AI firms may begin showcasing their compliance as a trust signal.
  • Legal frameworks may emerge to support or enforce these types of standards.

By taking simple actions like publishing an LLMS.txt, website owners can actively help build clearer standards for AI transparency and content protection.

Conclusion

While llms.txt is still new and voluntary, it’s a promising step toward clearer communication between website owners and AI developers. By adding one to your site, you help promote transparency, protect your content, and encourage responsible AI practices.


Matthew Tauber


July 27, 2025


Matt Tauber is a mechanical engineer and product developer with a passion for creating innovative solutions. He enjoys turning ideas into real-world products and sharing his knowledge through writing.
