LLMS.txt is a new file format that lets websites specify how their content can be used by large language models (LLMs). Similar to robots.txt, it gives site owners control over AI training access and transparency. In this article, we explain what LLMS.txt is, how it works, its significance in the AI age, and show real-world examples of implementation by major platforms.
As artificial intelligence (AI) continues to evolve, it is changing how websites and language models interact. One of the biggest developments in this regard is the LLMS.txt file, an emerging standard that aims to give website owners more control over how their content is accessed and used by AI tools. Like robots.txt, LLMS.txt acts as a communication bridge between content creators and AI companies, enabling greater transparency and more ethical use of data.
Some companies are experimenting with llms.txt files or similar declarations, but verifiable examples (actual published files) remain limited. Cloudflare and Anthropic, for example, have publicly discussed ways to protect data from unauthorized AI scraping, but there is not yet a central registry to verify real-world llms.txt usage.
The llms.txt file is a plain text file that websites can use to state how their information should be handled by large language model (LLM) companies.
The file is usually stored at the root of the site, for example at example.com/llms.txt.
Its main goal is to notify LLM developers (such as OpenAI, Anthropic, Perplexity, and numerous others) whether they can crawl, index, or use that site's content to enhance their AI models. It follows the same basic principles as the well-established robots.txt file, which tells search engines how to crawl a site.
The name LLMS.txt stands for ‘large language models text,’ indicating that the file is meant specifically for companies developing and training AI language models.
The explosion of AI models that scrape, summarize, and paraphrase web content has raised significant ethical and legal concerns. Authors, journalists, musicians, teachers, and companies are increasingly worried that their original creative work is being used by AI companies without their consent.
The key problems include content being used for AI training without consent, a lack of transparency about what data is collected and how it is used, and the absence of any standard way for creators to opt out.
The LLMS.txt file provides a simple, centralized way for websites to declare their preferences. It allows site owners to say “yes,” “no,” or “limited access” to various LLM companies. This is an important step toward a more ethical, consent-based ecosystem for AI development.
A typical LLMS.txt file contains directives that either allow or disallow certain AI companies or tools from accessing a website’s content. These directives are written in plain text, and LLM developers are encouraged to check for and respect them—though compliance is still voluntary and not legally enforceable in most regions.
```txt
# Allow Anthropic access
allow: anthropic

# Disallow OpenAI
disallow: openai

# Allow Perplexity under conditions
allow: perplexity
# Comment: access permitted for indexing only; no model training allowed.
```
The file may also include metadata, contact emails, and notes for context. More advanced use cases may involve conditional access, like allowing data usage only for certain parts of a site or for non-commercial purposes.
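Since the format is plain text, such a file is straightforward to parse. The sketch below is illustrative only: llms.txt is still an emerging convention, so the allow/disallow syntax it assumes is taken from the example in this article, not from a finalized specification.

```python
# Minimal, illustrative parser for the allow/disallow directive style shown
# above. Assumption: one "allow: <name>" or "disallow: <name>" directive per
# line, with "#" marking comments. This mirrors the article's example, not
# an official llms.txt spec.

def parse_llms_txt(text):
    """Return a dict mapping each named company to 'allow' or 'disallow'."""
    policies = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if ":" not in line:
            continue  # ignore malformed lines
        directive, _, value = line.partition(":")
        directive = directive.strip().lower()
        value = value.strip().lower()
        if directive in ("allow", "disallow"):
            policies[value] = directive
    return policies

sample = """\
# Allow Anthropic access
allow: anthropic
# Disallow OpenAI
disallow: openai
"""
print(parse_llms_txt(sample))  # {'anthropic': 'allow', 'openai': 'disallow'}
```

A consuming crawler would fetch the file from the site root and check its own name against the resulting dictionary before collecting any content.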
If you manage a website and want to protect your content from unauthorized AI access, adding an LLMS.txt file is a simple step you can take.
For a complete guide, visit LLMStxtHub’s Getting Started Page, which includes formatting tips, directive syntax, and best practices.
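The file itself is trivial to generate. As a minimal sketch, the helper below writes an llms.txt into a site's web root; the function name, directive syntax, and root-level location are assumptions based on the conventions described in this article rather than an official standard.

```python
# Hypothetical helper that writes a minimal llms.txt to a site's web root.
# The allow/disallow directive style mirrors the examples in this article
# and is illustrative, not a finalized specification.
from pathlib import Path

def write_llms_txt(web_root, allowed=(), disallowed=()):
    """Create an llms.txt file at the root of the given directory."""
    lines = ["# llms.txt - AI access preferences for this site"]
    lines += [f"allow: {name}" for name in allowed]
    lines += [f"disallow: {name}" for name in disallowed]
    path = Path(web_root) / "llms.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path

# Example (path is illustrative):
# write_llms_txt("/var/www/html", allowed=["anthropic"], disallowed=["openai"])
```

After uploading, you can confirm the file is live by requesting yourdomain.com/llms.txt in a browser, just as you would check robots.txt.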
Several major tech companies have already adopted the LLMS.txt format to clarify their positions regarding AI access. Here are a few notable examples:
Cloudflare’s stated policy can be found on LLMStxtHub. It explicitly restricts AI training companies from using its content, reflecting the company’s stance on protecting customer data and infrastructure transparency.
Example Entry:
```txt
disallow: openai
disallow: anthropic
disallow: perplexity
```
This clearly communicates a no-access policy for leading LLM developers.
Interestingly, even AI companies like Anthropic use the LLMS.txt file to set boundaries, and Anthropic’s own file is publicly available. This practice helps build trust with users and signals a commitment to ethical AI practices.
Perplexity also publishes its file. While Perplexity is an AI-powered search engine, it uses its LLMS.txt to outline clear access protocols, demonstrating an intent to operate responsibly and with user consent.
The audio AI company ElevenLabs also publishes a public LLMS.txt. While not primarily a language model company, its use of LLMS.txt shows the standard’s versatility across AI domains, including speech synthesis and voice cloning.
While both files are designed to regulate data access, they serve different purposes: robots.txt tells search engine crawlers which pages they may crawl and index, while llms.txt addresses whether and how AI companies may use a site’s content for model training and related purposes.
Some industry observers recommend using both files together for layered protection.
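As a sketch of that layered approach, a robots.txt rule can block a known AI crawler at crawl time (GPTBot is OpenAI’s web crawler; other crawlers use different user-agent names), while llms.txt states the site’s usage policy:

```txt
# robots.txt - blocks OpenAI's GPTBot crawler from fetching any pages
User-agent: GPTBot
Disallow: /

# llms.txt - states the site's AI-usage policy (illustrative syntax)
disallow: openai
```

The robots.txt rule discourages collection in the first place, and the llms.txt directive documents the owner’s intent even if content reaches an AI company by other routes.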
It’s important to note that llms.txt is not legally binding. Its effectiveness depends on the voluntary compliance of AI companies. Nevertheless, its increasing popularity may affect the current norms of the industry, perhaps preparing the way for future regulation.
Where AI companies operate in regions that have privacy rules (such as GDPR in the European Union), there is some likelihood that information on scraped data may eventually come into conflict with these rules. Thus, although LLMS.txt is not legally binding today, it promotes broader AI accountability and supports user consent.
The rise of LLMS.txt represents a positive shift toward transparency in the age of AI. As AI tools become more advanced, society must adapt to ensure that technology respects ownership, consent, and intellectual property.
Looking ahead:
By taking simple actions like publishing an LLMS.txt file, website owners can actively participate in shaping clearer standards for AI transparency and content protection.
While llms.txt is still new and voluntary, it’s a promising step toward clearer communication between website owners and AI developers. By adding one to your site, you help promote transparency, protect your content, and encourage responsible AI practices.
Matthew Tauber
5 minutes read
July 27, 2025
Matt Tauber is a mechanical engineer and product developer with a passion for creating innovative solutions. He enjoys turning ideas into real-world products and sharing his knowledge through writing.