The warning comes after reports that AI companies regularly ignore instructions not to scrape.
Reddit has a warning for AI companies and other scrapers: play by our rules or get blocked. The company announced that it plans to update its robots.txt file, the Robots Exclusion Protocol file that tells automated crawlers which parts of its platform they may access.
The company said it will also continue to block and rate-limit crawlers and other bots that don't have a prior agreement with the company. The changes, it said, shouldn't affect “good faith actors,” like the Internet Archive and researchers.
Reddit's notice comes shortly after multiple reports that Perplexity and other AI companies regularly bypass websites' robots.txt protocol, which is used by publishers to tell web crawlers they don't want their content accessed. Perplexity's CEO, in a recent interview with Fast Company, said that the protocol is “not a legal framework.”
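For readers unfamiliar with how the protocol works in practice, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python's standard `urllib.robotparser` module. The rules, bot names and URL below are hypothetical and chosen only for illustration; they are not Reddit's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
# One named crawler is allowed everywhere, every other crawler is disallowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL and bot names, assumed for this example.
    url = "https://www.example.com/r/some-community/"
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleArchiveBot", url))
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "SomeOtherBot", url))
```

Nothing in this check is enforced by the website: the crawler itself decides whether to run it and whether to honor the answer, which is why a publisher that wants hard limits has to pair robots.txt with server-side blocking or rate limiting.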
In a statement, a Reddit spokesperson told Engadget that the company wasn't targeting a particular company. “This update isn't meant to single any one entity out; it's meant to protect Reddit while keeping the internet open,” the spokesperson said. “In the next few weeks, we'll be updating our robots.txt instructions to be as clear as possible: if you are using an automated agent to access Reddit, regardless of what type of company you are, you need to abide by our terms and policies, and you need to talk to us. We believe in the open internet, but we do not believe in the misuse of public content.”
It's not the first time the company has taken a hard line when it comes to data access. The company cited AI companies' use of its platform when it began charging for its API last year. Since then, it has struck licensing deals with some AI companies, including Google and OpenAI. The agreements allow AI firms to train their models on Reddit's archive and have been a significant source of revenue for the newly public Reddit. The “talk to us” part of the spokesperson's statement is likely a not-so-subtle reminder that the company is no longer in the business of handing out its content for free.