Cloudflare to let customers block AI web crawlers

From today, Cloudflare users will be able to block artificial intelligence (AI) crawlers from accessing their web content without permission or monetary compensation by default, in a bid to stop AI models from scraping and using content in their training databases.

Use of intellectual property such as art, fiction, music, news media, video and other forms of creative endeavour and expression, to train AI models without recognition or recompense has become a major sticking point for creatives worldwide, fueled a wave of anti-AI sentiment, and led to lawsuits on both sides of the Atlantic.

Recognising the potential threat AI models pose to fundamental aspects of the human condition, Cloudflare said its new settings marked the “first step” towards a more sustainable future for content creators and AI innovators alike.

“If the internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone – creators, consumers, tomorrow’s AI founders, and the future of the web itself,” said Matthew Prince, co-founder and CEO of Cloudflare.

“Original content is what makes the internet one of the greatest inventions in the last century, and it’s essential that creators continue making it. AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate. This is about safeguarding the future of a free and vibrant Internet with a new model that works for everyone.”

Cloudflare, which handles over 15% of global internet traffic via its content delivery network (CDN), said that the internet has long operated on a simple exchange by which search engines index content and direct users to websites to generate traffic and ad revenue. While not perfect, this system has proved fairly consistent in rewarding content creators and web users alike.

However, the advent of AI crawlers has broken this bargain. By scraping content to improve the output of generative AI (GenAI) models without sending web users to the source, crawlers deprive content creators of views and revenue and remove their incentive to keep working, to the detriment of wider society.

Cloudflare previously introduced a one-click option to block AI crawlers in September 2024 – and said over a million customers have opted in to date. The introduction of a permission-based model adds more fine-grained controls to the equation.

The new settings will allow site owners to choose if they want AI crawlers to access their content and decide how AI companies are allowed to use it. AI companies, in turn, will be able to state the purpose of their crawlers – which is to say whether they are used for training, inference, or search purposes – to help site owners decide whether to allow them.
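
As a rough illustration of how a purpose-based permission check could work, the sketch below maps a crawler's declared purpose to an allow or block decision. It is not Cloudflare's implementation: the `Purpose` enum, the `DEFAULT_POLICY` table and the `is_allowed` function are hypothetical names invented for this example, and the choice to block crawlers that declare no recognised purpose is an assumption.

```python
from enum import Enum


class Purpose(Enum):
    """The three crawler purposes named in the article."""
    TRAINING = "training"
    INFERENCE = "inference"
    SEARCH = "search"


# Hypothetical site-owner policy: True means the crawler may fetch content.
DEFAULT_POLICY = {
    Purpose.TRAINING: False,
    Purpose.INFERENCE: False,
    Purpose.SEARCH: True,
}


def is_allowed(declared_purpose, policy=DEFAULT_POLICY):
    """Return True if a crawler with this declared purpose may crawl.

    Assumption: a missing or unrecognised purpose is treated as blocked.
    """
    try:
        purpose = Purpose(declared_purpose)
    except ValueError:
        return False
    # Purposes absent from the policy table are also blocked.
    return policy.get(purpose, False)
```

Under this example policy, a search crawler is let through while a training crawler is turned away unless the site owner changes the table.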

All new domain owners signing up to Cloudflare will now be asked if they wish to allow or block AI crawlers, with the default being to block them, meaning customers must make an explicit choice to opt in to allowing them. Existing customers can easily check their settings and allow AI crawlers at any point should they desire.

Multiple Cloudflare customers are already signing up, with many prominent publishers describing it as a “game changer” for content creators. Others said it could potentially help end the rush among news organisations to unpopular paywall-based business models.

Roger Lynch, CEO of Condé Nast, said: “When AI companies can no longer take anything they want for free, it opens the door to sustainable innovation built on permission and partnership.

“This is a critical step toward creating a fair value exchange on the internet that protects creators, supports quality journalism and holds AI companies accountable.”

Kristin Heitmann, chief revenue officer at The Associated Press (AP) agency, added: “The information landscape continues to change rapidly but the value of accurate, factual, nonpartisan journalism has never been more essential.

“We’re pleased to participate in this important framework that will help ensure intellectual property is protected and all content creators are fairly compensated for their work.”

Sharon Moshavi, president of the International Center for Journalists (ICFJ), a Washington DC-based non-profit, and co-CEO of ICFJ+, a provider of critical infrastructure for journalists and technologists to deliver information, also voiced her support.

“We see journalists across the world providing vital, original reporting to their communities, yet AI bots scrape their work for free while newsrooms struggle to stay open,” said Moshavi.

“At ICFJ+, we are working with small news sites – beginning in Africa and across a variety of languages – to help them protect and reclaim the value of their original work in the age of AI. We welcome this very promising initiative from Cloudflare.”

Pay up or get off my site

At the same time, Cloudflare has also announced the private beta of another tool, dubbed Pay Per Crawl.

The idea of Pay Per Crawl originated in conversations with content creators during the development of the crawler blocking tool. Cloudflare said that while all agreed that creators should be able to block or allow all AI crawlers depending on their wishes, creators had expressed a “consistent desire” for a third path in which AI crawlers are allowed to access their content but they also get paid.

While theoretically possible already, this requires knowing the right people at an AI provider and negotiating with them, a challenge for creatives who may lack the scale and leverage to do so.

Cloudflare engineers Will Allen and Simon Newton said they had now hit on a way to allow creatives to charge AIs.

“We’re excited to help dust off a mostly forgotten piece of the web: HTTP response code 402,” they wrote in a blog post. “Pay per crawl integrates with existing web infrastructure, leveraging HTTP status codes and established authentication mechanisms to create a framework for paid content access. 

“Each time an AI crawler requests content, they either present payment intent via request headers for successful access (HTTP response code 200), or receive a 402 Payment Required response with pricing. Cloudflare acts as the Merchant of Record for Pay Per Crawl and also provides the underlying technical infrastructure.”
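
The exchange Allen and Newton describe can be sketched in a few lines of Python. This is only an illustration of the 200-or-402 decision, not Cloudflare's code: the header names (`x-example-max-price`, `x-example-crawl-price`, `x-example-crawl-charged`) and the `respond_to_crawler` function are hypothetical, and the sketch assumes payment intent is expressed as a maximum price the crawler is willing to pay.

```python
from http import HTTPStatus

# Hypothetical header names, invented for this sketch.
MAX_PRICE_HEADER = "x-example-max-price"    # sent by the crawler
PRICE_HEADER = "x-example-crawl-price"      # returned with a 402
CHARGED_HEADER = "x-example-crawl-charged"  # returned with a 200


def respond_to_crawler(request_headers, price_usd):
    """Return (status, response_headers) for one crawler request.

    The crawler gets 200 if it declared a maximum price that covers
    the page's price, and 402 Payment Required with pricing otherwise.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}
    offered = headers.get(MAX_PRICE_HEADER)
    try:
        max_price = float(offered) if offered is not None else None
    except ValueError:
        # A malformed offer is treated the same as no offer at all.
        max_price = None

    if max_price is not None and max_price >= price_usd:
        return HTTPStatus.OK, {CHARGED_HEADER: f"{price_usd:.2f}"}
    return HTTPStatus.PAYMENT_REQUIRED, {PRICE_HEADER: f"{price_usd:.2f}"}
```

A crawler that receives the 402 could read the advertised price and retry with a matching offer, which is the kind of programmatic negotiation the engineers describe.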

Its creators hope Pay Per Crawl may herald a fundamental shift in how content is controlled online by empowering creators to keep working.

Future use cases for the tool could include setting different rates for different content types or different AI crawlers. Allen and Newton said the tool may have even greater potential as agentic AI develops, where people querying AI agents could set them a specific budget based on the topic at hand – more for legal advice, less for a restaurant booking, for example. They envisage a future where intelligent AI agents “can programmatically negotiate access to digital resources.”

