{"id":16511,"date":"2026-05-01T14:24:30","date_gmt":"2026-05-01T14:24:30","guid":{"rendered":"https:\/\/skybeaconnews.com\/index.php\/2026\/05\/01\/why-news-publishers-are-blocking-ai-from-accessing-internet-archives\/"},"modified":"2026-05-01T14:24:30","modified_gmt":"2026-05-01T14:24:30","slug":"why-news-publishers-are-blocking-ai-from-accessing-internet-archives","status":"publish","type":"post","link":"https:\/\/skybeaconnews.com\/index.php\/2026\/05\/01\/why-news-publishers-are-blocking-ai-from-accessing-internet-archives\/","title":{"rendered":"Why news publishers are blocking AI from accessing internet archives"},"content":{"rendered":"<p>By <a href=\"https:\/\/www.euronews.com\/profiles\/2872\" rel=\"noreferrer\" target=\"_blank\">Indrabati Lahiri<\/a> Published on <time datetime=\"2026-05-01 16:24:30 +02:00\">01\/05\/2026 &#8211; 16:24 GMT+2<\/time><\/p>\n<h2>The use of archived news content by AI companies could be a major violation of copyright law, especially amid active lawsuits against companies such as OpenAI and Perplexity.<\/h2>\n<p>Around 245 global news organisations across nine countries are attempting to block the Internet Archive\u2019s crawlers. These are automated software bots that capture and archive content from web pages and display it in the Wayback Machine, the Internet Archive\u2019s public-facing interface. 
<\/p>\n<p>The Archive holds over one trillion web pages dating all the way back to 1996, making it one of the biggest collective public information resources in the world. This includes past articles from major news organisations such as CNN, The New York Times, The Guardian, and USA Today. <\/p>\n<p>These web pages serve a variety of purposes: historians use them as primary sources, for example, and they can prove that an article was changed after publication. <\/p>\n<p>Several news organisations are now pushing to block the crawlers because AI companies are using the contents of the Archive to train large language models (LLMs) without offering fair payment or obtaining permission. <\/p>\n<p>More than 20 major news organisations already block ia_archiverbot, the main web crawler the Internet Archive uses for the Wayback Machine, according to an analysis by AI-detection company Originality AI. <\/p>\n<p>More broadly, 241 global news sites block at least one of the Archive\u2019s four crawling bots. A large share of these <a href=\"https:\/\/www.euronews.com\/next\/2026\/02\/20\/us-builds-website-to-let-europeans-access-content-banned-by-their-own-governments\" rel=\"noreferrer\" target=\"_blank\"><strong>blocked sites<\/strong><\/a> is owned by USA Today Co, the US\u2019s biggest newspaper publisher. This means that hundreds of local publications have been effectively removed from the historical record. 
<\/p>\n<h2>The risks of archival content being used to train AI<\/h2>\n<p>Archival news content provides massive quantities of high-quality text and images for training large-scale <a href=\"https:\/\/www.euronews.com\/next\/2026\/01\/01\/from-ai-slop-to-world-models-bubbles-and-small-models-what-to-expect-from-ai-in-2026\" rel=\"noreferrer\" target=\"_blank\"><strong>AI models<\/strong><\/a> to write in a more human way. The content is available by URL and through an application programming interface (API), which acts as a bridge between systems by letting different software communicate and request data. <\/p>\n<p>This makes it even easier for AI companies to access archived data and train models. <\/p>\n<p>Another advantage for AI developers is that content in the Internet Archive is already structured, attributed and dated. <\/p>\n<p>Much of the Internet Archive\u2019s data has already been found in key AI-training datasets. This is a major vulnerability for news organisations, which are already suing AI companies such as <a href=\"https:\/\/www.euronews.com\/next\/2024\/10\/24\/googles-latest-rival-what-is-perplexity-ai-and-why-is-it-causing-so-much-controversy\" rel=\"noreferrer\" target=\"_blank\"><strong>Perplexity<\/strong><\/a> and <a href=\"https:\/\/www.euronews.com\/next\/2026\/04\/29\/not-ok-to-steal-a-charity-elon-musk-testifies-in-legal-battle-with-sam-altman-over-openai\" rel=\"noreferrer\" target=\"_blank\"><strong>OpenAI<\/strong><\/a> for potential copyright violations. 
<\/p>\n<p>\u201cThe issue is that Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us,\u201d Graham James, a spokesperson for The New York Times, said, as cited by The Next Web. <\/p>\n<p>\u201cThe Times invests an enormous amount of resources in producing original journalism, and that work should not be used without our permission.\u201d<\/p>\n<p>Other organisations, such as The Guardian, have taken a more cautious approach, limiting rather than completely blocking the Archive\u2019s access. <\/p>\n<h2>Internet Archive maintains that it is \u201ccollateral damage\u201d<\/h2>\n<p>The Wayback Machine\u2019s director, Mark Graham, has maintained that the Archive is merely \u201ccollateral damage\u201d and that the real culprits are the AI companies which access past content through the Archive\u2019s interfaces. <\/p>\n<p>The Archive has nonetheless taken measures of its own to limit such access, including preventing large downloads of some site materials and limiting automated extraction in certain cases. <\/p>\n<p>Graham highlighted that the Archive functions as a key method of preservation. Without it, articles that are not archived can be edited without authorisation or accountability. 
This can be anything from changing or removing quotes, amending mistakes or redirecting claims and official statements. <\/p>\n<p>Currently, these changes are tracked by the Wayback Machine. <\/p>\n<p>This has led to some news organisations attempting to work with the Internet Archive to find acceptable compromises or workarounds which involve limiting access rather than hard blocks. <\/p>\n<p>Similarly, non-profit digital rights advocacy group Fight for the Future has also launched a petition, already signed by 100 current journalists, to protest against this blocking. This is especially at a time when public records and history are increasingly contested. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Indrabati Lahiri Published on 01\/05\/2026 &#8211; 16:24 GMT+2 AI companies using archived news content could be a major violation of copyright laws, especially in the midst of active lawsuits against companies such as OpenAI and Perplexity. Around 245 global news organisations across nine countries are attempting to block the Internet Archive\u2019s crawlers. These are automated software bots that capture, display and archive content from web pages in the Internet Archive\u2019s public-facing interface, the Wayback Machine.<br \/>\n The Archive holds over one trillion web pages dating all the way back to 1996, making it one of the biggest collective public information resources in the world. 
This includes past articles from major news organisations such as CNN, The New York Times, The Guardian, and USA Today.<br \/>\n These web pages are used for a variety of purposes, for example, as primary sources for historians, or to prove changes after publication.<br \/>\n Several news organisation..<\/p>\n","protected":false},"author":3,"featured_media":16512,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-16511","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tie-world"],"_links":{"self":[{"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/posts\/16511","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=16511"}],"version-history":[{"count":0,"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/posts\/16511\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/media\/16512"}],"wp:attachment":[{"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=16511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=16511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/skybeaconnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=16511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}