Nvidia has come under fire over allegations that it sourced pirated books and other copyrighted works from online shadow libraries to train its AI models, with court filings purportedly showing that the move was approved by the company’s top leadership.
The AI chip giant’s data strategy team allegedly offered to pay for high-speed access to Anna’s Archive, a shadow library portal, according to documents released by TorrentFreak. The filings also allegedly show that Nvidia’s management approved the payment plan within a week.
The documents, which comprise email snippets, surfaced during the discovery process in an ongoing class action lawsuit brought against Nvidia by authors. The lawsuit accuses Nvidia of copyright infringement for training its AI models on content from the Books3 dataset, including copyrighted works taken from the pirate site Bibliotik.
Following the new allegations, the plaintiffs have reportedly filed an amended complaint to expand the scope of the lawsuit.
AI companies have been routinely harvesting data by scraping the internet to train large language models (LLMs). However, the recent allegations suggest that they are now targeting pirated works hosted on online shadow libraries. Besides Nvidia, Meta and Anthropic have faced separate allegations that they used pirated digital copies of millions of books to train their AI models. They were accused of drawing on the same Books3 dataset.
However, the allegations against Nvidia are unprecedented as they involve a US company proposing a business arrangement with an online portal engaged in piracy.
What is Anna’s Archive?
Anna’s Archive is an open-source search engine for shadow libraries, which typically contain paid or paywalled content that has been pirated or uploaded for free. The platform functions like a regular search engine and helps users find material hosted elsewhere on the internet. It reportedly does not host pirated material itself.
So far, most of the searchable content via Anna’s Archive has been books, research papers, and other literary material because “text has the highest information density,” as per the platform. The various domains linked to Anna’s Archive are among the most targeted URLs in Google takedown requests filed by copyright holders, according to the TorrentFreak blog.
Why has Nvidia been sued?
Besides copyright infringement, Nvidia has also been accused of giving corporate customers automatic access to pirated datasets such as ‘The Pile’ and ‘Books3’. The authors behind the class action lawsuit are seeking compensation for the damages they say they have suffered.
The chipmaker has reportedly sought to defend its actions under the ‘fair use’ exception in copyright law.
In June 2025, a US district court ruled that Anthropic did not violate copyright law by using books to train its Claude AI models. US District Judge William Alsup deemed the AI startup’s use of a database of scanned, purchased books, combined with a specific training method, transformative enough to meet the standards of fair use.
Meta also scored a win in a major copyright case in the same week, with US District Judge Vince Chhabria ruling that the social media giant’s use of books to train its Llama models was protected by the fair use doctrine in US copyright law.
© IE Online Media Services Pvt Ltd
