Authors Sue Apple for $2.5B Over AI Trained on Pirated Books
Apple is being sued by authors who say the company trained Apple Intelligence on pirated books. The case mirrors Anthropic’s $1.5B settlement and raises fresh questions about how tech giants source AI training data.

- Apple faces copyright lawsuit over AI training data: Two authors accuse Apple of using pirated books from shadow libraries like Bibliotik to train AI models without proper licenses.
- Responsible AI claims challenged by allegations: Despite Apple’s public stance on ethical AI, the lawsuit says Applebot scraped unlicensed content, including books, while hiding its true data sources.
- Anthropic’s $1.5 billion settlement sets precedent: A recent deal over pirated training data underscores the financial risks Apple could encounter if courts rule similarly in this case.
- Tech giants under mounting copyright scrutiny: Alongside Apple, OpenAI, Microsoft, and Meta face lawsuits that could shape future AI copyright law and regulation across the industry.
- Possible consequences for AI development costs: A ruling against Apple may redefine ownership of AI training data, raising costs and tightening controls on how AI models are developed.
Yet another class action lawsuit has been filed against Apple, and this time it’s over copyright issues. Two authors, Grady Hendrix and Jennifer Roberson, sued Apple in California federal court, accusing Apple of using pirated books to train the AI models powering Apple Intelligence, reports Reuters.
The case centers on a controversial dataset known as Books3, which was built from so-called “shadow libraries” like Bibliotik, notorious for hosting thousands of pirated books. Plaintiffs argue that Apple relied on this dataset to train its OpenELM models and likely its larger Foundation Language Models, without ever seeking permission or paying authors. They say this conduct not only deprived them of compensation but also gave Apple a competitive edge by using their work to develop commercial products.
Apple’s Ethical AI Claims Under Scrutiny
Apple has long presented itself as a company trying to take an ethical approach to AI. It previously signed licensing deals, including a multimillion-dollar agreement with Shutterstock for images, and even offered publishers cash for access to their archives. Apple has also stated that it respects web standards like robots.txt and won't scrape sites that block crawlers. But the lawsuit alleges a different reality: Apple's own web scraper, Applebot, has been collecting vast amounts of online content for nearly a decade, including unlicensed copyrighted books.
According to the complaint, Apple concealed the true sources of its training data to avoid legal challenges. The lawsuit claims Apple copied the books to train models whose outputs now compete with the originals, diluting their market value and undermining authors' rights.
The Anthropic Precedent
The timing of this lawsuit is no accident. Just days earlier, AI startup Anthropic agreed to pay $1.5 billion to settle a class action over similar claims. Authors accused Anthropic of pulling millions of pirated books from datasets like Books3, Library Genesis, and Pirate Library Mirror to train its Claude chatbot. The settlement, hailed as the largest copyright recovery in history, could set a benchmark for what Apple might face.
That deal also required Anthropic to destroy pirated files, underscoring how seriously courts view these shadow libraries. Legal experts said Anthropic risked being financially crippled had it gone to trial. The Apple case now lands in the same federal court, drawing inevitable comparisons.
A Broader Copyright Battle
Apple is not alone. OpenAI, Microsoft, and Meta are also facing lawsuits from authors and publishers alleging their works were used without consent to train generative AI systems. With governments and courts still undecided on whether training AI on copyrighted material is fair use or infringement, these lawsuits could define the future of AI development.
For Apple, which has been pitching Apple Intelligence as a secure, privacy-focused assistant for its devices, the lawsuit strikes at its reputation as a more responsible player in AI. The plaintiffs are seeking damages, restitution, and even the destruction of Apple's AI models trained on pirated works.
The outcome of this case could have far-reaching implications, determining not just the cost of training data but also who controls the building blocks of AI.
Written by Ravi Teja KNTS