one year on
Federal judge rules Anthropic's AI training on copyrighted books is fair use, but library of pirated copies must go to trial
In a landmark decision, a federal judge has for the first time given credence to fair-use arguments for LLM training, finding that Anthropic's training on published books was fair use while sending claims over millions of pirated books to trial.
A federal judge ruled today that Anthropic’s use of copyrighted books to train its large language models qualifies as fair use, handing AI companies a significant early victory in the copyright wars.
Federal judge William Alsup found that Anthropic’s training on published books was ‘exceedingly transformative,’ and that converting print books to digital formats for training did not constitute infringement. The ruling is the first substantive fair-use decision in the wave of copyright lawsuits against AI companies.
However, the judge declined to dismiss claims regarding the millions of copyrighted books Anthropic allegedly downloaded from pirate sites to build its training library. ‘We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,’ Judge Alsup wrote, adding that buying a legitimate copy after the fact ‘will not absolve it of liability for theft.’
The mixed ruling gives neither side a total win. For the tech industry, it establishes fair use as a viable defense for training on copyrighted works, a position Meta and others have argued. For authors and publishers, it keeps alive the question of whether building datasets from illicit sources can be shielded by fair use.
The case now proceeds to trial on the central library issue.
One year later — open only if you can handle spoilers
The Bartz v. Anthropic ruling set a precedent that most subsequent fair-use cases followed, though the separate Meta ruling days later, decided on different grounds, kept the legal landscape fragmented. The central library trial never reached a conclusion: the parties agreed to a confidential settlement in late 2025, leaving the core question of pirate-sourced training data unresolved.