Copyrighted books are fair use for AI training, federal judge rules in Anthropic case

Copyrighted books can be used to train artificial intelligence models without authors’ consent, a federal judge ruled Monday.
The decision marked a major victory for San Francisco startup Anthropic, which trained its AI assistant Claude using copyrighted books. The company, started by former OpenAI employees and backed by Amazon, was sued by authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson in August.
U.S. District Judge William Alsup ruled that Anthropic’s use of purchased books was “exceedingly transformative and was a fair use,” but that the company may have broken the law by using pirated books. Alsup ordered a trial in December to determine damages, which can reach up to $150,000 per work for willful copyright infringement.
“If someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not,” the ruling reads.
“The purpose and character of using copyrighted works to train [large language models] to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.”
Anthropic pirated more than 7 million books from Books3, Library Genesis and Pirate Library Mirror, online libraries containing unauthorized copies of copyrighted books, to train its large language models, according to Alsup. As the company started to become “not so gung ho” about pirating “for legal reasons,” it brought on Tom Turvey from Google to obtain “all the books in the world” but still avoid “legal/practice/business slog.”
While Turvey initially inquired into licensing agreements with two major publishers, he eventually decided to purchase millions of print copies in bulk. The company then proceeded to strip the books’ bindings, cut their pages and scan them into digital and machine-readable forms, according to the decision.
Though the plaintiffs took issue with Anthropic making digital copies, Alsup ruled that this practice also falls under fair use: “The mere conversion of a print book to a digital file to save space and enable searchability was transformative for that reason alone,” he wrote.
Anthropic’s later purchase of books that it had initially pirated did not absolve the company, but it may affect the extent of statutory damages, Alsup said.
This decision, which may set an important precedent, comes as Walt Disney Co. and Universal Pictures pursue their own lawsuit against artificial intelligence company Midjourney, which the studios allege trained its image generation models on their copyrighted materials.