
    Hold the Champagne: The Two AI Training/Copyright Decisions Released in the US Last Week Were a Mixed Bag for AI Developers

    • 01.07.2025
    • By Hugh Stephens
    Hugh Stephens Blog
    Image: Shutterstock.com

    Last week I wrote about the questionable ethics of META’s use of pirated content to train its AI model, Llama, pointing out the ethical issues involved in META’s admitted use of pirated online libraries, such as LibGen (Library Genesis), to feed content to Llama for training purposes. This is quite apart from whatever legal issues may arise from the widespread practice of ingesting copyrighted content for AI training by making an unauthorized copy from any source (a legitimate library, the purchase of a single copy of a work, or publicly available internet sources, for example), not to mention the additional element of taking that content from pirate sources. The day after that blog was posted, the first of what will be a series of US legal decisions in cases brought by authors and copyright holders against AI companies was issued, followed by another a day later. Both cases were heard in the Northern District of California, in the same San Francisco courthouse, but handled by different judges.

    I updated last week’s blog to make reference to the Bartz v Anthropic case (hereafter “Anthropic”), but given the importance of that decision, combined with a decision released in another California courtroom a day later (Kadrey et al v META), these cases merit further exploration–especially since they were widely trumpeted by AI advocates as opening the door to unauthorized use of copyrighted content for AI training on the basis of “fair use”.

    Fair use is the complex legal doctrine used in the US to determine exceptions to copyright protection. US readers are well aware of the intricacies and idiosyncrasies of fair use but for those not overly familiar with how it works, here is a short summation I drew from a blog post on fair use vs fair dealing that I wrote a few years ago.

    In the US context, fair use is an affirmative defence against copyright infringement and is determined by the courts on a case-by-case basis, judged against several fairness factors (the purpose and character of the use, the nature of the work copied, the amount and substantiality of the portion of the work used, and the effect of the use on the value of the original work)… Fair use is not defined by law. US law gives some examples of areas where a use is likely to be fair (criticism, comment, news reporting, teaching, scholarship, research), but these are illustrative, not exhaustive. In short, it is the courts that decide. This in turn can lead to extensive litigation over what is and is not fair use, and it is worth noting that different judicial circuits in the US have at times come up with conflicting interpretations.

    Or, for that matter, two different judges in the same circuit delivering decisions just days apart on similar issues but with some significantly different outcomes, as we saw last week (although in these cases both found fair use by AI developers with regard to the copyrighted works at issue).

    In the Anthropic case, US District Judge William Alsup ruled, on summary judgement, that the use of copyrighted works for AI training, even though done without authorization, is highly transformative and does not substitute for the original work (“The technology at issue was among the most transformative many of us will see in our lifetimes”). It thus qualifies, according to Alsup, as fair use because the transformative nature of the use overrides or swallows the three other fair use factors, including the important fourth factor (effect of the use on the value of the work). He noted there was no allegation that the output of Anthropic’s model, known as “Claude”, produced content infringing the works of the plaintiffs. However, Judge Alsup then went on to consider the legality of Anthropic’s downloading of more than 7 million works from pirate libraries (such as Books3, Library Genesis and the Pirate Library Mirror) to constitute its reference library, which it initially planned to use for AI training. He concluded this was a prima facie case of copyright infringement, whether or not Anthropic intended to use some or all of the pirated works to train Claude. (“Anthropic seems to believe that because some of the works it copied were sometimes used in training LLMs (Large Language Models), Anthropic was entitled to take for free all the works in the world and keep them forever with no further accounting”.) Damages, to be decided at trial, could be substantial. Alsup did not, however, rule explicitly on whether or not the use of pirated works for AI training purposes could be a fair use.

    Because of the controversial nature of Alsup’s findings on transformation and fair use, there is no question that this case will be appealed. While there have been many criticisms of the fair use elements of Alsup’s ruling, a particularly clear and trenchant analysis was put forth by Kevin Madigan of the Copyright Alliance (Fair Use Decision Fumbles Training Analysis but Sends Clear Piracy Message).

    The second case last week to reach the decision stage was Kadrey et al v META. In this case, District Judge Vince Chhabria found that META’s use of the works of the plaintiffs, thirteen noted fiction writers, to train its AI model (“Llama”) was also fair use. Chhabria, like Alsup, found that META’s use was transformative on the first fairness factor dealing with the purpose and character of the use (“There is no serious question that Meta’s use of the plaintiffs’ books had a ‘further purpose’ and ‘different character’ than the books—that it was highly transformative.”), but unlike Alsup, Chhabria put much greater emphasis on market harm (the fourth fairness factor dealing with the effect of the use on the value of the work), suggesting that it could be determinative. Unfortunately for the plaintiffs, however, Chhabria considered their arguments with respect to market harm to be unconvincing. There was no evidence that Llama’s output reproduced their works in any substantial way or substituted for the specific works at issue, nor was there evidence, according to the judge, that the unauthorized copying deprived the authors of licensing opportunities.

    Chhabria suggested that a far more cogent argument would have been that the use (unauthorized reproduction) of copyrighted books to train a Large Language Model might harm the market for those works by enabling the rapid generation of countless similar works that compete with the originals, even if those generated works are not themselves infringing. In other words, indirect substitution for the works rather than direct substitution. This is the theory of “market dilution”, which was also put forward speculatively by the US Copyright Office in its recent Pre-Publication Report on AI and copyright. Since this argument was not presented, Chhabria could not rule on it, but in effect he is inviting future litigants to pursue this line of argument, noting that his decision on fair use relates only to the works of the thirteen authors who brought the case.

    The clearest way to illustrate his line of reasoning is to quote directly:

    “In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use. No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books”.

    This editorializing, known in legal circles as obiter dicta, is neither binding nor precedential, yet it will undoubtedly have some influence given Chhabria’s stature. It is likely that one of these days Judge Chhabria will have the opportunity to put these theories into practice when ruling on a similar case, but one where the plaintiffs have made a better case for market harm. He has provided them a roadmap.

    While these two cases have fired the first shots in what is going to be a lengthy war, they do not seem to be dispositive. There are enough caveats and nuances to conclude that the AI developers are far from being out of the woods. Both “victories” have a sting in the tail, especially Judge Alsup’s finding on piracy. Neither copyright advocates nor AI developers should be breaking out the champagne just yet. But whichever way it turns out, there will be some sure winners: the lawyers for each side.

    © Hugh Stephens, 2025.

    This article was originally published on Hugh Stephens Blog