    Is it Ethical to Use Pirated Content for Commercial Purposes? META Thinks So

    • 23.06.2025
    • By Hugh Stephens
    Hugh Stephens Blog
    Two signs hanging on a string, one labeled 'ETHICAL' in green and the other labeled 'LEGAL' in red, against a purple background.

    Image: Shutterstock.com

    There is the question of what is ethical, and then there is the question of what is legal. Sometimes they are the same, often not. The legality of using copyrighted content without authorization for commercial purposes, such as in training AI models—as META and a number of other companies have done—is being decided in court. In META’s case, however, there is the further complaint (not denied by META) that many of the unauthorized copies it made were taken from pirated content. While this revelation may not change the fundamentals of the copyright infringement case against it, there is still the ethical question for META to answer. On this, it comes up short. Very short.

    META, the parent company of Facebook, Instagram and WhatsApp, used vast amounts of copyrighted content, without permission or licensing, to train its AI model. It is not alone in doing so. This practice may or may not be legal. A number of cases are working their way through the courts, most of them in the US, with copyright owners from Getty Images to Disney and Universal, from the New York Times to the Authors Guild and on to music labels, all claiming that their content was unfairly and illegally copied to provide fodder for training AI models, such as META’s model, Llama. META and other AI developers claim that their use was a “fair use” under US law. We’ll see. However, as part of its giant vacuuming of publicly available (but in many cases protected) content, META also ingested content from various pirate sites and databases, notably the notorious “shadow library”, LibGen (Library Genesis). LibGen originated in Russia and contains up to 80 million scientific and academic articles, as well as millions of novels and nonfiction books, most of them unauthorized, unlicensed copies. It has been sued by major academic and textbook publishers. In 2017 Elsevier won a $15 million judgement against LibGen and another pirate website, SciHub. Last year Elsevier was awarded a $30 million default judgement. However, both LibGen and SciHub remain available online.

    The extent of the copyrighted content held by LibGen was revealed in an investigative report published recently by The Atlantic. You can search through the LibGen database as published by The Atlantic to find out what works are included, and whether your work has been pirated. Authors from Newfoundland to New York, and lots of places in between and elsewhere, found their works included in the database when they did the search. The Authors Guild advises writers to fight back by sending a formal notice to META and other AI companies asserting their rights, as well as adding a “No AI Training” notice on the copyright page of their works. This is in addition, as would be expected, to joining the Authors Guild to help it fight what is happening.

    Consuming pirated content can result in costly penalties, as some unfortunate downloaders have found out to their regret. Using it for commercial purposes is even more egregious. It’s like running a pirate streaming service based on stolen content. META didn’t use pirated content in this way, but they used it commercially just the same, in their case for AI training. Were they aware of what they were doing? You bet they were.

    The discovery process in the US suit of Kadrey et al. v. META revealed a series of email exchanges in which some META employees expressed concerns over the ethics of using pirated content. The concerns went up the chain and back, with “MZ” (guess who? No, not Moses Znaimer) giving approval to proceed. Following these revelations in Kadrey v. META in the US, two class action lawsuits have been filed in Canada, one in Quebec on behalf of a number of French-language authors and one in British Columbia. The Quebec suit specifically flags the piracy issue. Among the listed complaints is the following:

    “Rather than acting within the law and respecting the rights of class members, it (META) deliberately chose to train its LLMs (Large Language Models) from datasets containing illicit copies of works from all over the world, including those of class members.”

    Damages sought are $20,000 per work.

    It is clear that META wilfully torrented content from LibGen, knowing that many or most of the works on LibGen were infringing, pirated copies. They just didn’t care.

    If it turns out that somehow, inexplicably, META’s unauthorized use of copyrighted content for AI training is ruled by the US courts to be fair use, would the fact that some of the content came from a pirate source be relevant? I am not sure, but a judgement that has just been delivered in California in the “Anthropic” case suggests that even if unauthorized copying can be justified as fair use because it is considered “transformative”, that does not excuse piracy, which is still an infringement. In this case, Anthropic copied both purchased and pirated works to train its AI model, and kept the copies in its central library. It was sued by some authors and journalists in a class action suit alleging copyright infringement. The judge, in one of the first such cases to reach a decision point, concluded on summary judgement that Anthropic’s unauthorized reproduction of copyrighted works for AI training was fair use under the transformation doctrine but added, with respect to those works drawn from pirate sources such as LibGen and others,

    “piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded”.

    A trial will be held to determine the damages from the piracy.

    Being one of the first AI training cases out of the gate, Anthropic will certainly be appealed, so this is not the last word. However, the ruling is not the only warning sign. The US Copyright Office, in its Pre-Publication Report on Generative AI, issued last month on May 9, a day before Register Shira Perlmutter was dismissed, stated that “the copying of expressive works from pirate sources in order to generate unrestricted content that competes in the marketplace, when licensing is reasonably available, is unlikely to qualify as fair use”. Taken together, the ruling and the report suggest that META could be in both ethical and legal trouble.

    In other jurisdictions, such as Singapore, content used in AI training under a Text and Data Mining exception has to be legally accessed, although this is very thin legal protection because technology companies can legally purchase just one copy of a work to comply. In Canada you cannot break the law (i.e., circumvent a technological protection measure) to exercise a fair dealing right. But whether or not using a pirated source puts META offside the law in the US with respect to fair use (and the Anthropic case suggests that it could, at least with respect to the pirated works), think of the ethics and the image this presents to the public.

    A company like META, capitalized at something like $2 trillion, cannot be bothered to even access content legitimately, let alone use it legitimately. Why? Because MZ said it was OK to proceed. Sadly, even though they are not the only ones to use pirated content to train their AI models, that tells me all I need to know about the values and ethics of this particular company.

    © Hugh Stephens, 2025. All Rights Reserved.

    This post has been updated to include reference to the decision in the Anthropic case, released after the initial publication of this blog post

    This article was originally published on Hugh Stephens Blog