According to the Chinese zodiac, the Year of the Rabbit (this year) is supposed to represent “relaxation, quietness and contemplation”. This “Rabbit Year” is predicted to be “calm and gentle”. (Good news for the people of Ukraine, if true). However, when it comes to the thorny issue of copyright and content generated by artificial intelligence (AI), the Year of the Rabbit may end up being the Year of Contentious Litigation.
First out of the gate in 2023 are two cases: one in which Getty Images is suing Stability AI in the High Court in London (UK), and a second in the US, where artists Sarah Andersen, Kelly McKernan, and Karla Ortiz have brought a class action suit against the AI-powered art services Stable Diffusion (owned by Stability AI), Midjourney and DeviantArt. Technically these cases were not initiated in the Year of the Rabbit but in the last days of the preceding Year of the Tiger. The date of Chinese New Year varies each year, being determined through a complex calculation involving both solar and lunar cycles, but it always falls sometime in January or February. In 2023, that day was January 22; anything in this calendar year that occurred before that date took place in the Tiger Year. As its name suggests, the Year of the Tiger is associated with “courage and bravery” and the “defeat of evil”. (Another Ukraine analogy). It certainly takes courage to take on large AI platforms as the artists have done. As for defeating evil, that is not the characterization I would apply to AI, although there is no doubt that it can be put to evil purposes. In fact, this is already happening with the production of artificial face-swapped images of celebrities and others, and the use of AI platforms like ChatGPT to produce misinformation.
Okay, let’s set aside the theme of the Chinese zodiac for a moment and focus on the point of this blog post, which is to look at what is happening on the litigation front with respect to AI-generated content. There are two principal issues involved. First, whether AI platforms infringe the copyright of creators by building their databases through the scraping and ingestion of millions of copyrighted works off the internet without licence or authorization. Second, whether the producers of AI-generated content (that is, users manipulating the platform, not the platforms themselves) have any claim to copyright in the works created. It is all very unclear, which is exactly the point of the litigation.
In the case of Getty Images, the company already licenses content to technology companies to help them train artificial intelligence systems to generate AI products, but Stability AI did not bother to seek a licence. Stability AI is just one of a number of companies that have emerged in the AI-generated image space, as I wrote about in an earlier blog. (AI and Computer-Generated Art: Its Impact on Artists and Copyright). Both Microsoft, through its investment partnership with OpenAI (DALL-E 2 and ChatGPT), and Google are developing AI platforms, although they have been relatively cautious with their releases owing to concerns over NSFW content as well as copyright. DALL-E 2 and ChatGPT have been released as trials, while Google has been holding back on the public release of its latest AI image-generation product, Imagen. In contrast, Stability AI’s founder, Emad Mostaque, seems to subscribe to the “better to ask for forgiveness after rather than permission before” school of thought. A couple of months ago, I wrote about Mostaque and his laissez-faire approach to copyright and AI. (AI Generated Art: Another “Technical Breakthrough” Calling Out for Responsible Management and Regulatory Oversight).
Given Mostaque’s approach, the fact that Stability AI chose not to acquire licences–yet clearly ingested some images owned by Getty (the company’s watermarks even show up on some of the AI-generated works)–is not surprising. Getty’s goal is no doubt to provide AI companies with a sharp reminder that licensed images exist for AI training. As for the artists’ lawsuit, its basis and likelihood of success are less clear. Dr. Andres Guadamuz of the University of Sussex, editor of the Journal of World Intellectual Property, has criticized the basis of the case as being technically inaccurate. (See below for more information.)
The artists’ suit against Stability AI, Midjourney and DeviantArt is based on the factual reality that the “data” feeding the AI art-generating algorithms consists of millions of images scraped without authorization from the internet, many of them under copyright and at least a few of them belonging to the plaintiffs. Specifically, the suit declares that Stability AI, through its application “Stable Diffusion”, “downloaded or otherwise acquired copies of billions of copyrighted images without permission”. It then “caused those images to be stored and incorporated…as compressed images”, referred to as “Training Images”. The training images were then used to train the algorithm. The “new” images produced by Stable Diffusion’s algorithm are argued to be derivative works produced from the training images. The suit states that the platform is ultimately a complex collage tool that creates works “in the style of” real artists without compensation or permission, leading to “blatant and enormous infringement” of copyright while producing works that compete in the marketplace with the works of the original artists. Accordingly, the suit accuses the defendants of direct and vicarious copyright infringement, violation of the DMCA (by removal of Copyright Management Information), violation of the right of publicity, and unlawful and unfair competition. Relief sought is in the form of statutory and punitive damages, costs, and permanent injunctive relief. If successful, this suit would throw a major spanner into the works of AI machines producing output based on data (consisting of images, written works, and music, for example) ingested without licence or permission from the internet. A win for the plaintiffs would force platforms to rely on much smaller data sets based on licensed or public domain content. Another option would be for the AI platforms to allow, and make it easy for, artists, writers and musicians (“authors”) to opt out of the database.
The tech industry does not want to have to tailor its databases or be subject to any restrictions on data mining. Scooping it all up (for free) is the model it prefers. In arguing their case, the defendants will claim fair use, insisting that the resultant work is not derivative but rather the result of a transformative process. They will also no doubt claim that what the algorithm produces is not a copy of the artists’ work, even though it may be “in the style of”. However, although the output is not an exact copy, the question will be whether it reproduces a “substantial” part of the original work. Another defence argument will probably be that not all the ingested works necessarily fall under copyright, the output being an amalgam of millions of images, some protected, some not.
In addition to legal arguments rebutting the complaint, there are also technical arguments to consider. In his article critiquing the lawsuit from a technical perspective, Dr. Guadamuz argues that the scraped inputs do not form a “collage”. He states they are a combination of links to images (not the images themselves) combined with text descriptions, all of which are themselves narrowed by the descriptor that is inputted into the model. If the user is seeking an image that uses the word “cats”, all data that does not relate to cats is discarded. Thus, he argues, it is inaccurate to claim that AI generated art draws on a database created from every image ever ingested. He concludes that the artists involved in the suit will therefore not be able to prove that their work was infringed.
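Dr. Guadamuz’s point can be illustrated with a minimal sketch. The idea, under his description, is that the training data consists of image links paired with text captions (not the images themselves), and that a user’s prompt keyword narrows the data consulted, discarding everything unrelated. All URLs, captions, and function names below are purely illustrative, not the actual Stable Diffusion pipeline.

```python
# Illustrative sketch only: training data as (image URL, caption) pairs,
# i.e. links plus text descriptions rather than stored copies of images,
# narrowed by the keyword a user inputs. Names and data are hypothetical.

dataset = [
    ("https://example.com/img1.jpg", "a tabby cat sleeping on a sofa"),
    ("https://example.com/img2.jpg", "mountain landscape at sunset"),
    ("https://example.com/img3.jpg", "two cats playing with yarn"),
]

def filter_by_prompt(pairs, keyword):
    """Keep only (url, caption) pairs whose caption mentions the keyword."""
    keyword = keyword.lower()
    return [(url, cap) for url, cap in pairs if keyword in cap.lower()]

# A prompt about "cats" leaves only the cat-related entries; the
# landscape entry plays no part in generating the output.
cat_subset = filter_by_prompt(dataset, "cat")
print(len(cat_subset))  # prints 2
```

On this view, any single generated image draws on a prompt-narrowed slice of the link–caption data, which is why Guadamuz argues it is inaccurate to describe the output as a collage assembled from every ingested image.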
However, one problem for those arguing that text and data mining fall under fair use is the reality that the resulting “new image” can and often does substitute for the original work (which may have been included in the training data), thereby negatively affecting the author’s ability to economically exploit the work. The interpretation of fair use and fair dealing in relation to text and data mining (TDM) is still very fluid.
The UK, for example, is currently reviewing its existing TDM exception. (Britain’s Proposed Approach to Text and Data Mining (TDM) for AI: How Not to do It (A Lesson for Canada and Others)). The original proposal from the Intellectual Property Office (IPO) was to allow TDM “for any purpose” and to specifically eliminate any possibility of licensing content for text and data mining. It was argued that this broad exception would promote “innovation”, but the proposal aroused huge opposition from Britain’s artistic community. Recently the Communications and Digital Committee of the House of Lords studied the IPO proposals and concluded that “The Intellectual Property Office’s proposed changes to intellectual property law are misguided. They take insufficient account of the potential harm to the creative industries.” Even the Minister responsible has said she thinks it is likely the IPO’s proposed changes will not proceed. For its part, the IPO has now indicated it will continue consultation. While there are potentially legitimate reasons for some TDM exceptions, allowing it for “any purpose” is damaging and misguided, and hopefully the IPO’s proposals as they stand will be substantially modified.
Many academics have argued they need access to TDM material to be able to conduct research. In some countries, such as Canada, there is currently no TDM exception in copyright law, an issue that may be reviewed in the next update of the Copyright Act, expected in the next year or so. However, there is a big difference between providing open access to copyrighted content to enable sampling of works to produce research that does not reproduce the original work, in no way competes with it, and is used for non-commercial purposes, and using copyrighted works to produce derivatives that can substitute for or compete with the originals. Any TDM exception needs to be narrowly focussed to achieve the end of enabling legitimate research while avoiding a free-for-all that would undermine the interests of rights-holders.
Apart from the question of whether AI platforms should be able to ingest copyrighted content (note that what is often described as “data” actually consists of pre-existing copyright-protected works) without authorization, there is the issue of who can exercise rights over the “new” images created by the algorithm. Who created them? Was it the algorithm/platform, or the user who “created” them by inputting minimal textual instructions (“Create a work of a dog eating ice cream in the style of…”)? There is not much originality of expression there, although longer, more complex prompts/instructions could theoretically qualify. This is another point that will require clarification. On the other hand, if the user cannot claim ownership because of failure to meet the test of originality, could the rights be held by the creators of the algorithm? And if it was an algorithmically generated image, was there any human creation involved? If not, according to the US Copyright Office, such a work is not eligible for copyright registration. Finally, any discussion of possible copyright protection for an AI-generated image must address basic questions related to the legitimacy of the inputs used to generate it. Has the output become a “poisoned” derivative work because of infringement at source?
These and other questions will need to be weighed by the courts. The decisions will have a major impact on artists and other creators, the way in which AI develops in the years ahead, and even the role of AI content in modern society. The Getty and class action lawsuits are but the first shots in what will be a long campaign. The Year of the Rabbit is unlikely to be “calm and gentle” on this issue. Next year, 2024, is the Year of the Dragon and it is supposed to represent “good luck”. One has to ask, “good luck for whom”? The courts will have an important role in answering that question.
This article was first published on Hugh Stephens Blog