Many companies hesitate to adopt Generative AI because they are worried about risk. Risk comes in many forms: consistency, compliance, security, and infringement. The last of these, infringement, is on the minds of many leaders. What if the model builder trains its model on content to which it does not have sufficient IP rights? And what if the model generates content that appears distinct yet is substantially similar to the original work?
Indeed, many content creators are pursuing legal action against Generative AI providers. In 2023, artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a class action lawsuit alleging that Stability AI used their copyrighted work without permission to train Stable Diffusion. Later that year, a judge issued a written decision dismissing all but one claim for direct copyright infringement, on behalf of a single plaintiff, while granting the plaintiffs leave to amend the dismissed claims. Stock photo provider Getty Images has also sued Stability AI, accusing it of misusing more than 12 million Getty photos to train its Stable Diffusion AI image-generation system. On December 27, 2023, The New York Times sued Microsoft and OpenAI in a copyright infringement complaint alleging that the companies used the newspaper's content to train large language models. In early January, two nonfiction book authors filed a class action complaint against Microsoft and OpenAI, alleging that the companies "simply stole" their work. These complaints may take a while to clear the legal system, and the first lawsuit above may indicate how judges will rule in future cases.
While the resolution of these claims is unclear, are there ways to mitigate risk in the interim? One simple way to reduce copyright infringement risk is for end users to avoid referencing known and famous works, authors, or painters when drafting prompts. Such prompt engineering may not eliminate the risk, but it reduces the likelihood that the generated output will resemble a copyright-protected work.
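Organizations that want to enforce this practice systematically could screen prompts before they reach a generation API. The sketch below is a minimal, illustrative example of such a pre-submission check; the denylist entries are hypothetical placeholders, not a vetted or complete list of protected names.

```python
# Illustrative sketch: a naive pre-submission check that flags prompts
# referencing well-known artists or works. The denylist below is a
# hypothetical example, not a vetted list.
DENYLIST = {"sarah andersen", "kelly mckernan", "karla ortiz", "starry night"}

def flag_risky_references(prompt: str) -> list[str]:
    """Return any denylisted names or titles found in the prompt."""
    lowered = prompt.lower()
    return sorted(name for name in DENYLIST if name in lowered)

def is_prompt_allowed(prompt: str) -> bool:
    """A prompt passes only if it references no denylisted entries."""
    return not flag_risky_references(prompt)
```

A real deployment would need fuzzy matching and a maintained list, but even a simple gate like this turns an informal guideline into an enforceable policy.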
Here are a few other tactics that various organizations have proposed.
They simply indemnify you
Microsoft indemnifies commercial customers of its Copilot tools (which provide generative AI assistance as part of Office, GitHub, and other products) when they are sued for copyright infringement. Microsoft calls this the Customer Copyright Commitment and requires that customers implement its prescribed guardrails and mitigations to qualify. For example, for text generation such as "journalistic content, writing assistance, or other open text generation scenarios," Microsoft requires specific content filter settings: "The protected material text model must be configured in filter mode. The jailbreak model must be configured in filter mode." Like Microsoft, OpenAI offers an indemnifying "Copyright Shield" but requires users to meet certain limitations and expectations to qualify for that protection, as do Amazon, Google, and IBM. Because Generative AI providers are in a much better position to know what content their models were trained on, indemnification provides one level of protection to users. These indemnifications may cover damages and legal costs up to a certain dollar limit, but could the user still be stuck with hefty fees?
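A customer relying on such indemnification would want to verify its deployments actually satisfy the required settings. The sketch below illustrates that kind of check; the configuration keys ("protected_material_text", "jailbreak") and the "filter" mode value mirror the requirements quoted above, but the dictionary structure is an illustrative assumption, not Microsoft's actual configuration schema or API.

```python
# Hypothetical guardrail check: confirm that both filters Microsoft
# names are set to "filter" mode before a deployment is used for open
# text generation. The schema here is an illustrative assumption.
REQUIRED_FILTER_MODES = {
    "protected_material_text": "filter",
    "jailbreak": "filter",
}

def meets_copyright_commitment(filter_config: dict) -> bool:
    """Return True only if every required filter is in 'filter' mode."""
    return all(
        filter_config.get(name) == mode
        for name, mode in REQUIRED_FILTER_MODES.items()
    )
```

Running a check like this at deployment time would give an organization evidence that it stayed within the conditions the indemnification requires.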
They secure sufficient IP rights in the content they use for training or delivery of their models
Generative AI providers use three different approaches to secure the necessary IP rights in the training content. Providers may (i) negotiate with content providers to use copyrighted material, (ii) train the models on material they own, or (iii) pay content providers whenever their content or likeness is used.
Some generative AI providers are trying to negotiate deals with content creators so that content may be used to train models without fear of infringement action. OpenAI reportedly spent months negotiating with The New York Times and others. According to numerous reports, OpenAI has offered between $1 million and $5 million per year to license copyrighted news articles to train its AI models.
In some instances, content creators — more aptly called content distributors — are building their own generative AI. Getty Images launched Generative AI by iStock for a segment of their customers, including small businesses, designers, and marketers. These generative AI models were trained exclusively using proprietary data from Getty Images’ creative libraries. Getty Images wants its customers to be confident in the content they are generating. Similarly, Adobe launched Adobe Firefly, its generative AI tool trained on Adobe stock images and other licensed content.
As another example, a training company that uses avatars to deliver training programs hired human actors (most AI avatars are based on real people) with the explicit understanding that whenever a client uses an actor's likeness to deliver training, the actor will be paid for that use.
Will Open Source protect you?
Using open-source models does not eliminate copyright infringement risk. AI-generated content produced by open-source models may still be subject to infringement actions. The AI model itself doesn't determine whether the output infringes anyone's copyright; rather, the risk arises from the use of copyright-protected content in the training data and its relationship to the model's output. But does the risk increase because none of the major open-source players have negotiated agreements with content providers?
Given where we are, what would it take for organizations to use generative AI in content generation? Are the assurances from Big Tech enough, or would organizations want more assurances? Would they naturally limit the scope of their content generation workstreams?