According to OpenAI, it would be extremely difficult to build the high-quality neural networks that today’s users demand without training on copyrighted material. The Microsoft-backed company argues that limiting training data to public-domain content would produce inferior AI models, and it uses that argument to justify training on copyrighted works.
The claim comes as the machine learning industry appears increasingly willing to disregard copyright law. A recent report published by IEEE Spectrum showed that Midjourney and OpenAI’s DALL-E 3, two prominent services that turn text prompts into images, can reproduce copyrighted scenes from films and video games that appear to come from their training data.
The study, co-authored by artist Reid Southen and AI expert Gary Marcus, documents numerous “plagiaristic outputs,” in which Midjourney and DALL-E 3 generate strikingly close copies of movie scenes and images of famous actors.
Marcus and Southen conclude that Midjourney and OpenAI most likely trained their image-generation models on copyrighted material.
Whether this practice is lawful, and whether AI providers or their customers could face legal consequences, is still being debated. The study’s findings, however, appear to support allegations of copyright infringement against Midjourney and OpenAI.
Both OpenAI and Midjourney can produce content that may infringe copyrights and trademarks without warning users of the risk. Neither company discloses what data its models were trained on, which makes that risk hard for users to assess.
The problem extends beyond digital artists. OpenAI’s ChatGPT has reportedly reproduced paywalled newspaper articles, prompting legal action such as the recent lawsuit filed by The New York Times. Software developers and writers have raised similar complaints.
The IEEE study is likely to help copyright claimants, but legal experts such as Tyler Ochoa of Santa Clara University caution against misreading its implications. Ochoa points out that the models can produce plagiaristic outputs from copyrighted material even when users do not explicitly ask for it, which raises the question of whether the model creators could be liable for contributory infringement.
Ochoa stresses that the key question is whether users deliberately prompted the models to replicate copyrighted content, or whether responsibility lies with the developers for failing to prevent such outputs. He adds that the amount of copyrighted material in the training data strongly influences what the models produce.
OpenAI defends its approach by arguing that training AI models without copyrighted content is impractical and that it complies with copyright law. Skeptics, however, question OpenAI’s motives in seeking official approval for these practices.
In its response to concerns raised by the UK House of Lords, OpenAI reiterated that AI models capable of meeting today’s needs cannot be trained without copyrighted material. While acknowledging that creators need more support, the company maintains that its training practices are both legal and ethical.
The ongoing debate surrounding AI models, copyright infringement, and ethical considerations underscores the complex intersection of technology, intellectual property rights, and legal responsibilities.