
### Alleged Leak Exposes Over 16,000 Artists’ Names Used Without Consent for Midjourney’s AI Training

The lists were both partially included in a recent class-action lawsuit and accidentally shared via…

Lists containing the names of more than 16,000 artists allegedly used to train the Midjourney generative AI program have circulated widely online, reigniting debates over copyright and consent in AI image generation. Notable names on the lists include Frida Kahlo, Walt Disney, and Yayoi Kusama.

Initial outrage among artists on X (formerly Twitter) stemmed from the circulation of a Google spreadsheet titled “Midjourney Style List,” reportedly created by Midjourney developers while refining the program’s ability to replicate the works and styles of specific artists. Although access to the web document was quickly restricted, many of the artists and prompts it listed also appear in publicly available court documents from a 2023 class-action lawsuit, which include a 25-page list of names referenced in the training data for the Midjourney program.

Despite the legal ambiguity surrounding the practice of using artists’ work without consent to train generative AI programs, controversies like the “Midjourney Style List” shed light on the actual procedures involved in transforming copyrighted artwork into AI reference materials.

In a series of X posts, Jon Lam, an artist employed by the video game developer Riot Games, shared screenshots of a chat in which Midjourney developers allegedly discussed adding artist names and styles to the program from sources such as Wikipedia, ensuring that the selected artists’ work could be imitated and would surface prominently as reference material for image generation. One screenshot shows a purported post by Midjourney’s CEO, David Holz, celebrating the addition of 16,000 artists to the program’s training. Another captures a chat member sarcastically dismissing copyright concerns by suggesting that one could use scraped datasets and conveniently “forget” the original training materials to sidestep legal issues, a suggestion met with enthusiastic agreement from the group.

The “scraped” datasets mentioned in the chat are a focal point of the class-action lawsuit seeking redress from Stability AI, Midjourney, and DeviantArt for the unauthorized use of artists’ work in training generative AI programs. While the initial lawsuit faced partial dismissal in October for various defects, it was amended and refiled in November, adding more plaintiffs and including the video generator Runway AI as a defendant.

Lam encouraged artists whose names appeared in the list of over 16,000 to join as additional plaintiffs, emphasizing that the lawsuit remains active and has garnered more evidence and claimants.

The amended complaint notes that the court denied Stability AI’s motion to dismiss the core claim of direct copyright infringement over the misappropriation of billions of images for AI training. Midjourney’s motion to dismiss was likewise unsuccessful.

At the core of the copyright infringement accusation against Midjourney is its use of the LAION-5B dataset, comprising 5.85 billion internet-sourced images, including copyrighted content. Although LAION is intended solely for academic research, the lawsuit asserts that Midjourney knowingly employed the dataset in its commercial services to train its generative AI program. The case also alleges copyright infringement through Midjourney’s use of Stability AI’s Stable Diffusion text-to-image software, which was trained on unattributed copyrighted works.

Various tools have been proposed to combat copyright infringement in the realm of generative AI, with the University of Chicago’s Glaze program a prominent example. Designed to protect artists from programs like Midjourney and Stable Diffusion, Glaze alters an image’s digital data so that it appears unchanged to humans but drastically different to AI models. Though imperfect, the free tool has seen growing adoption amid concerns about targeted style replication, and calls to “Glaze” one’s work surged after the “Midjourney Style List” was exposed.
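Glaze’s actual cloaking computes targeted, model-aware perturbations, but the underlying idea of a change too small for a human to notice yet present in the raw data can be sketched in a few lines. The sketch below is illustrative only, not Glaze’s method: it applies bounded random noise under a per-pixel budget, and all names in it are hypothetical.

```python
import random

def cloak_pixels(pixels, epsilon=2, seed=0):
    """Nudge each 0-255 pixel value by at most `epsilon` levels.

    Illustrative stand-in for a cloaking perturbation: the image
    looks identical to a human, but every pixel's raw value may
    differ slightly from the original.
    """
    rng = random.Random(seed)
    return [
        max(0, min(255, p + rng.randint(-epsilon, epsilon)))
        for p in pixels
    ]

original = [128, 0, 255, 64]  # a tiny grayscale "image"
cloaked = cloak_pixels(original)

# Every pixel stays within the epsilon budget of its original value,
# so the change is imperceptible while the data is no longer identical.
assert all(abs(a - b) <= 2 for a, b in zip(original, cloaked))
```

The real system chooses its perturbation adversarially, optimizing against a model’s feature extractor so that the cloaked image maps to a different artistic style in the model’s internal representation; random noise alone would not achieve that.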

Additionally, the website haveibeentrained.com has grown popular among artists as a way to check whether their work has been used as training data for generative AI programs. The site also hosts a Do Not Train Registry, which lets artists opt their works out of datasets maintained by cooperating organizations.

Last modified: January 5, 2024