Tech giants are racing to discover fresh data sources to train their AI models effectively.
At Meta, the matter was urgent enough that executives met almost daily in March and April last year to devise a strategy, according to The New York Times.
As AI capabilities advance, companies are compelled to pursue data more assertively, potentially exposing them to copyright concerns. For instance, OpenAI has faced suspicions of leveraging YouTube to train its video generator, Sora. However, the company’s CTO, Mira Murati, denied these claims.
In Meta’s discussions, participants reportedly explored the option of acquiring Simon & Schuster, a publishing house that private equity firm KKR acquired for $1.62 billion in August. Alternatives included paying $10 per book to secure full licensing rights to new publications.
By that point, Meta had already summarized numerous books, articles, and other digital content. The company had engaged contractors in Africa to compile summaries of both fiction and nonfiction works, some containing copyrighted material. In one meeting, a manager acknowledged that such content was difficult to avoid.
Participants deliberated on the possibility of continuing to gather data from potentially copyrighted sources without pursuing formal licensing agreements. When a legal advisor raised ethical concerns regarding intellectual property rights, the room fell silent, according to the Times.
Meta did not immediately respond to a request for comment from Business Insider.
Ultimately, the executives referenced the legal precedent established in Authors Guild v. Google, a 2015 case. The lower court ruling, left in place when the Supreme Court declined to hear the case, permitted Google to scan and digitize books for Google Books under fair use principles. Meta’s legal team asserted that the company could train its AI systems within the same framework, as reported by the Times.