Written by zgiaonews • February 29, 2024 • 7:51 am • Latest news, OpenAI

### OpenAI Undervaluing News Outlets for Quality Data: Reports

Other companies reportedly want to pay more than OpenAI.

The ChatGPT maker is actively pursuing licensing agreements with more media companies to train its AI models.

By Emilia David, a writer focused on AI. Before joining The Verge, she covered technology, finance, and media.

As media companies sign deals with AI firms that want to train their models on news content, details are emerging about how much companies like OpenAI are offering for licensed data.

According to The Information, OpenAI is offering between $1 million and $5 million a year to license copyrighted news content for AI model training. The figure gives a sense of how much AI companies are prepared to pay for licensed data. A recent report indicates that Apple is also in talks with media organizations and has proposed at least $50 million over several years for access to data for AI training. The Verge has asked OpenAI for comment on these figures.

These figures resemble earlier, non-AI licensing agreements. For instance, when Meta introduced the now-defunct Facebook News tab in Europe, reports suggested it offered up to $3 million per month for news articles, stories, and videos, although actual payments may not have reached the larger sums mentioned. Similarly, Google announced in 2020 that it would invest $1 billion in partnerships with news outlets. More recently, under regulatory pressure, Google committed to paying Canadian publishers $100 million a year for the right to link to their articles.

Today, large language models draw most of their training data from the internet. AI developers vary in how transparent they are about their sources, but information about the datasets or web crawling methods they use is typically available. The price of training data depends on the company, the size of the dataset, and the complexity of the content. Some providers, like LAION, offer open-source data free of charge, which models such as Stable Diffusion use. AI developers also commonly deploy web crawlers to collect data from the web for model training. (Developers must also spend resources vetting, tagging, and refining training data, which significantly raises operating costs.)

This approach has run into problems, however. Some publishers, including The New York Times and Vox Media (The Verge's parent company), have blocked OpenAI's GPT crawlers from accessing their sites. Several companies have also raised concerns that training on their content may infringe their copyrights. Notably, The New York Times filed a copyright lawsuit against OpenAI and Microsoft, alleging that Microsoft's Copilot and ChatGPT can generate content almost identical to its own.
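For readers wondering how a publisher blocks a crawler in practice, one common mechanism is a `robots.txt` rule aimed at the crawler's user-agent string. The article does not say which method any of these publishers use, so the following is only a minimal sketch using Python's standard library. The `robots.txt` contents, the `GPTBot` user-agent token, the `example.com` URL, and the `is_allowed` helper are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative only).
# "GPTBot" is assumed here as the crawler's user-agent token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

Note that `robots.txt` is a voluntary convention: it only works if the crawler chooses to honor it, which is part of why publishers are also turning to lawsuits and licensing deals.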

To get around these obstacles, AI firms are increasingly striking partnerships, a trend that has gained traction over the past year. The Associated Press and publishers like Axel Springer (parent company of Politico and Business Insider) have reached agreements with OpenAI to license stories for training GPT-4 and for developing news technology.

Beyond Apple and OpenAI, Google has also shown interest in working with media outlets. Google reportedly piloted an AI tool named Genesis, which gathers information and generates news stories, with professionals at The Washington Post, The Wall Street Journal, and The New York Times. Meanwhile, some news organizations have experimented with generative AI tools in their own editorial workflows.

Tags: Latest news, OpenAI

Last modified: February 29, 2024