Reddit has reportedly signed a $60 million deal with an unnamed AI company to license user posts for training machine learning models.
The deal comes as Reddit tries to drum up interest ahead of its initial public offering (IPO). Earlier this year, Reddit reportedly told potential investors about the $60 million agreement and suggested it could strike similar content-licensing deals for model training in the future.
According to people familiar with the matter cited by Bloomberg, the terms of the AI deal and the timing of the stock market debut could both still change, and the listing could happen as early as March. Reddit has not yet responded to The Register’s request for comment.
Reddit users have reacted to the rumored $60 million deal with mixed feelings. Some argued that Reddit is underselling its content, while others questioned why anyone would pay millions for ‘low-quality’ posts and niche artwork.
Some users recalled Reddit’s earlier move to charge for API access, which sparked a user backlash and led some communities to go private or shut down entirely. By making third-party applications pay to access the platform, Reddit aimed to make money from AI developers who scrape the site for training data.
Reddit content has been used to train neural networks before: the WebText dataset behind OpenAI’s GPT-2, for example, was built from web pages linked in Reddit posts.
As one Reddit user put it, “Reddit’s restrictions on API access were not primarily about safeguarding user data from AI utilization, but rather about preventing unpaid AI utilization. Welcome to a mundane cyberpunk dystopia.”
Reddit is not the first platform to offer user-generated data for AI training, and such data sharing is presumably covered by its terms of service. While fending off accusations of copyright infringement over its use of books and news articles to train its most advanced models, OpenAI has signed licensing deals with major media organizations such as the Associated Press and Axel Springer. It is also reportedly in talks with CNN, Fox, and Time to license their articles specifically for training.
Still, while Reddit is a great place to dig into esoteric topics such as grilled cheese variations and bizarre anecdotes, much of its content consists of opinions and personal stories that are not always accurate. Training a general-purpose language model on everything posted to Reddit could therefore be problematic.
As another Reddit user put it, “We might end up with an exceptionally nonsensical AI.”