“When you use something for free, you are the product.”
Reddit’s Ripple Effect
Beneath the current frenzy of excitement and investment in the artificial intelligence (AI) industry lies a fundamental scarcity: data. This data, produced the old-fashioned way by people, is essential for training massive models such as ChatGPT and DALL-E, which generate text and images.
Growing demand for this data has fueled several controversies, including lawsuits from authors and news outlets alleging that AI companies used their content without authorization. There is also a looming concern that the internet will become saturated with AI-generated material, and that AI developers may eventually have to train future models on that content.
This demand is also driving new business deals as AI developers race to secure repositories of human-generated content for training their systems. A notable example: Bloomberg recently reported that an unnamed AI company has agreed to pay Reddit $60 million a year for access to its extensive archive of user-generated posts. The deal underscores the pivotal role of user data as the primary commodity in the AI industry’s rapid expansion.
Moderation Maneuvers
This isn’t the first time an AI company has paid for access to a trove of text. Axel Springer, the parent company of publications including Politico and Business Insider, previously struck a deal with OpenAI to incorporate its publications’ content into ChatGPT.
This deal differs in notable ways, however. Journalists are paid for their work, even if they may not directly benefit, and could even suffer, from its integration into AI systems. Redditors, by contrast, have contributed a substantial volume of content for free and out of passion, which makes it disheartening to see that content sold for profit.
In response to the news, one Redditor joked, “Where is my share of the pie?” Another put it succinctly: “When you use something for free, you are the product.”
The buyer’s identity remains undisclosed, and that lack of transparency about who is paying $60 million a year for the data adds another layer of intrigue.
Another Redditor criticized the sale of public forum users’ data for AI training without compensating those users, a sentiment that echoes the evasive responses systems like ChatGPT often give when confronted with controversial questions.