
Leveraging Student Data: University of Michigan’s AI Training Strategy

It is unclear whether those included in the datasets consented to having their audio and texts used…

The University of Michigan reportedly sold 85 hours of audio recordings from various academic settings, including lectures, interviews, office hours, study groups, and student presentations, to external entities for artificial intelligence training. The university is also alleged to have sold a dataset of 829 student academic papers for use in training large language models (LLMs).

Whether the individuals represented in the data consented to its use for AI training remains uncertain. A dataset obtained by The Daily Beast contained a lecture recording from 1999, suggesting that participants could not have anticipated their data would later be used to train AI models.

AI engineer Susan Zhang shared on X a screenshot of a LinkedIn message she received from Catalyst Research Alliance, the company marketing the UM data. The sender implied familiarity with Zhang’s work on LLMs and highlighted the availability of UM’s academic speech data and student papers for training LLMs.

In response to The Daily Beast, UM spokesperson Colleen Mastony denied that student data had been sold, stating that the papers and speech recordings were contributed voluntarily by student volunteers who signed consent forms for research studies conducted in 1997-2000 and 2006-2007. What role Catalyst Research Alliance plays as a “third party vendor,” and whether students were aware their data could be used commercially, remains unclear.

Catalyst Research Alliance’s website indicates varying licensing costs for the datasets, reaching up to $25,000 for both audio recordings and papers. The datasets encompass a diverse range of academic settings and demographics, with recordings from lectures, discussions, interviews, and presentations.

The licensing arrangement raises ethical concerns about the commodification of personal data to advance technologies like generative AI: students’ voices and written work, regardless of their field of study, are being used to train AI models, prompting questions of privacy and consent.

While UM asserts that the papers and recordings have been freely available for academic use to enhance writing and communication skills, concerns persist regarding the privacy implications of leveraging student intellectual property without explicit consent or disclosure of personal information.

Vincent Conitzer, an AI ethics researcher at Carnegie Mellon University, expressed skepticism regarding the situation, emphasizing the need for transparency and ethical considerations in handling public domain recordings and papers for AI training purposes.
