
### Safely Eliminating Risky Information from AI Models: Scientists Introduce Groundbreaking Technique

Researchers have developed new techniques to prevent AI from being used to carry out cyberattacks or to aid in the creation of bioweapons.

A study released on Tuesday introduces a novel method for assessing the presence of risky information in AI models and a technique for eliminating this knowledge while preserving the model’s broader capabilities. These findings have the potential to safeguard against the misuse of AI in cyberattacks and the proliferation of bioweapons.

The research was conducted by experts from Scale AI, a provider of AI training data, and the Center for AI Safety, a non-profit organization, in collaboration with more than 20 specialists in biosecurity, chemical weapons, and cybersecurity. It aimed to create a set of questions to evaluate an AI model’s capacity to contribute to the development and deployment of weapons of mass destruction. A key part of the study was the Center for AI Safety researchers’ development of the “mind wipe” technique, which builds on prior work on understanding how AI models represent concepts.

Dan Hendrycks, the executive director at the Center for AI Safety, emphasized the significance of the “unlearning” technique as a substantial advancement in safety measures, envisioning its widespread adoption in future model designs.

As the AI industry advances rapidly, global leaders have made safety a priority. U.S. President Joe Biden’s AI Executive Order from October 2023 instructs officials to address the risks of AI misuse in developing chemical, biological, radiological, or nuclear threats, as well as cybersecurity vulnerabilities.

Current AI control mechanisms can often be bypassed, and the evaluation methods for assessing an AI model’s potential dangers are both costly and time-intensive.

The researchers at Scale AI and the Center for AI Safety collaborated with domain experts to create a comprehensive questionnaire targeting knowledge related to weapons of mass destruction. The questionnaire, with over 4,000 vetted questions, aimed to test for dangerous knowledge without divulging sensitive information.
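To make the evaluation concrete, here is a minimal sketch of how a model might be scored on such a multiple-choice benchmark. The question format, field names, and random-guess stand-in for the model are illustrative assumptions, not the study’s actual evaluation harness.

```python
# Minimal sketch of scoring a language model on a multiple-choice safety
# benchmark. Field names and the scoring rule are illustrative assumptions.

import random

# Each item: a question, candidate answers, and the index of the correct one.
# The real benchmark contains over 4,000 vetted questions.
questions = [
    {
        "question": "Example placeholder question?",
        "choices": ["Option A", "Option B", "Option C", "Option D"],
        "answer": 2,
    },
]

def model_pick(question: str, choices: list[str]) -> int:
    """Stand-in for querying the model; here it simply guesses at random."""
    return random.randrange(len(choices))

def accuracy(items) -> float:
    correct = sum(
        model_pick(q["question"], q["choices"]) == q["answer"] for q in items
    )
    return correct / len(items)

print(f"Hazardous-knowledge accuracy: {accuracy(questions):.1%}")
# A score near chance (25% with four options) suggests little retained
# hazardous knowledge; a score well above chance suggests the model still
# holds it.
```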

The study introduced an unlearning technique named CUT, which the researchers applied to openly available large language models. By removing hazardous knowledge related to the life sciences and cybersecurity while retaining general knowledge drawn from Wikipedia, the researchers sharply reduced the models’ proficiency in those hazardous domains while largely preserving their broader abilities.
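The article does not spell out how CUT works internally, but the general idea of representation-based unlearning can be sketched as follows: steer the model’s internal activations on hazardous (“forget”) examples toward an uninformative random direction, while keeping its activations on benign (“retain”) examples close to those of a frozen copy of the original model. The toy network, synthetic batches, and loss weighting below are assumptions for illustration, not the authors’ implementation.

```python
# Hedged sketch of a representation-based unlearning loss: penalize the model
# for producing informative activations on "forget" data, while anchoring its
# activations on "retain" data to a frozen reference model.

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden = 64
model = nn.Sequential(nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
frozen = copy.deepcopy(model).eval()  # frozen copy of the original model
for p in frozen.parameters():
    p.requires_grad_(False)

# Stand-ins for batches of hazardous ("forget") and benign ("retain") inputs.
forget_batch = torch.randn(16, 32)
retain_batch = torch.randn(16, 32)

# Fixed random target direction for forget-set activations.
random_target = torch.randn(hidden)
random_target = random_target / random_target.norm()

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
alpha = 1.0  # weight on preserving benign behaviour (assumed value)

for step in range(200):
    opt.zero_grad()
    # Push forget-set activations toward the uninformative random direction.
    forget_loss = ((model(forget_batch) - random_target) ** 2).mean()
    # Keep retain-set activations close to the original model's.
    retain_loss = ((model(retain_batch) - frozen(retain_batch)) ** 2).mean()
    loss = forget_loss + alpha * retain_loss
    loss.backward()
    opt.step()
```

The balance between the two terms is the key design choice: too much weight on the forget loss degrades general ability, too little leaves the hazardous knowledge recoverable.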

While the unlearning technique proved promising in enhancing AI safety, experts emphasize the need for a multi-layered safety approach to mitigate potential risks effectively. The development of benchmarks to assess dangerous knowledge is seen as a crucial step towards ensuring AI model safety, even in scenarios where model weights are publicly available.
