According to Apostol Vassilev, a computer scientist at the US National Institute of Standards and Technology (NIST), predictive and generative AI systems remain vulnerable to a wide range of attacks, and anyone claiming otherwise is not being entirely honest.
Despite the notable progress in AI and machine learning, Vassilev emphasized that these technologies are still open to attacks that can cause spectacular failures with serious consequences.
He noted that fundamental challenges in securing AI algorithms simply have not been solved yet, and that claims to the contrary amount to selling false assurances.
Seeking to categorize the security risks facing AI systems, Vassilev co-authored a paper with Alina Oprea of Northeastern University and with Alie Fordyce and Hyrum Anderson of the security firm Robust Intelligence. Their overall findings do not paint an optimistic picture.
Released as a PDF under NIST's Trustworthy AI initiative, which reflects the US government's broader goals for AI safety, the paper, "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations," surveys adversarial machine learning techniques, drawing on decades of research.
The primary security concerns highlighted in the research are evasion, poisoning, privacy, and abuse attacks, which can affect generative models such as ChatGPT as well as predictive models such as object-recognition systems.
Evasion attacks, the report explains, aim to produce adversarial examples: inputs that are altered only minimally yet are misclassified at inference time into a class favored by the attacker.
As an example, the report points to markings added to stop signs so that the machine-vision systems of autonomous vehicles fail to identify them correctly.
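To make the idea concrete, here is a minimal sketch of one widely known evasion technique, the fast gradient sign method, applied to a hypothetical trained PyTorch image classifier. The model, input, and perturbation budget are assumptions made for illustration; this is not an example drawn from the NIST paper itself.

```python
# Illustrative FGSM sketch; assumes a trained PyTorch classifier and a
# batched input image that is currently classified correctly.
import torch
import torch.nn.functional as F

def fgsm_adversarial_example(model, image, label, epsilon=0.03):
    """Perturb `image` slightly so the classifier's prediction can flip."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

Even when epsilon is kept small enough that the change is barely visible to a person, the perturbed input can be enough to push the model into a wrong prediction.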
Poisoning attacks, meanwhile, involve injecting malicious data into a machine learning model's training set so that the model misbehaves when it later encounters particular inputs. A 2020 Microsoft study cited in the report found poisoning to be a top concern among the organizations surveyed.
Oprea highlighted that “Poisoning attacks, for instance, can be executed by manipulating a few hundred training samples, a small fraction of the overall training dataset.”
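As a rough illustration of how little control an attacker might need, the sketch below flips the labels of a small fraction of a hypothetical training set; the function and parameter names are invented for this example and are not taken from the paper.

```python
# Label-flipping poisoning sketch (illustrative; names are hypothetical).
import numpy as np

def poison_labels(y_train, target_class, new_class, fraction=0.01, seed=0):
    """Relabel a small share of `target_class` samples as `new_class`."""
    rng = np.random.default_rng(seed)
    y_poisoned = np.array(y_train).copy()
    candidates = np.flatnonzero(y_poisoned == target_class)
    n_poison = min(len(candidates), max(1, int(fraction * len(y_poisoned))))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    y_poisoned[chosen] = new_class
    return y_poisoned
```

A model trained on the tampered labels may then systematically misclassify the targeted class while appearing to perform normally elsewhere.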
Privacy attacks, which attempt to extract sensitive training data, recover memorized data, infer protected information, and carry out related intrusions, pose another significant threat.
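One common form of privacy attack is membership inference, in which an adversary guesses whether a particular record was part of the training set. The toy loss-threshold check below is a sketch of the idea under assumed names and an arbitrary threshold, not a method prescribed by the report.

```python
# Toy membership-inference check (illustrative; threshold is arbitrary).
import torch
import torch.nn.functional as F

@torch.no_grad()
def likely_training_member(model, inputs, labels, threshold=0.5):
    """Guess membership where the per-sample loss is suspiciously low."""
    losses = F.cross_entropy(model(inputs), labels, reduction="none")
    return losses < threshold
```

The intuition is that models tend to fit their training examples more tightly than unseen data, so unusually low loss hints that a record was seen during training.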
Lastly, abuse attacks involve misusing AI systems for malicious ends. Attackers can exploit generative AI models to further harmful activities, such as producing content that incites violence or promotes hate speech against particular groups.
The authors break down these attack categories and their variants to help AI practitioners understand the challenges of training and deploying models, strengthen their defenses, and apply the suggested mitigation strategies.
The document concludes by underscoring the need to balance security, fairness, and accuracy when building reliable AI systems. It observes that a system optimized solely for accuracy tends to fall short on adversarial robustness and fairness, while one focused solely on adversarial robustness may sacrifice accuracy and fairness in its outcomes.