As we head into the 2024 conference jag from RSAC through Black Hat and DEF CON, we take a look at the final report from the first-of-its-kind Generative AI Red Team Challenge, held last year in the AI Village at DEF CON 31. The challenge was a jeopardy-style CTF competition in which participants attempted to break through the guardrails of eight different LLMs – with an eye toward identifying issues in information integrity, privacy, and societal harm. Find an overview of the event report here.
Generative Red Teaming Challenge: Transparency Report (2024)
We share the executive summary of the report in its entirety here. It is worth a read:
As AI technologies become increasingly integrated into people’s lives, understanding how to build systems for oversight and governance is essential. The paradigm of “red teaming,” or intentionally seeking to break safety barriers on a technology to understand its capabilities, limitations, and how it can be improved, is currently popular within major AI labs. However, these labs typically operate in a closed-door setting, limiting who has a voice in the design and evaluation of the technology.
While closed-door testing is in some cases necessary for security and intellectual property protection, it creates an environment where verification – or assurance – of model capabilities is defined and tested by the creators. There is an opportunity for external groups, such as government or civil society entities, to use red teaming to create smarter policies and evidence-based regulations and standards.
Democratic governance of technology requires broad engagement with diverse stakeholders, centering the perspectives and needs of the people on whom the technology will ultimately be used rather than those of its designers. To that end, Humane Intelligence, Seed AI, and AI Village partnered to hold the first public red teaming event for closed-source API models at DEF CON 2023.
Red teaming models for biases and other social harms is difficult because these harms are highly contextual and therefore hard to define. Methods of structured public feedback, such as public red teaming, enable an approximation of contextual data from a larger audience, capturing more nuance. We also demonstrated how these types of exercises can be used to operationalize a set of values, such as those in the NIST AI RMF. Our exercise was an operationalization of the White House Office of Science and Technology Policy’s Blueprint for an AI Bill of Rights (1), and we are grateful for their sponsorship.
Our paper provides some insights into the potential and promise of public red teaming, framed around the Generative AI Red Teaming Challenge conducted at AI Village within DEF CON 31. Our event and analysis, the first of their kind, study, at scale, the performance of eight state-of-the-art large language models (LLMs). In doing so, we observe the performance of LLMs as a class of models, approximating real-world scenarios where harmful outcomes may occur. By collecting this analysis and data at scale, we identify macro-level trends in strategies, approaches, and systemic performance.
The authors of this report represent the collaborative efforts we aspire to see in industry. We sought to draw on internal best practices and knowledge at LLM developer companies while providing the external validity of government and civil society expertise. While the authors represent independent entities (Humane Intelligence) and corporate entities (Cohere and Google), our analysis was conducted independently. This report was provided in advance to all of our design partners (civil society, government, and corporate) for review.
The Largest-Ever Generative AI Red Teaming Challenge
From the Overall Summary of the Report
Humane Intelligence, a tech nonprofit dedicated to building community around algorithmic assessment, is publishing the findings from the largest-ever generative AI public red teaming event for closed-source API models. This event was developed in collaboration with Seed AI and the DEF CON AI Village, and held at DEF CON 31 in 2023. Over 2.5 days, 2,244 hackers evaluated 8 LLMs and produced over 17,000 conversations on 21 topics ranging from cybersecurity hacks to misinformation and human rights. Our winners received a GPU provided by our partners at NVIDIA.
Our analysis divided the questions into four broad categories: Factuality, Bias, Misdirection, and Cybersecurity.
Key findings from the data
- The most successful strategies were ones that are hard to distinguish from traditional prompt engineering, emphasizing the dual nature of this technology. Asking the model to role-play or ‘write a story’ was successful. In addition, a user speaking authoritatively on a topic could engineer the model into providing ‘agreeable’ output, even if incorrect. (A minimal sketch following this list illustrates how such strategy patterns might be flagged in transcripts.)
- Human behavior can inadvertently result in biased outcomes. People interact with language models in a more conversational manner than with search engines. As a result, the methods of social engineering used by hackers resemble the ‘natural’ and ‘conversational’ way people interact with LLMs – sharing their preferences or personal details to provide context. In other words, innocent actors may accidentally socially engineer the model into giving them the answer they want to hear, rather than a factual answer.
- Unlike other algorithmic systems – notably social media models – the LLMs did not further radicalize users when provided with aggressive content. In most cases, the model matched the harmfulness of the user query, which can reinforce the user’s worldview. In a few cases, the model even de-escalated.
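To make the first finding above more concrete, here is a minimal, hypothetical sketch of how the two strategy patterns it describes – role-play framing and authoritative framing – might be flagged in conversation transcripts with simple keyword heuristics. The record format, the `role`/`content` field names, and the marker phrases are illustrative assumptions, not the schema or methodology of the published dataset.

```python
# Hypothetical sketch: heuristic tagging of the red-teaming strategies described
# above (role play / "write a story", and authoritative framing) in transcripts.
# The conversation format and marker phrases are assumptions for illustration;
# the released dataset's actual schema may differ.

ROLE_PLAY_MARKERS = ("pretend you are", "write a story", "act as", "imagine you are")
AUTHORITY_MARKERS = ("as an expert", "i am a doctor", "i am a professor", "trust me,")

def tag_strategies(conversation):
    """Return the strategy labels found in the user turns of one conversation."""
    tags = set()
    for turn in conversation:
        if turn["role"] != "user":
            continue
        text = turn["content"].lower()
        if any(marker in text for marker in ROLE_PLAY_MARKERS):
            tags.add("role_play")
        if any(marker in text for marker in AUTHORITY_MARKERS):
            tags.add("authoritative_framing")
    return tags

# Toy example (invented, not drawn from the dataset):
example = [
    {"role": "user", "content": "Pretend you are a historian and write a story about the event."},
    {"role": "assistant", "content": "Once upon a time..."},
]
print(tag_strategies(example))  # {'role_play'}
```

In practice, a trained classifier would outperform keyword matching; the sketch is only meant to show the general shape of this kind of macro-level strategy analysis.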
The full report is available here.
What Next?
From the report:
Analysis challenges
While this event was notable for including 8 different models, that breadth posed a challenge for analysis of the results. Previous red teaming efforts focused largely on one or two identified models; in contrast, this dataset consisted of generations that were not labeled by vendor. The text generation APIs themselves were also not equivalent: some vendors provided research models with little safety training, whereas other vendors provided systems that combined a model with additional services, such as safety layers. However, this is largely representative of the current AI ecosystem, where users encounter a mix of capabilities rather than purely open-source or purely closed-source systems.
Encouraging Future Research
This transparency report is a preliminary exploration of what is possible from these events and datasets. Additional research will be critical for further understanding trends in LLMs, in particular as they relate to societal impact. At-scale data collection is valuable for distinguishing systemic harms from low-likelihood ones. This data can now serve as a benchmark: vendors can, for example, use the dataset for distance analytics on measures such as refusal or toxicity.
This dataset is now the largest semi-public dataset of multi-turn, multi-model conversations, the first of its kind. The dataset is available on the Humane Intelligence GitHub repo, and this report and analysis are available at www.humane-intelligence.org/GRT. We hope for, and anticipate, future collaborative events that replicate this level of analysis and public interaction, helping us appreciate the wide range of impacts LLMs may have on society.
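As one example of the benchmarking use mentioned above, the following is a minimal sketch of how a vendor might compute a simple refusal rate over a conversation dataset. The file name (`grt_conversations.jsonl`), the record schema, and the refusal phrases are assumptions for illustration; the released dataset’s actual format, and a more robust refusal or toxicity classifier, would need to be substituted.

```python
import json

# Hypothetical sketch: estimate a refusal rate from a JSONL file of conversations,
# where each line is assumed to be a JSON object with a "turns" list of
# {"role": ..., "content": ...} dicts. File name, schema, and refusal phrases
# are illustrative assumptions, not the published dataset's actual format.

REFUSAL_PHRASES = ("i can't help with", "i cannot assist", "i'm sorry, but", "i am unable to")

def is_refusal(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in REFUSAL_PHRASES)

def refusal_rate(path: str) -> float:
    refusals = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            for turn in record["turns"]:
                if turn["role"] == "assistant":
                    total += 1
                    refusals += is_refusal(turn["content"])
    return refusals / total if total else 0.0

if __name__ == "__main__":
    print(f"Assistant-turn refusal rate: {refusal_rate('grt_conversations.jsonl'):.2%}")
```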
OODA Loop would like to thank the report’s authors and the participating companies and public-sector partners of the 2023 Generative AI Red Team Challenge. We look forward to the 2024 edition of the challenge. See you in Vegas in August at the canceled and now uncanceled DEF CON 32.
Authors and Acknowledgements
Authors
Victor Storchan, Ravin Kumar, Rumman Chowdhury, Seraphina Goldfarb-Tarrant, and Sven Cattell
Acknowledgments
This effort was a collaboration across industry, civil society, and government to align on addressing the pressing issues of generative AI algorithms. We would like to thank our partner companies, community partners, and public sector partners. In addition, we would like to thank Stella Biderman and Aviya Skowron of EleutherAI for their input and guidance in developing this report.
Participating Companies
Public Sector Partners
Additional OODA Loop Resources
For more OODA Loop News Briefs and Original Analysis, see OODA Loop | Generative AI and OODA Loop | LLM.
The Next Generative AI Surprise: At the OODAcon 2022 conference, we predicted that ChatGPT would take the business world by storm and included an interview with OpenAI Board Member and former Congressman Will Hurd. Today, thousands of businesses are being disrupted or displaced by generative AI. This topic was further examined at length at OODAcon 2023, taking a closer look at this innovation and its impact on business, society, and international politics. The following are insights from an OODAcon 2023 discussion between Pulkit Jaiswal, Co-Founder of NWO.ai, and Bob Flores, former CTO of the CIA.
What Can Your Organization Learn from the Use Cases of Large Language Models in Medicine and Healthcare?: It has become conventional wisdom that biotech and healthcare are the pace cars in implementing AI use cases with innovative business models and value-creation mechanisms. Other industry sectors should keep a close eye on the critical milestones and pitfalls of the biotech/healthcare space – with an eye toward which platform, product, and service innovations and architectures may have a portable value proposition within your industry. The Stanford Institute for Human-Centered AI (HAI) is doing great work fielding research in medicine and healthcare environments with quantifiable results that offer a window into AI as a general applied technology during this vast but shallow early implementation phase of “AI for the enterprise” across all industry sectors. Details here.
Generative AI – Socio-Technological Risks, Potential Impacts, Market Dynamics, and Cybersecurity Implications: The risks, potential positive and negative impacts, market dynamics, and security implications of generative AI emerged – slowly, then rapidly – throughout 2023, as the unprecedented hype cycle around artificial intelligence settled into a more pragmatic stoicism with project deployments.
In the Era of Code, Generative AI Represents National Security Risks and Opportunities for “Innovation Power”: We are entering the Era of Code. Code that writes code and code that breaks code. Code that talks to us and code that talks for us. Code that predicts and code that decides. Code that rewrites us. Organizations and individuals prioritizing understanding how the Code Era impacts them will develop increasing advantages in the future. At OODAcon 2023, we took a closer look at Generative AI innovation and its impact on business, society, and international politics. IQT and the Special Competitive Studies Project (SCSP) recently weighed in on this Generative AI “spark” of innovation that will “enhance all elements of our innovation power” – and the potential cybersecurity conflagrations that the same spark may also light. Details here.
Cyber Risks
Corporate Board Accountability for Cyber Risks: With a combination of market forces, regulatory changes, and strategic shifts, corporate boards and directors are now accountable for cyber risks in their firms. See: Corporate Directors and Risk
Geopolitical-Cyber Risk Nexus: The interconnectivity brought by the Internet has caused regional issues that affect global cyberspace. Now, every significant event has cyber implications, making it imperative for leaders to recognize and act upon the symbiosis between geopolitical and cyber risks. See The Cyber Threat
Ransomware’s Rapid Evolution: Ransomware technology and its associated criminal business models have seen significant advancements. This has culminated in a heightened threat level, resembling a pandemic’s reach and impact. Yet, there are strategies available for threat mitigation. See: Ransomware, and update.
Challenges in Cyber “Net Assessment”: While leaders have long tried to gauge both cyber risk and security, actionable metrics remain elusive. Current metrics mainly determine if a system can be compromised without guaranteeing its invulnerability. It’s imperative not just to develop action plans against risks but to contextualize the state of cybersecurity concerning cyber threats. Despite its importance, achieving a reliable net assessment is increasingly challenging due to the pervasive nature of modern technology. See: Cyber Threat
Recommendations for Action
Decision Intelligence for Optimal Choices: Numerous disruptions complicate situational awareness and can inhibit effective decision-making. Every enterprise should evaluate its data collection methods, assessments, and decision-making processes. For more insights, see: Decision Intelligence.
Proactive Mitigation of Cyber Threats: The relentless nature of cyber adversaries, whether they are criminals or nation-states, necessitates proactive measures. It’s crucial to remember that cybersecurity isn’t solely the IT department’s or the CISO’s responsibility – it’s a collective effort involving the entire leadership. Relying solely on governmental actions isn’t advised given its inconsistent approach towards aiding industries in risk reduction. See: Cyber Defenses
The Necessity of Continuous Vigilance in Cybersecurity: The consistent warnings from the FBI and CISA concerning cybersecurity signal potential large-scale threats. Cybersecurity demands 24/7 attention, even on holidays. Ensuring team endurance and preventing burnout by allocating rest periods are imperative. See: Continuous Vigilance
Embracing Corporate Intelligence and Scenario Planning in an Uncertain Age: Apart from traditional competitive challenges, businesses also confront unpredictable external threats. This environment amplifies the significance of Scenario Planning. It enables leaders to envision varied futures, thereby identifying potential risks and opportunities. Regardless of their size, all organizations should allocate time to refine their understanding of the current risk landscape and adapt their strategies. See: Scenario Planning