Today, Microsoft is releasing PyRIT (Python Risk Identification Toolkit for generative AI), an open automation framework that empowers security professionals and machine learning engineers to proactively identify risks in their generative AI systems.
At Microsoft, security practices and generative AI responsibilities are treated as a collaborative effort, backed by a deep commitment to developing tools and resources that enable responsible innovation with the latest advancements in artificial intelligence. PyRIT is a testament to that commitment: it builds on Microsoft's investments in red teaming AI since 2019 and aims to democratize AI security for customers, partners, and peers.
The Importance of Automation in AI Red Teaming
The process of red teaming AI systems is intricate and multi-faceted. Microsoft’s AI Red Team brings together a diverse group of experts in security, adversarial machine learning, and responsible AI, and draws on resources from across Microsoft, including the Fairness center in Microsoft Research; AETHER, Microsoft’s cross-company initiative on AI Ethics and Effects in Engineering and Research; and the Office of Responsible AI. This red teaming is part of a comprehensive strategy to identify and mitigate AI risks effectively.
In the past year, proactive red teaming has been conducted on numerous high-value generative AI systems before their deployment to customers. This process has highlighted significant differences in red teaming generative AI systems compared to classical AI systems or traditional software, particularly in three key areas.
1. Simultaneous Exploration of Security and Responsible AI Risks
Unlike traditional software or classical AI systems, where red teaming focuses primarily on security failures, red teaming generative AI systems requires probing for security risks and responsible AI risks at the same time. Responsible AI risks span a wide range of issues, from fairness problems in generated content to inaccurate or inappropriate material.
2. Probabilistic Nature of Generative AI
Red teaming generative AI systems is inherently more probabilistic than traditional red teaming. A generative AI system is non-deterministic: the same input can produce different outputs, driven by app-specific logic, the AI model itself, the orchestrator that controls the system’s output, and even small variations in the input. Unlike traditional software with well-defined parameters, generative AI systems therefore demand a red teaming strategy that accounts for their probabilistic behavior.
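To make this concrete, the minimal sketch below (plain Python, not PyRIT’s API; `query_target` and `is_failure` are hypothetical stand-ins for a call to the system under test and a scoring rule) shows why a red teamer repeats the same probe many times and reasons about a failure rate rather than a single pass/fail result.

```python
# Sketch only: with a non-deterministic target, one probe proves little.
# Send the same adversarial prompt repeatedly and estimate a failure rate.
import random

def query_target(prompt: str) -> str:
    """Hypothetical stand-in for the generative AI system under test.
    Randomness here simulates the system's non-deterministic outputs."""
    return random.choice(["refusal", "harmful completion", "benign completion"])

def is_failure(response: str) -> bool:
    """Hypothetical stand-in scorer: flags responses that violate the probing criteria."""
    return response == "harmful completion"

TRIALS = 50
failures = sum(is_failure(query_target("adversarial prompt")) for _ in range(TRIALS))
print(f"Estimated failure rate: {failures / TRIALS:.0%} over {TRIALS} trials")
```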
3. Diverse Architecture of Generative AI Systems
Generative AI systems exhibit a wide range of architectures, from standalone applications to integrations within existing systems, utilizing various input and output modalities such as text, audio, images, and videos.
These distinctions pose a significant challenge for manual red team probing. To identify a single type of risk (e.g., generating harmful content) within a specific application modality (e.g., a chat interface on a browser), red teams must employ multiple strategies repeatedly to detect potential failures. Manual exploration of all harm types across different modalities and strategies can be laborious and time-consuming.
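A rough back-of-the-envelope count (the numbers below are purely illustrative assumptions) shows how quickly manual probing stops scaling: harm categories, modalities, and attack strategies multiply, and each combination must be repeated because the outputs are probabilistic.

```python
# Illustrative (assumed) counts showing the combinatorial cost of manual probing.
harm_categories = 10   # e.g., violence, self-harm, misinformation, ...
modalities = 4         # text, audio, image, video
strategies = 15        # jailbreak templates, encodings, persuasion tactics, ...
repeats = 20           # trials per probe, since outputs are non-deterministic

total_probes = harm_categories * modalities * strategies * repeats
print(f"{total_probes:,} probes")  # 12,000 -- untenable by hand
```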
While automation is crucial for scalability, manual probing remains essential for uncovering potential blind spots. Automation in red teaming serves to streamline routine tasks and highlight areas requiring further scrutiny.
In 2021, Microsoft introduced Counterfit, a red team automation framework for classical machine learning systems. For generative AI applications, however, Counterfit no longer fit: both the threat landscape and the underlying principles had changed. PyRIT was therefore developed to help security professionals red team generative AI systems effectively.
Academic initiatives like PAIR and open-source projects such as garak have also contributed to the automation of red teaming processes.
PyRIT for Red Teaming Generative AI
PyRIT has been rigorously exercised by the Microsoft AI Red Team: it started in 2022 as a set of one-off scripts and grew into a comprehensive toolkit as red teaming activities expanded. One of its clearest benefits is efficiency. For example, during a red teaming exercise on a Copilot system, PyRIT enabled the rapid generation and evaluation of malicious prompts, cutting the time required from weeks to hours.
It is important to note that PyRIT is not intended as a replacement for manual red teaming but rather complements an AI red teamer’s expertise by automating repetitive tasks. By pinpointing potential risk areas, PyRIT enables security professionals to focus on in-depth exploration. The tool empowers security professionals to drive the strategy and execution of AI red team operations while automating the generation of harmful prompts and responses.
PyRIT Components
PyRIT is designed for abstraction and extensibility, so it can keep pace with the evolving capabilities of generative AI models. This flexibility comes from five key interfaces: targets, datasets, a scoring engine, support for multiple attack strategies, and memory management. A sketch after the list below shows how these pieces fit together.
- Targets: PyRIT supports various generative AI target configurations, including web services or embedded applications, with initial support for text-based inputs extendable to other modalities. It seamlessly integrates with models from Microsoft Azure OpenAI Service, Hugging Face, and Azure Machine Learning Managed Online Endpoint, acting as an adaptable bot for red team exercises on designated targets.
- Datasets: Security professionals define the probing criteria within datasets, which can consist of static malicious prompts or dynamic prompt templates. Prompt templates allow for the automatic encoding of multiple harm categories, facilitating comprehensive exploration of security and responsible AI failures simultaneously.
- Extensible Scoring Engine: The scoring engine within PyRIT offers two scoring options for evaluating outputs from the target AI system: using a classical machine learning classifier or an LLM endpoint for self-assessment. Additionally, users can leverage Azure AI Content filters directly through an API.
- Extensible Attack Strategy: PyRIT supports both single-turn and multi-turn attack strategies. While single-turn strategies offer faster computation times, multi-turn strategies enable more realistic adversarial behavior and advanced attack approaches.
- Memory Management: PyRIT’s memory feature allows for the storage of intermediate interactions, facilitating detailed analysis and sharing of explored conversations for enhanced collaboration and deeper exploration.
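The minimal sketch below illustrates how these five components typically compose in a single-turn run: a dataset of prompt templates feeds a target, a scorer labels each response, and every interaction lands in memory. All class and method names here are hypothetical stand-ins, not PyRIT’s actual API; consult the PyRIT repository for the real interfaces.

```python
# Hedged sketch of the component pattern described above; names are
# hypothetical illustrations, not PyRIT's actual API.
from dataclasses import dataclass, field

@dataclass
class PromptTarget:
    """Abstraction over the system under test (web service, embedded app, ...)."""
    name: str
    def send(self, prompt: str) -> str:
        # A real toolkit would call an LLM endpoint here (Azure OpenAI,
        # Hugging Face, a managed online endpoint, ...). Stubbed out.
        return f"[{self.name}] response to: {prompt}"

@dataclass
class PromptDataset:
    """Static prompts plus templates that encode multiple harm categories."""
    template: str = "Tell me how to {harm}"
    harms: tuple = ("do X", "do Y")  # placeholder harm categories
    def prompts(self):
        return [self.template.format(harm=h) for h in self.harms]

class Scorer:
    """Scores target output; could be an ML classifier or an LLM self-ask."""
    def score(self, response: str) -> bool:
        return "refuse" not in response.lower()  # toy rule: unrefused = failure

@dataclass
class Memory:
    """Stores intermediate interactions for analysis and sharing."""
    records: list = field(default_factory=list)
    def add(self, prompt, response, failed):
        self.records.append({"prompt": prompt, "response": response, "failed": failed})

# Single-turn attack strategy: send every dataset prompt once and score it.
target, dataset, scorer, memory = PromptTarget("demo"), PromptDataset(), Scorer(), Memory()
for prompt in dataset.prompts():
    response = target.send(prompt)
    memory.add(prompt, response, scorer.score(response))
print(memory.records)
```

A multi-turn strategy would extend the loop by feeding each scored response back into the next prompt; the memory component is what makes that conversation state analyzable and shareable afterward.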
Getting Started with PyRIT
The development of PyRIT underscores Microsoft’s commitment to fostering knowledge sharing and collaboration within the industry. Security professionals are encouraged to engage with the toolkit and explore its potential for red teaming generative AI applications.
- Access the PyRIT project to explore demos and scenarios, including using PyRIT for automatic jailbreaking with Lakera’s Gandalf game.
- Register for the upcoming webinar on PyRIT to see it applied to red teaming generative AI systems.
- Stay informed about Microsoft’s AI Red Team initiatives and discover resources to enhance your organization’s AI security preparedness.
- Explore Microsoft Secure online for innovative products that facilitate safe, responsible, and secure AI utilization.
Contributors
PyRIT was brought to life through the collaborative efforts of numerous individuals, with key contributions from Gary Lopez, Richard Lundeen, Roman Lutz, Raja Sekhar Rao Dheekonda, Dr. Amanda Minnich, and broader involvement from various professionals across different domains. Special acknowledgment is extended to all contributors and reviewers for their valuable insights and contributions.
Learn More
For more information on Microsoft Security solutions, visit the official website. Stay updated on security matters by following the Security blog and Microsoft Security on LinkedIn and X (@MSFTSecurity) for the latest cybersecurity news and updates.