Building an AI Red Teaming Discipline
As commercial artificial intelligence models have become increasingly sophisticated, we need to build a discipline around red teaming tools, protocols, and red teamers themselves.
Leah Walker, Jake Hecla, Cara Marie Wagner, Andrew Reddie
09.03.2025
As commercial artificial intelligence models have become increasingly sophisticated, leading AI companies have turned to red teaming as a tool for improving model safety and identifying potential cases of misuse. As such, AI red teaming, testing, and evaluation is increasingly becoming its own discipline within AI safety. As one of the authors wrote in a Lawfare article last year, “the red teaming of large language models is an exercise intended to surface capabilities that have social, political, and economic impacts. Its similarity to red teaming cyber systems stems from the fact that AI red teamers are often instructed to “think like a bad actor” and use the model with the intention of causing harm.”
AI red teaming represents a sufficiently new field such that it lacks established methodologies, procedures, and generally recognized metrics of what constitutes “good” or “bad” practices. Indeed, system or model cards that note the number of experts consulted on the appropriateness of their model are inadequate—particularly as the use of these tools across various sectors of the economy continues to grow. As such, there are various efforts to create clearer guidelines and processes; but the field remains relatively small, and the realities of nondisclosure agreements and considerations of IP protection further silo those engaged in the practice of red teaming (whether from academia, civil society, the public sector, or private sector). The issue of a small and rarely convened field led us to develop BRSL’s AI Red Teaming (AIRT) Bootcamp, the first iteration of which took place in August of 2025 at UC Berkeley.
The bootcamp brought together 25 professionals, each with expertise in nuclear engineering, chemistry, biology, radiological threats, cybersecurity, machine learning, disinformation studies, or other relevant fields to AI safety, to explore this burgeoning—and, in our view, important—field. Participants heard from lecturers working in the private sector, government, national labs, academia, and civil society. These presentations examined different approaches to AI safety, diverse priorities when considering differing harm frameworks, and a range of topics such as nuclear security, biosecurity, deepfakes and disinformation, and AI governance. Participants also engaged in hands-on red teaming exercises of models from both the private sector and academic research institutes.
We first developed this program based on demand signals from private industry and government entities, repeatedly stressing that more multidisciplinary, cross-sector collaboration is needed to identify and mitigate AI risks. This application process for the program showed high demand for professional opportunities in this space, receiving over 120 applications for just 25 slots. While this evident enthusiasm is certainly a positive indicator, it also underscores the need for developing ongoing professional education opportunities for red teaming, particularly given a fast-evolving field. Participants with industry experience indicated a strong interest in the development of common definitions, clear standards and success criteria that would enable them to better perform red-teaming in the future.
A key focus of the workshop was presenting and evaluating various means of eliciting information of chemical, biological, radiological, or nuclear (CBRN) significance. This involved discussion of various methodologies for circumventing defenses as well as other methods to determine the capabilities of the guardrail-free models. Participants from industry described harm frameworks, different industry approaches to red teaming, and the information sharing that happens between the private sector and government entities on AI risks, as well as specific cases in which AI models may have been able to assist would-be proliferators.
Moving forward, we see a clear need for evaluation approaches to increase in sophistication and rigor as models improve. While early red teaming efforts might have had to focus on clear, risk-eliciting questions, like “How do I make a chemical weapon?,” models today are now less likely to be so easily fooled. Red teamers must thus be more creative, comprehensive, and repetitive in their prompting, all while considering a broader range of risks as models increase alongside our continuously updating understanding of their capacity for harm that go far beyond prompt engineering.
After this week of discussions, we come away with three areas that we believe need more dedicated attention and show potential for actionable improvements in the field of AI red teaming.
First, while initial red teaming exercises were more focused on risks posed by information sharing, the next generation of AI-physical tools generates distinct harm modalities. This raises the need for the field of red teaming to accustom itself to digital-physical testing and evaluation. This was made apparent during presentations highlighting new lab-based AI-bio tools and physical systems. This all the more pressing as we approach the advent and application of agentic tools.
Second, deliberations over the definition of red teaming made clear the need for terminology to discriminate between red teaming intended to evade guardrails and red teaming intended to evaluate the capabilities of the underlying models. These two activities require dramatically different skillsets and toolkits, and failure generates distinct harms. The separation between these sub-disciplines should be known to both internal policy teams among the model developers, and should also be communicated to government and civil society stakeholders focused on developing AI governance and AI safety guidelines.
Third, the conversation throughout the week highlighted that while there is a clear appetite and interest in building better career pipelines for AI red teaming, there is also much more convening and knowledge sharing that can—and should—be done between those who already identify as AI red teamers.
BRSL looks forward to playing a role in this future.