January has already seen the release of new frontier models making waves for the novelty of their development approaches and the uplift in their capabilities. At the same time, a new strategy for governing AI is emerging from the recently elected Trump administration in the US. These developments, combined with Big Tech’s growing leadership in the AI ecosystem, set the stage for the AI Action Summit to be a pivotal occasion for catalyzing important shifts in AI governance.
Against this backdrop, we turn our focus to an important component of the Frontier AI Safety Commitments signed by 16 global AI industry organizations at the AI Seoul Summit in May 2024. As part of their commitments to publish how they measure and manage the risks posed by their frontier AI models, the organizations agreed to define “thresholds at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable.” At the same summit, 27 nations and the EU announced their intent to define these thresholds in the lead-up to the Paris AI Action Summit.
A January 2025 consultation report from the Future Society and its partners (which draws in part on our draft working paper) reiterates the need for stakeholders at the upcoming AI Action Summit in Paris to supplement industry’s voluntary commitments with concrete thresholds. Absent clear guidance from regulators, academics, or civil society that places a high priority on protecting public safety, companies may face incentives to adopt thresholds that are cheap for them to implement but do not provide adequate levels of public safety.
The AI Security Initiative at CLTC is working to help bridge this gap. We put forward this paper as a starting point and supplementary resource for developers and regulators to use in their own deliberations, in support of timely policies that prevent intolerable risks from ever occurring (ex ante) rather than merely implementing safeguards in response to their occurrence (ex post). Through this work, we advocate for “good, not perfect” thresholds and for erring on the side of safety in the face of uncertainty and limited available data.
In this paper, we present recommendations for organizations and governments working to establish thresholds for intolerable AI risks. Our scope covers model harms arising from the risk categories of CBRN weapons, cyberattacks, model autonomy, persuasion and manipulation, deception, toxicity, discrimination, and socioeconomic disruption. Table 1 of the paper specifies, for each category, the outcomes of concern, evidence of the risks materializing, and recommended intolerable risk thresholds.
Our key recommendations include:
- Design thresholds with adequate margins of safety to accommodate uncertainties in risk estimation and mitigation.
- Evaluate dual-use capabilities and other capability metrics, capability interactions, and model interactions through benchmarks, red team evaluations, and other best practices.
- Identify “minimal” and “substantial” increases in risk by comparing to appropriate base cases.
- Quantify the impact and likelihood of risks by identifying the types of harms and modeling the severity of their impacts.
- Supplement risk estimation exercises with qualitative approaches to impact assessment.
- Calibrate uncertainties and identify intolerable levels of risk by mapping the likelihood of intolerable outcomes to the potential levels of severity.
- Establish thresholds through multi-stakeholder deliberations and incentivize compliance through an affirmative safety approach.
This paper is informed by the thoughtful comments we received on the draft working paper published in November 2024. In addition to email submissions, we organized an in-person roundtable on the UC Berkeley campus in November 2024 and a virtual workshop in December 2024, with participation from experts in academia, industry, civil society, and government. We are grateful to all our participants for their expert feedback and insights.


Participants from the December 2024 virtual workshop.
The paper has also been accepted for presentation at the inaugural IASEAI 2025 conference in Paris on February 6–7, 2025, an important showcase for critical AI policy considerations in the lead-up to the Paris AI Action Summit 2025. The Summit itself, on February 10–11, 2025, will host heads of state, leaders of international organizations, CEOs of small and large companies, and representatives of academia, non-governmental organizations, the arts, and civil society.
For questions or comments please contact Deepika Raman at deepika.raman(at)berkeley.edu.
Version History
For version history and comparison, the following earlier draft documents remain publicly available:
- Working Paper on Intolerable Risk Thresholds for AI (November 18, 2024) – Download PDF