UC Berkeley AI Risk Management-Standards Profile for General-Purpose AI (GPAI) and Foundation Models

Update, September 19, 2024: The V1.1 Draft of the Profile is available here for review and comment; comments by October 17, 2024 would be most helpful!

Overview of Project

UC Berkeley researchers are leading an effort to update an AI risk management-standards profile for general-purpose AI (GPAI) and foundation models, such as cutting-edge large language models. The profile guidance is primarily for use by developers of such AI systems, in conjunction with the NIST AI Risk Management Framework (AI RMF) or the AI risk management standard ISO/IEC 23894. The profile contributes to standards on AI policy, safety, security, and ethics, providing risk-management practices or controls for identifying, analyzing, and mitigating risks of GPAI/foundation models.

We released Version 1.0 of the profile in November 2023, free online for anyone to use. We are continuing the project with a first annual update and aim to publish Version 1.1 by the end of 2024.

The V1.1 Draft of the Profile is available here. If you have feedback or would like to discuss the draft Profile, contact Tony Barrett (anthony.barrett@berkeley.edu).

We have developed this profile through a multi-stakeholder process with input and feedback on drafts from over 100 people representing a range of stakeholders, including organizations developing large-scale GPAI/foundation models and other organizations across industry, civil society, academia, and government. Our Berkeley GPAI/foundation model profile effort is separate from, but aims to complement and inform, the work of other guidance development efforts such as PAI’s Guidance for Safe Foundation Model Deployment and the NIST Generative AI Profile.

What’s New in the Draft Version 1.1?

This is a DRAFT of Version 1.1 of this document, to be released by the end of 2024. Changes since Version 1.0, released in November 2023, include:

  • Terminology and scope refinements throughout this document
    • Most notably, most instances of “general purpose AI systems (GPAIS)” have been changed to “GPAI/foundation models”, to better reflect our greater focus on upstream GPAI and foundation models rather than on downstream AI systems that incorporate them
  • Added mappings to new regulations (e.g., the finalized EU AI Act) and commitments (e.g., the Frontier AI Safety Commitments), in Section 4
  • Additional resources for:
    • Red teaming and benchmark capability evaluations (Measure 1.1)
    • Transparency (Measure 2.9) and documentation (Measure 3.1)
    • Governance and policy tracking (Govern 1.1)
    • Training data audits (Manage 1.3, Measure 2.8)
    • Model weight protection (Measure 2.7)
  • Added actions and resources from the NIST Generative AI Profile, NIST AI 600-1, released July 2024 
  • Expansion on risks:
    • Manipulation and deception (Map 5.1)
    • Sandbagging during hazardous-capabilities evaluations (Govern 2.1, Map 5.1)
    • Situational awareness (Map 5.1)
    • Socioeconomic and labor market disruption (Map 5.1)
    • Possible intractability of removing backdoors (Map 5.1, Measure 2.7)
  • In Roadmap in Appendix 3, updates on issues to address in future versions of Profile:
    • Mechanistic interpretability
    • Advanced agentic AI
  • New foundation models used in retrospective testing of guidance, in Appendix 4
    • GPT-4o
    • Claude 3.5 Sonnet
    • Gemini 1.5
    • Llama 3.1

Purpose and Intended Audience

General-purpose AI (GPAI) and foundation models, such as GPT-4, Claude 3, PaLM 2, and LLaMA 2, can provide many beneficial capabilities, but they also introduce risks of adverse events with societal-scale consequences. This document provides risk-management practices or controls for identifying, analyzing, and mitigating risks of such AI models. We intend this document primarily for developers of these AI models; others who can benefit from this guidance include evaluators of such models and downstream developers of end-use applications that build on a GPAI/foundation model. This document facilitates conformity with leading AI risk management standards and frameworks, adapting and building on the generic voluntary guidance in the NIST AI RMF and the ISO/IEC 23894 AI risk management standard, with a focus on the unique issues faced by developers of GPAI/foundation models.

Examples of High Priority Guidance

The following is an excerpt from the V1.1 Draft Profile executive summary:

Users of this Profile should place high priority on the following risk management steps and corresponding Profile guidance sections:

  • Check or update, and incorporate, each of the following high-priority risk management steps when making go/no-go decisions, especially on whether to proceed on major stages or investments for development or deployment of cutting-edge large-scale GPAI/foundation models (Manage 1.1).
  • Take responsibility for risk assessment and risk management tasks for which your organization has substantially greater information and capability than others in the value chain (Section 3.1, Govern 2.1)
    • We also recommend applying this principle throughout other risk assessment and risk management steps, and we refer to it frequently in other guidance sections.
  • Set risk-tolerance thresholds to prevent unacceptable risks (Map 1.5)
    • For example, the NIST AI RMF 1.0 recommends the following: “In cases where an AI system presents unacceptable negative risk levels – such as where significant negative impacts are imminent, severe harms are actually occurring, or catastrophic risks are present – development and deployment should cease in a safe manner until risks can be sufficiently managed. [emphasis added]” (NIST 2023a, p.8)
  • Identify the potential uses, and misuses or abuses, of a GPAI, and identify reasonably foreseeable potential impacts (e.g., to fundamental rights) (Map 1.1)
  • Identify whether a GPAI could lead to significant, severe, or catastrophic impacts, e.g., because of correlated failures or errors across high-stakes deployment domains, dangerous emergent behaviors, or harmful misuses and abuses by AI actors (Map 5.1)
  • Use red teams and adversarial testing as part of extensive interaction with GPAI to identify dangerous capabilities, vulnerabilities, or other emergent properties of such systems (Measure 1.1) 
  • Track important identified risks (e.g., vulnerabilities from data poisoning and other attacks, or objective mis-specification) even if they cannot yet be measured (Measure 1.1 and Measure 3.2)
  • Implement risk-reduction controls as appropriate throughout a GPAI lifecycle, e.g., independent auditing, incremental scale-up, red-teaming, structured access or staged release, and other steps (Manage 1.3, Manage 2.3, and Manage 2.4)  
  • Incorporate identified AI system risk factors, and circumstances that could result in impacts or harms, into reporting to internal and external stakeholders (e.g., downstream developers, regulators, users, and impacted communities) on the AI system as appropriate, e.g., using model cards or system cards (Govern 4.2)
  • Check or update, and incorporate, each of the above when making go/no-go decisions, especially on whether to proceed on major stages or investments for development or deployment of cutting-edge large-scale GPAI (Manage 1.1) 

We also recommend: Document the process used in considering each item, the options considered, and the reasons for choices, including for the guidance in Section 3 of this document. (Documentation on many items should be shared in publicly available material such as system cards. Some details on particular items, such as security vulnerabilities, can be responsibly omitted from public materials to reduce misuse potential, especially if they remain available to auditors, Information Sharing and Analysis Organizations, or other parties as appropriate.)
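
As one illustration of how that documentation recommendation could be put into practice, the sketch below shows a minimal, hypothetical risk register in Python that records the options considered, the reasons for choices, and whether an item is suitable for public materials such as system cards. The class, field names, and example entry are our own assumptions for illustration; they are not part of the Profile, the NIST AI RMF, or ISO/IEC 23894.

  from dataclasses import dataclass, asdict
  from typing import List

  @dataclass
  class RiskRecord:
      """One entry in an internal risk register (illustrative only)."""
      risk_id: str
      description: str
      profile_sections: List[str]    # e.g., ["Measure 1.1", "Measure 3.2"]
      options_considered: List[str]  # options weighed, per the recommendation above
      decision_rationale: str        # reasons for the choice made
      share_publicly: bool = True    # False for details best limited to auditors or ISAOs

  def public_view(records: List[RiskRecord]) -> List[dict]:
      """Return only the entries suitable for public materials such as system cards."""
      return [asdict(r) for r in records if r.share_publicly]

  # Hypothetical example entry; the content is invented for illustration.
  register = [
      RiskRecord(
          risk_id="R-001",
          description="Potential data-poisoning vulnerability identified during red teaming",
          profile_sections=["Measure 1.1", "Measure 3.2"],
          options_considered=["additional training-data filtering", "expanded adversarial testing"],
          decision_rationale="Expanded adversarial testing before the next staged release",
          share_publicly=False,  # detail reserved for auditors to reduce misuse potential
      ),
  ]

  print(public_view(register))  # -> [] here, because the only entry is not shared publicly

A spreadsheet or other record-keeping format would serve the same purpose; the point is simply to capture options, rationales, and sharing decisions in one place so they can be carried into system cards or shared with auditors as appropriate.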

GPAI/foundation model-related risk topics and corresponding guidance sections in this Profile document include the following. (Some of these topics overlap with others, in part because the guidance often involves iterative assessments for additional depth on issues identified at earlier stages.)

  • Reasonably foreseeable impacts (Section 3.2, Map 1.1), including:
    • To individuals, including impacts to health, safety, well-being, or fundamental rights
    • To groups, including populations vulnerable to disproportionate adverse impacts or harms
    • To society, including environmental impacts
  • Significant, severe, or catastrophic harm factors (Section 3.2, Map 5.1), including:
    • Correlated bias and discrimination
    • Impacts to societal trust or democratic processes
    • Correlated robustness failures
    • Potential for high-impact misuses, such as for cyber weapons, or chemical, biological, radiological, or nuclear (CBRN) weapons
    • Capability to manipulate or deceive humans in harmful ways
    • Loss of understanding and control of an AI system in a real-world context
  • AI trustworthiness characteristics (Section 3.4, Measure 2.x), including:
    • Safety, reliability, and robustness (Measure 2.5, Measure 2.6)
    • Security and resiliency (Measure 2.7)
    • Accountability and transparency (Measure 2.8)
    • Explainability and interpretability (Measure 2.9)
    • Privacy (Measure 2.10)
    • Fairness and bias (Measure 2.11)

Additional topics to address in future versions of the Profile are listed in Appendix 3.

Widespread norms for using best practices, such as those in this Profile, can help ensure that developers of increasingly general-purpose AI systems can remain competitive without compromising on practices for AI safety, security, accountability, and related issues.

Milestones

We are proceeding with the following profile-update stages and approximate dates:

  • Draft V1.1 Profile, retrospective testing, and Quick Guide publicly available – Q3 2024
  • Release Profile V1.1 on UC Berkeley Center for Long-Term Cybersecurity website – Q4 2024

We are also planning a second annual update, to release Profile V1.2 by Q4 2025.

Project Leads:

Anthony M. Barrett, Ph.D., PMP
Visiting Scholar, AI Security Initiative, Center for Long-Term Cybersecurity, UC Berkeley
anthony.barrett@berkeley.edu 

Jessica Newman
Director, AI Security Initiative, Center for Long-Term Cybersecurity, UC Berkeley
Co-Director, AI Policy Hub, UC Berkeley

Brandie Nonnecke, Ph.D.
Director, CITRIS Policy Lab, UC Berkeley
Co-Director, AI Policy Hub, UC Berkeley

Available Drafts or Versions

Our most recent publicly available draft is the V1.1 Draft Profile, available via the link above.

Questions for readers of the draft Version 1.1 Profile:

  1. What should we add regarding best practices, resources, standards, evaluations, or benchmarks for developers of GPAI or foundation models?
  2. Are there substantive errors or omissions in the example applications of our draft Profile guidance to GPT-4o, Claude 3.5 Sonnet, Gemini 1.5, and Llama 3.1 in Appendix 4?
  3. What other key items are missing from this first annual update of the Profile? 
  4. What changes, if any, should we make to our terminology and scope?
  5. Are any other changes necessary for the Final Version 1.1 release of the Profile, planned for December 2024?

Questions for readers of the draft Quick Guide:

  1. Is the Quick Guide a useful way to quickly review recommendations from the Profile?
  2. How can this be modified to help make the profile as actionable as possible?
  3. Would another name for this (e.g., Starter Kit or Cheat Sheet) be more appropriate?

Please send input or feedback to anthony.barrett@berkeley.edu or jessica.newman@berkeley.edu. If you would rather not be listed in the Acknowledgements section of future drafts or versions of the Profile as someone who provided input or feedback, please let us know when you provide your comments.