Researchers affiliated with the Center for Long-Term Cybersecurity have released a resource to help identify and mitigate the risks and potentially harmful impacts of general-purpose artificial intelligence (AI) systems (GPAIS) such as GPT-4 (the large language model used by ChatGPT and other applications) and DALL-E 3, which is used to generate images based on text prompts.
The AI Risk-Management Standards Profile for General-Purpose AI Systems (GPAIS) and Foundation Models (Version 1.0) is aimed primarily at developers of large-scale, state-of-the-art AI systems that “can provide many beneficial capabilities but also risks of adverse events with profound consequences,” the authors explain in the report’s abstract. “This document provides risk-management practices or controls for identifying, analyzing, and mitigating risks of GPAIS.”
The Profile was developed by Anthony M. Barrett, a researcher affiliated with UC Berkeley’s AI Security Initiative (AISI) at the UC Berkeley Center for Long-Term Cybersecurity, along with Jessica Newman, Director of the AISI; Brandie Nonnecke, Director of the CITRIS Policy Lab at UC Berkeley; Dan Hendrycks, a recent UC Berkeley PhD graduate; and Evan R. Murphy and Krystal Jackson, non-resident research fellows with the AISI.
The Profile is part of a growing body of resources intended to mitigate the risks of AI systems, which introduce novel privacy, security, and equity concerns and can be used for malicious purposes. Large-scale, cutting-edge GPAIS in particular have the potential to behave unpredictably, manipulate or deceive humans in harmful ways, or lead to severe or catastrophic consequences for society. The Profile aims to ensure that developers of such systems take appropriate measures to anticipate and plan for a wide range of potential harms, from racial bias and environmental harms to destruction of critical infrastructure and degradation of democratic institutions.
The Profile is tailored to complement other AI risk management standards, such as the NIST AI Risk Management Framework (AI RMF), developed by the National Institute of Standards and Technology (NIST), and ISO/IEC 23894, developed by the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC). The Profile provides guidelines for GPAIS developers based on “core functions” defined in the NIST AI RMF: “Govern,” for AI risk management process policies, roles, and responsibilities; “Map,” for identifying AI risks in context; “Measure,” for rating AI trustworthiness characteristics; and “Manage,” for decisions on prioritizing, avoiding, mitigating, or accepting AI risks.
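For readers who think in terms of concrete artifacts, the sketch below illustrates one way a development team might track its coverage of the four core functions as a simple checklist. The function names ("Govern," "Map," "Measure," "Manage") come from the NIST AI RMF; everything else, including the class names, field names, and example activities, is an illustrative assumption and is not drawn from the Profile itself.

```python
# Illustrative only: a lightweight checklist keyed by the four NIST AI RMF
# core functions ("Govern", "Map", "Measure", "Manage"). The example
# activities and field names are hypothetical, not text from the Profile.
from dataclasses import dataclass, field


@dataclass
class RiskActivity:
    description: str          # what the team does under this function
    status: str = "planned"   # e.g., "planned", "in_progress", "done"


@dataclass
class GpaisRiskProfile:
    system_name: str
    activities: dict[str, list[RiskActivity]] = field(default_factory=dict)

    def add(self, function: str, description: str) -> None:
        """Record a risk-management activity under a core function."""
        self.activities.setdefault(function, []).append(RiskActivity(description))

    def summary(self) -> dict[str, int]:
        """Count recorded activities per core function."""
        return {fn: len(items) for fn, items in self.activities.items()}


# Example usage with hypothetical entries:
profile = GpaisRiskProfile(system_name="example-foundation-model")
profile.add("Govern", "Assign roles and responsibilities for AI risk management")
profile.add("Map", "Identify foreseeable misuse and downstream deployment contexts")
profile.add("Measure", "Evaluate trustworthiness characteristics such as robustness")
profile.add("Manage", "Decide whether to avoid, mitigate, or accept identified risks")
print(profile.summary())
```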
A Resource for Developers of GPAIS and Foundation Models
Other early AI RMF profiles appear likely to focus on specific industry sectors and end-use applications, e.g., critical infrastructure or other high-risk categories under the draft EU AI Act. While valuable for downstream developers of end-use applications, that approach could overlook an opportunity to provide profile guidance for upstream developers of increasingly general-purpose AI, including AI systems sometimes referred to as “foundation models.” Such AI systems can have many uses, and they raise early-development risk issues, such as emergent properties, that upstream developers are often better positioned to address than downstream developers building on AI platforms for specific end-use applications.
“This document can provide GPAIS deployers, evaluators, and regulators with information useful for evaluating the extent to which developers of such AI systems have followed relevant best practices,” the authors write. “Widespread norms for using best practices such as in this Profile can help ensure developers of GPAIS can be competitive without compromising on practices for AI safety, security, accountability, and related issues.”
The guidance is for developers of large-scale GPAIS or “foundation models” such as GPT-4, Claude 2, PaLM 2, and Llama 2, as well as “frontier models”: cutting-edge, state-of-the-art, or highly capable GPAIS or foundation models. The Profile was developed over the course of one year, with extensive feedback from more than 100 participants in virtual workshops and interviews. Version 1.0, released today, follows two earlier draft versions that were made publicly available for additional feedback.
The report’s appendices include a “feasibility test,” in which the researchers applied the guidelines to four relatively large-scale foundation models — GPT-4, Claude 2, PaLM 2, and Llama 2 — based on publicly available information. The Berkeley GPAIS and foundation model profile effort is separate from, but aims to complement and inform the work of, other guidance development efforts such as the PAI Guidance for Safe Foundation Model Deployment and the NIST Generative AI Public Working Group.
“Ultimately, this Profile aims to help key actors in the value chains of increasingly general-purpose AI systems to achieve outcomes of maximizing benefits, and minimizing negative impacts, to individuals, communities, organizations, society, and the planet,” the authors write. “That includes protection of human rights, minimization of negative environmental impacts, and prevention of adverse events with systemic or catastrophic consequences at societal scale.”
For more information, email Tony Barrett at anthony.barrett@berkeley.edu.