Reinforcement learning (RL) is one of the most promising branches of artificial intelligence (AI), and has the potential to transform domains ranging from social media recommendations and traffic modeling to clinical health trials and electric power distribution. Yet RL systems can also introduce unintended risks, as they may be programmed to optimize for a particular result or value without accounting for harms that could inadvertently emerge.
Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems, a new report by a team of researchers affiliated with the UC Berkeley Center for Long-Term Cybersecurity’s Artificial Intelligence Security Initiative (AISI), examines potential benefits and challenges related to reinforcement learning, and provides recommendations to help policymakers ensure that RL-based systems are deployed safely and responsibly.
The report’s authors, all of whom have earned or are currently pursuing PhD degrees at UC Berkeley, are Thomas Krendl Gilbert, a postdoctoral fellow at the Digital Life Initiative at Cornell Tech; Nathan Lambert, a PhD student in the UC Berkeley Department of Electrical Engineering and Computer Sciences (EECS); Sarah Dean, Assistant Professor in the Computer Science Department at Cornell University; and Tom Zick, a researcher in AI ethics at the Berkman Klein Center for Internet and Society at Harvard University.
“Rather than allowing RL systems to unilaterally reshape human domains, policymakers need new mechanisms for the rule of reason, foreseeability, and interoperability that match the risks these systems pose,” the authors write. “Only in this way can design choices be structured according to terms that are well-framed, backed up by standards, and actionable in courts of law. We argue that criteria for these limits may be drawn from emerging subfields within antitrust, tort, and administrative law. It will then be possible for courts, federal and state agencies, and non-governmental organizations to play more active roles in RL specification and evaluation.”
Much of the attention on machine learning has focused on the potential for bias in results, but reinforcement learning introduces new categories of risk, as these systems are designed to teach themselves based on pre-specified “rewards.”
“In machine learning, the primary risks have to do with outputs that a model generates (e.g., whether a model makes fair decisions),” the authors explain. “In RL, however, the risks come from the initial specification of the task (e.g., whether learning ‘good’ behavior for automated vehicles also requires an initial definition of good traffic flow). Addressing these risks will require ex ante design considerations, rather than exclusively ex post evaluations of behavior.”
Specifying these rewards incorrectly may cause the system to adopt behaviors and strategies that are risky or dangerous in particular situations. For example, a self-driving car may ignore pedestrians if it is only rewarded for not hitting other cars. On the other hand, a fleet of cars may learn to aggressively block merges onto certain highway lanes in the name of safety. If the RL system is not set up to learn from feedback appropriately, it could do great damage to the domain (in this case, public roadways) in which it operates.
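To make the self-driving example concrete, the sketch below (our illustration, not code from the report) shows a toy reward function for a simulated driving agent. All names here are hypothetical; the point is simply that a policy maximizing a reward which omits a relevant term, such as pedestrian collisions, has no incentive to avoid the behavior the designer never thought to penalize.

```python
# Illustrative toy example of reward misspecification (not from the report).
from dataclasses import dataclass


@dataclass
class StepOutcome:
    collided_with_car: bool
    collided_with_pedestrian: bool
    progress_m: float  # distance advanced this step, in meters


def misspecified_reward(outcome: StepOutcome) -> float:
    """Rewards progress and penalizes only collisions with other cars.

    Nothing here discourages hitting pedestrians, so a policy that
    maximizes this reward is free to ignore them.
    """
    reward = outcome.progress_m
    if outcome.collided_with_car:
        reward -= 100.0
    return reward


def safer_reward(outcome: StepOutcome) -> float:
    """Adds an explicit penalty for pedestrian collisions."""
    reward = misspecified_reward(outcome)
    if outcome.collided_with_pedestrian:
        reward -= 1000.0
    return reward


if __name__ == "__main__":
    # A step in which the car advances but strikes a pedestrian:
    outcome = StepOutcome(collided_with_car=False,
                          collided_with_pedestrian=True,
                          progress_m=5.0)
    print(misspecified_reward(outcome))  # 5.0  -- the agent is still rewarded
    print(safer_reward(outcome))         # -995.0
```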
The report proposes a novel type of documentation for RL-based systems called “Reward Reports,” which describe how the system is designed to behave. “This includes which types of feedback have been brought into scope, what metrics have been considered to optimize performance, and why components of the specification (e.g., states, actions, rewards) were deemed appropriate,” the report explains. “Reward Reports also outline the conditions for updating the report following ex post evaluation of system performance, the domain, or any interaction between these.”
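As a rough sketch of how the elements quoted above might be captured in practice, the structure below encodes them as machine-readable documentation. This is our own illustration under assumed field names, not a format defined in the report.

```python
# Hypothetical structured form of a Reward Report (illustrative only).
from dataclasses import dataclass, field
from typing import List


@dataclass
class RewardReport:
    system_name: str
    # Which types of feedback have been brought into scope
    feedback_sources: List[str] = field(default_factory=list)
    # What metrics have been considered to optimize performance
    optimization_metrics: List[str] = field(default_factory=list)
    # Why components of the specification (states, actions, rewards)
    # were deemed appropriate
    state_space_rationale: str = ""
    action_space_rationale: str = ""
    reward_rationale: str = ""
    # Conditions for updating the report after ex post evaluation
    update_triggers: List[str] = field(default_factory=list)


report = RewardReport(
    system_name="highway-merging-controller",
    feedback_sources=["simulated traffic flow", "on-road telemetry"],
    optimization_metrics=["average throughput", "hard-braking events"],
    state_space_rationale="Vehicle poses and speeds within sensor range.",
    action_space_rationale="Longitudinal acceleration and lane-change requests.",
    reward_rationale="Throughput term balanced against safety penalties.",
    update_triggers=["post-deployment audit", "change in traffic regulations"],
)
print(report.system_name, report.update_triggers)
```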
The authors recommend that Reward Reports “become a component of external oversight and continuous monitoring of RL algorithms running in safety-critical domains,” as they can help ensure that RL-based systems can be externally evaluated, “enabling authorities to examine what a given system is optimizing for, and to evaluate the appropriateness of those terms. Reward Reports will ensure that design choices are auditable by third parties, contestable through litigation, and able to be affirmed by stakeholders.”
Potential audiences that could benefit from Reward Reports include “trade and commerce regulators, standards-setting agencies and departments, and civil society organizations that evaluate unanticipated effects of AI systems,” the authors write. “Our broader vision is that, in safety-critical domains, the role of an algorithm designer matures to be closer to that of a civil engineer: a technical expert whose credentials and demonstrated skills are trusted to oversee critical social infrastructure, and are worthy of certification by public authorities. Moreover, agencies such as the National Institute of Standards and Technology (NIST) and the Department of Transportation (DOT) must facilitate and guide critical design criteria so that the public interest remains in scope for advanced automated systems.”