On November 13, the Center for Long-Term Cybersecurity’s AI Security Initiative teamed up with the Center for Human-Compatible Artificial Intelligence (CHAI) to present a book talk featuring Stuart Russell, author of the new book Human Compatible: Artificial Intelligence and the Problem of Control.
A long-time member of the UC Berkeley faculty, Russell is Professor of Computer Science and Smith-Zadeh Professor in Engineering; he is also the co-author of Artificial Intelligence: A Modern Approach, one of the world’s most widely used textbooks on AI.
Held at the UC Berkeley Faculty Club, the “fireside chat” featured Professor Russell in conversation with Richard Waters, West Coast Editor for the Financial Times.
In his opening remarks, Waters noted there is a “schism” among AI experts about whether machines will ever achieve superhuman intelligence. “I’m a journalist and I love a schism,” Waters said. “Some argue there’s no race of superhuman robots on the horizon—or even possible.”
Russell dismissed such naysayers, noting, “If 20 cancer biologists did a summary of the state of cancer research and said a cure for cancer is not on the horizon and probably isn’t possible, you’d think, what on earth has made them say that?… What justification could there possibly be except a kind of denialism?”
In the conversation, Russell acknowledged that reaching superhuman AI will require significant advances: “We know the massive limitations of deep learning and data-driven models, and there’s this huge gulf to get from here to there,” Russell said. “It’s going to take big conceptual breakthroughs to get there.”
He noted that AI software that can already outperform a human in a task like playing chess would struggle with simple human behaviors like walking down the street. “Being able to understand a book and extract complex content from it would be a big step forward,” Russell said. “There’s a little problem of imagination failure when we think about AI systems: we say they’re not as smart as we are, and then that same morning, it will read everything the human race has ever written.”
Russell and Waters discussed the threats that could emerge if AI capabilities one day exceed those of humans. Russell argued that the problem is real, and that its technical aspects are solvable if we replace current definitions of AI with one based on provable benefit to humans.
“The way we’ve designed AI from the beginning has the property that the better you make the AI system, the worse it is for humanity,” Russell said. “The way we’ve always built AI is a copy of how we’ve thought of human intelligence: they receive an objective and take actions to achieve that objective. But as we’ve always known, we’re unable to specify objectives correctly. This is the legend of King Midas, or why the third wish to the genie is always, ‘Please undo the first two wishes because I’ve completely ruined everything.’”
In the case of AI, Russell said, “we may not get a third wish. If you create a system that’s superintelligent and give it an objective and tell it to achieve that objective, you’re creating a chess match between us and machines. It’s a fundamental design error we made very early on in the field.”
Instead, Russell proposed that AI systems be built on the assumption that they do not know the objectives humans actually want. “We need machines that know they don’t know what the true objective is,” he explained. “The true objective is the satisfaction of human preferences about the future. That’s what the machine should help us with, but it should know that it doesn’t know what our preferences are.”
The talk was recorded by C-SPAN2’s Book TV and is available to view on its website.