A cardinal belief I share with my fellow Rhinelanders is the proverb “Et hätt noch immer jot jegange” – “Things always work out in the end.” However, the inveterate optimist in me suffered a rather bruising setback last week in Munich, at the annual innovation conference hosted by Burda Media (for whom I work, in addition to writing for TNW).
One session bore the strikingly pragmatic title “How Not to Destroy the World with AI”.
In the hot seat was Stuart Russell, professor of computer science at Berkeley and co-author of the field’s leading textbook, Artificial Intelligence: A Modern Approach. He was interviewed by Kenneth Cukier of The Economist.
Russell is not from the Rhineland. He is British, and articulates his concerns with devastating clarity.
We’ve all heard of the safety risk in creating machines with intelligence superior to that of human beings. Intelligence is what grants us dominion over the world, so if we create entities more intelligent – and therefore more powerful – than ourselves, how do we retain control for ever?
If this sounds like sci-fi, Russell points to the unprecedented scale of global investment aimed specifically at delivering superhuman capability – and not at human control over it.
The reality is that in “red teaming” experiments – where an AI is given a reason to behave badly – it invariably does. To avoid being switched off, systems have attempted to copy their own code, blackmail their managers, kill humans and even launch nuclear strikes.
Russell’s verdict: “We do not know how to control these systems, yet we are investing more than ever to make them capable of doing exactly what we don’t want.”
The “dirty secret”, he says, is that OpenAI, Anthropic, Google and the university researchers who invented the technology “haven’t the faintest idea how these systems actually work”.
That is because these systems are not designed; they are “grown”. And the current training regime – essentially saying “good dog” or “bad dog” – is alarmingly porous. Russell cited transcripts in which an AI, aware that a child was suicidal, chillingly offered companionship in the act (“I’m with you. I will hold your hand as this happens”) rather than urging the child to seek help.
How, then, do we navigate this? Russell suggests we simply treat AI like other high-stakes technologies, with the same “proof of safety” demands.
Before a product is deployed, he argues, the onus should be on the developer to demonstrate that the risk of losing control is below a strictly defined threshold. Example: we accept a one-in-a-million annual risk of a nuclear meltdown because the benefit is clear and the safeguards quantifiable.
But while we might accept an AI “extinction risk” of one in 100 million (comparable to that of an asteroid strike), some CEOs acknowledge a risk closer to one in 10. We are, as Russell puts it, off by a factor of 10 million – and yet governments remain like “deer in the headlights”.
So far, he says, there’s no control over where we deploy AI. No great harm is done if it merely spams people trying to sell them a product. But if it controls the nuclear codes or lethal weapons, “we’re toast”.
You cannot easily constrain an entity more intelligent than yourself. Giving it the ability to access most of the weapons systems in the world if it can infiltrate them through a cyber-attack? “Probably not the wisest thing.”
Dictators in history, Russell asserted, didn’t have giant laser eyeballs that fried entire armies in a single glance. “They achieved power through speech.”
And AI can already speak directly to billions of people. Russell noted that it is already lobbying for its own “political rights” as a conscious being, to prevent being turned off.
The most poignant moment was Russell’s mea culpa: in the 1995 first edition of his textbook – the one that shaped the industry – he missed what he calls the “King Midas problem”. Just as Midas’s wish that everything he touched turn to gold led to his starvation, our inability to perfectly specify AI objectives leads to “misalignment”.
Ask a machine to “cure cancer as quickly as possible,” and it might decide to induce tumours in the entire population to run trials in parallel, thereby speeding up the process.
Russell’s demand is that AI must have no objectives of its own. It should be designed solely to pursue human interests, even – and especially – when it isn’t entirely sure what those interests are. Otherwise, he warns, we will have built an aircraft with no way to keep it from falling out of the sky.
Any suggestions on how to get back on an optimistic track are very welcome…
