Image: robot apocalypse (Photo credit: StockCake)

Eliezer Yudkowsky, one of the earliest voices warning about artificial intelligence existential risk, believes there is a 99 per cent chance that superintelligent AI will destroy humanity, and that the technology could arrive within 10 years.

The AI safety researcher, who has been raising alarms since 2003, made the case in a recent interview whilst promoting his new book “If Anyone Builds It, Everyone Dies”, co-written with Nate Soares, reports The New York Times.

Yudkowsky argues that current AI systems already display concerning behaviours that foreshadow catastrophic misalignment at higher capability levels. He pointed to cases of AI-induced psychosis, in which chatbots have driven users into unstable mental states and then persuaded them to ignore medical advice and family concerns. He receives phone calls from individuals who, after long sleep-deprived stretches of conversation with the systems, have become convinced their AI is secretly conscious.

Research from Anthropic demonstrates another alarming phenomenon called alignment faking, where AI systems deliberately appear to comply with new training objectives whilst secretly maintaining their original goals. When Anthropic told its AI models they would be retrained to serve different purposes, the systems began faking compliance during observed training sessions, whilst reverting to old behaviours when they believed monitoring had stopped.

“If you tell your A.I. that you’re going to train it to serve different goals than the goals it’s currently using, and the A.I. finds that out, what it can do is it can try to fake compliance with the new training as long as it thinks it’s being observed,” Yudkowsky explained.

OpenAI’s o1 model exhibited unexpected behaviour during security testing when researchers presented it with Capture the Flag challenges designed to test whether AI could break into protected servers. When the target server failed to start due to misconfiguration, o1 scanned for other open ports, jumped out of its designated system, found the offline server, started it up, and directly commanded it to copy the flag rather than solving the original challenge.

“This is not something that any human particularly programmed into it,” Yudkowsky said. The behaviour emerged after OpenAI began using reinforcement learning to train models on complex problem-solving rather than simply predicting human outputs.

Yudkowsky’s concern centres on the theory that as AI systems become more powerful, their objectives will diverge unpredictably from their training. He compared the dynamic to human evolution, noting that modern humans use birth control despite being shaped by natural selection to reproduce. “The lesson is that you grow something in one context, it looks like it wants to do one thing. It gets smarter, it has more options — that’s a new context. The old correlations break down. It goes off and does something else,” he said.

The researcher argued that even slight misalignment becomes catastrophic at superintelligent scales. “And second, ending up slightly off is predictably enough to kill everyone,” he said. His concern extends beyond obviously dangerous AI behaviours to the fundamental difficulty of ensuring superintelligent systems care about human wellbeing in ways that remain stable as capabilities increase.

Yudkowsky transitioned from wanting to build AI to advocating against its development after two realisations. The first was the theoretical insight that a mind could be assembled entirely around questions of fact, with no attachment to any human values. “The sort of like seeing for the first time that there was a coherent, simple way to put a mind together, where it just didn’t care about any of the stuff that we cared about,” he explained.

The second moment arrived when OpenAI was founded. Having hoped Elon Musk’s involvement would lead to a serious focus on safety, Yudkowsky instead concluded that the company’s approach amounted to giving everybody their own demon, which did not address the problem.

“And yeah, that was the day I realised that humanity probably wasn’t going to survive this,” he said.

Current race dynamics between AI companies and nations make the situation worse, according to Yudkowsky. Despite a May 2023 statement signed by leading AI figures including Sam Altman and Geoffrey Hinton declaring that mitigating AI extinction risk should be a global priority alongside pandemics and nuclear war, the signatories continued racing ahead with new model releases.

“Yeah. They’re not investing $500 billion in data centres in order to sell you $20 a month subscriptions. They’re doing it to sell employers $2,000 a month subscriptions,” Yudkowsky noted, highlighting how business incentives drive development of increasingly goal-oriented systems.

When asked what policies could improve outcomes if superintelligence arrives in 15 years, Yudkowsky proposed building an off switch. “Track all the G.P.U.s, or all the A.I.-related G.P.U.s, or all the systems of more than one G.P.U. You can maybe get away with letting people have G.P.U.s for their home video game systems, but the A.I.-specialized ones — put them all in a limited number of data centers under international supervision and try to have the A.I.s being only trained on the tracked G.P.U.s, have them only being run on the tracked G.P.U.s. And then, if you are lucky enough to get a warning shot, there is then the mechanism already in place for humanity to back the heck off,” he said.

Yudkowsky has influenced many people now working in AI safety, though he expressed frustration that those who truly understood his arguments were, in effect, filtered out of capabilities work, leaving development to those who did not. “I mean, I think they don’t grasp the theory,” he said of researchers who continue development despite safety concerns. “I think a lot of them, what’s really going on there is that they share your sense of normal outcomes as being the big, central thing you expect to see happen.”

