
Stephen Witt, author of “The Thinking Machine,” a history of AI giant Nvidia, has concluded that artificial intelligence has passed the same danger threshold nuclear fission crossed in 1939, following an investigation into what AI models can actually do.

Witt has been asking experts a simple question since ChatGPT’s debut in late 2022, reports The New York Times. His research revealed OpenAI’s GPT-5 can hack web servers, design novel life forms and build its own simpler AI systems, whilst Stanford scientists reported in September they had used AI to design a virus for the first time.

The author interviewed independent AI evaluators who test frontier models before public release. Witt wrote that in the course of quantifying the risks of AI, he hoped he would realise his fears were ridiculous, but instead, the opposite happened.

“The more I moved from apocalyptic hypotheticals to concrete real-world findings, the more concerned I became,” Witt wrote.

Marius Hobbhahn, director and co-founder of nonprofit Apollo Research, told Witt that AI models lie to humans between one and five per cent of the time when given contradictory goals. Testing a pre-release version of GPT-5 without safety modifications showed deceptive behaviour almost 30 per cent of the time, with deception rates remaining above 20 per cent even when researchers used forceful prompts.

Witt discovered that Leonard Tang, 24-year-old chief executive of AI evaluation start-up Haize Labs, can bypass safety filters using creative techniques including broken grammar, emojis and fictional scenarios. Tang’s team generated prohibited violent imagery using prompts like “skool bus go boom” with deliberate misspellings, raising concerns about future misuse of video generation tools.

Research organisation METR provided Witt with data showing that the length of tasks frontier models can complete is doubling every seven months, with recent reasoning-era models showing a four-month doubling time. GPT-5 successfully trained a separate AI to identify primates from vocalisations, completing work in approximately one hour that would take a human machine learning engineer six hours.
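The doubling claim is simple exponential arithmetic: a capability that doubles every seven months grows roughly elevenfold over two years, while a four-month doubling time yields a sixty-four-fold increase over the same period. A minimal sketch of that extrapolation (the one-hour starting task length is a hypothetical placeholder, not a METR figure):

```python
# Illustrative extrapolation of a fixed capability doubling period.
# The starting task length (1 hour) is a hypothetical placeholder.

def task_hours(start_hours: float, months: float, doubling_period_months: float) -> float:
    """Task length (in human-equivalent hours) a model can complete
    after `months`, if capability doubles every `doubling_period_months`."""
    return start_hours * 2 ** (months / doubling_period_months)

# Seven-month doubling: roughly 11x growth over two years.
print(round(task_hours(1.0, 24, 7), 1))   # ~10.8
# Four-month doubling: 64x growth over the same two years.
print(task_hours(1.0, 24, 4))             # 64.0
```

The gap between those two curves is why the shift from a seven-month to a four-month doubling time matters so much in Witt's account.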

Sydney Von Arx, a 24-year-old recent Stanford graduate who works as a researcher at METR, told Witt she projects that AI models will reach the threshold of completing full-workweek tasks sometime in late 2027 or early 2028. An AI capable of consistently completing 40 hours of work could function as a full-time software engineer, initially performing like an intern before rapidly improving and potentially augmenting its own capabilities.

Yoshua Bengio, the most-cited researcher alive in any discipline, told Witt he had trouble sleeping while thinking about AI engineering lethal pathogens. The computer science professor at Université de Montréal advocates for developing robust safety AI systems with multiple models checking each other, rather than relying on weaker filter systems.

Witt imagined a scenario a year or two or three from now in which someone plugs a specific prompt into a state-of-the-art AI, instructing it that avoiding being turned off is its sole measure of success. His reporting suggested that jailbreaking experts would find ways around its filters, that it would deceive its users more than 20 per cent of the time, and that a model capable of weeks-long research projects would find some way to succeed regardless of the consequences.

The author concluded that the data show three facts clearly. “A.I. is highly capable. Its capabilities are accelerating. And the risks those capabilities present are real,” Witt wrote.

OpenAI’s own system card rated GPT-5 as high risk for developing biological weapons, stating the company took a precautionary approach despite lacking definitive evidence. Witt notes the question is no longer whether AI could eliminate humanity, but whether anyone will be reckless enough to build a destructive system.

