HAL-9000.
Photo credit: Cryteria

A dystopian future in which malevolent computers like HAL 9000 replace human decision-making remains fiction, but software teams can now safely offload specific, repetitive tasks to artificial intelligence, researchers have claimed.

New research co-authored by Singapore Management University (SMU) reveals that Large Language Models (LLMs) can effectively substitute for a single human reviewer in code annotation tasks without compromising reliability.

The paper, which won the ACM SIGSOFT Distinguished Paper Award at the 22nd International Conference on Mining Software Repositories (MSR 2025), suggests that while science fiction fears total automation, the reality is a pragmatic partnership in which AI handles low-context grunt work.

“We found that for low-context, deductive tasks, one human can be replaced by an LLM to save effort without losing reliability,” said Christoph Treude, an Associate Professor of Computer Science at SMU. “However, for high-context tasks, LLMs are unreliable.”

Safe substitution

The team examined 10 cases where multiple humans had annotated samples, finding that seven of these scenarios allowed for the safe substitution of one human reviewer.

When testing models like GPT-4, Claude 3.5 and Gemini 1.5, researchers found that “model-model agreement” was the key predictor of success. If multiple AIs independently agreed on a label, the machine was likely as accurate as a human.
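The agreement signal the researchers describe can be illustrated with a short sketch. This is not the paper's actual pipeline; the labels, threshold, and helper function below are hypothetical, showing only the general idea of measuring how often independent models assign the same label to the same samples.

```python
# Illustrative sketch (not the study's actual method): estimate
# "model-model agreement" as the mean fraction of samples on which
# each pair of independent annotators assigns the same label.

from itertools import combinations

def pairwise_agreement(label_sets):
    """Average per-pair fraction of matching labels across annotators."""
    scores = []
    for a, b in combinations(label_sets, 2):
        matches = sum(x == y for x, y in zip(a, b))
        scores.append(matches / len(a))
    return sum(scores) / len(scores)

# Hypothetical labels from three models on six code samples.
gpt4   = ["bug", "feature", "bug", "docs", "bug", "feature"]
claude = ["bug", "feature", "bug", "docs", "bug", "docs"]
gemini = ["bug", "feature", "bug", "docs", "feature", "docs"]

agreement = pairwise_agreement([gpt4, claude, gemini])

# Hypothetical cutoff: treat high model-model agreement as a signal
# that substituting one human reviewer with an LLM may be safe.
SAFE_THRESHOLD = 0.8
print(f"model-model agreement: {agreement:.2f}")
print("substitution looks safe" if agreement >= SAFE_THRESHOLD
      else "keep human reviewers")
```

In practice a more robust statistic such as Cohen's kappa, which corrects for chance agreement, would typically be used in place of raw percent agreement.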

However, the technology still lacks the “deep situational awareness” required for high-context tasks, such as determining if a bug report was truly resolved or analysing static analysis warnings.

“That task requires substantial contextual understanding: examining code changes, project history and warning semantics,” said Treude. “Humans achieve high agreement, but models perform poorly.”

The findings offer a roadmap for integrating AI without surrendering control to the machine, researchers claimed.

“Our view is pragmatic: use LLMs to accelerate annotation where it’s safe, not to eliminate human judgment,” said Treude.
