HAL 9000.
Photo credit: Cryteria

A dystopian future in which malevolent computers like HAL 9000 replace human decision-making remains fiction, but software teams can now safely offload specific, repetitive tasks to artificial intelligence, researchers have claimed.

New research co-authored by Singapore Management University (SMU) reveals that Large Language Models (LLMs) can effectively substitute for a single human reviewer in code annotation tasks without compromising reliability.

The paper, which won the ACM SIGSOFT Distinguished Paper Award at the 22nd International Conference on Mining Software Repositories (MSR 2025), suggests that while science fiction fears total automation, the reality is a pragmatic partnership where AI handles low-context grunt work.

“We found that for low-context, deductive tasks, one human can be replaced by an LLM to save effort without losing reliability,” said Christoph Treude, an Associate Professor of Computer Science at SMU. “However, for high-context tasks, LLMs are unreliable.”

Safe substitution

The team examined 10 cases where multiple humans had annotated samples, finding that seven of these scenarios allowed for the safe substitution of one human reviewer.

When testing models like GPT-4, Claude 3.5 and Gemini 1.5, researchers found that “model-model agreement” was the key predictor of success. If multiple AIs independently agreed on a label, the machine was likely as accurate as a human.
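The gating idea described above can be sketched in a few lines of Python. This is an illustrative example only, not the paper's actual pipeline: the labels, the unanimity threshold, and the fallback behaviour are all assumptions chosen for clarity.

```python
# Illustrative sketch (not the researchers' actual method): use agreement
# among several independently prompted LLMs as a gate for substituting
# one human annotator. Labels and threshold are hypothetical.

from collections import Counter

def substitute_label(model_labels: list[str], threshold: float = 1.0):
    """Return an LLM label only when model-model agreement meets the threshold.

    model_labels: labels independently produced by different LLMs for one item.
    threshold: minimum fraction of models that must agree (1.0 = unanimous).
    Returns the agreed label, or None to signal that humans should annotate.
    """
    label, count = Counter(model_labels).most_common(1)[0]
    agreement = count / len(model_labels)
    return label if agreement >= threshold else None

# Unanimous agreement: treat the machine label as safe to use.
print(substitute_label(["bug", "bug", "bug"]))      # prints: bug
# Disagreement: fall back to human annotation.
print(substitute_label(["bug", "feature", "bug"]))  # prints: None
```

The design choice here mirrors the finding quoted above: the machine label is only trusted when multiple models converge independently; any disagreement routes the item back to human reviewers.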

However, the technology still lacks the “deep situational awareness” required for high-context tasks, such as determining if a bug report was truly resolved or analysing static analysis warnings.

“That task requires substantial contextual understanding: examining code changes, project history and warning semantics,” said Treude. “Humans achieve high agreement, but models perform poorly.”

The findings offer a roadmap for integrating AI without surrendering control to the machine, researchers claimed.

“Our view is pragmatic: use LLMs to accelerate annotation where it’s safe, not to eliminate human judgment,” said Treude.
