HAL-9000.
Photo credit: Cryteria

A dystopian future in which malevolent computers like HAL 9000 replace human decision-making remains fiction, but software teams can now safely offload specific, repetitive tasks to artificial intelligence, researchers have claimed.

New research co-authored by Singapore Management University (SMU) reveals that Large Language Models (LLMs) can effectively substitute for a single human reviewer in code annotation tasks without compromising reliability.

The paper, which won the ACM SIGSOFT Distinguished Paper Award at the 22nd International Conference on Mining Software Repositories (MSR2025), suggests that while science fiction fears total automation, the reality is a pragmatic partnership where AI handles low-context grunt work.

“We found that for low-context, deductive tasks, one human can be replaced by an LLM to save effort without losing reliability,” said Christoph Treude, an Associate Professor of Computer Science at SMU. “However, for high-context tasks, LLMs are unreliable.”

Safe substitution

The team examined 10 cases where multiple humans had annotated samples, finding that seven of these scenarios allowed for the safe substitution of one human reviewer.

When testing models like GPT-4, Claude 3.5 and Gemini 1.5, the researchers found that “model-model agreement” was the key predictor of success: if multiple AIs independently agreed on a label, the models were likely to be as accurate as a human annotator.
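The agreement heuristic can be sketched as a simple decision rule: collect labels from several models and accept the consensus only when they agree, otherwise keep the full human panel. The sketch below is illustrative, assuming each model returns a single categorical label per item; the function and label names are hypothetical, not from the paper.

```python
# Illustrative sketch of a "model-model agreement" check (assumed interface;
# function and label names are hypothetical, not taken from the paper).
from collections import Counter

def substitute_one_human(model_labels, min_agreement=1.0):
    """Return the consensus label if the models agree strongly enough,
    otherwise None, signalling that the human reviewer is still needed."""
    counts = Counter(model_labels)
    label, votes = counts.most_common(1)[0]
    if votes / len(model_labels) >= min_agreement:
        return label  # models agree: safe to stand in for one human
    return None  # models disagree: keep the full human panel

# Example: three models label a code snippet for a low-context task.
print(substitute_one_human(["bug-fix", "bug-fix", "bug-fix"]))  # bug-fix
print(substitute_one_human(["bug-fix", "refactor", "bug-fix"]))  # None
```

With `min_agreement=1.0` the rule is conservative, substituting a human only on unanimous model agreement; lowering the threshold trades reliability for coverage.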

However, the technology still lacks the “deep situational awareness” required for high-context tasks, such as determining if a bug report was truly resolved or analysing static analysis warnings.

“That task requires substantial contextual understanding: examining code changes, project history and warning semantics,” said Treude. “Humans achieve high agreement, but models perform poorly.”

The findings offer a roadmap for integrating AI without surrendering control to the machine, researchers claimed.

“Our view is pragmatic: use LLMs to accelerate annotation where it’s safe, not to eliminate human judgment,” said Treude.
