Frontier artificial intelligence models struggle to complete professional freelance tasks autonomously, with even the most advanced systems failing more often than they succeed unless human experts intervene.

A new study by freelance marketplace Upwork, evaluating AI agents against 322 verified paid jobs, reveals that autonomous completion rates remain low. The best-performing model, Claude Sonnet 4, achieved a completion rate of only 39.8 per cent on its first attempt.

Other advanced models fared worse in the autonomous setting. Gemini 2.5 Pro achieved a 19.9 per cent completion rate, whilst GPT-5 managed 19.6 per cent.

The human ‘rescue’ factor

The research highlights that human-in-the-loop (HITL) intervention is critical for economic viability. When human experts provided feedback on failed attempts, they achieved a “rescue rate” of between 18 per cent and 23.3 per cent, effectively salvaging roughly one in five failed projects.

“Across all three agents, the integration of human feedback leads to substantial performance improvements,” the researchers state, noting relative gains of between 29 per cent and 71 per cent over the AI-only baseline.
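To make the arithmetic behind these figures concrete, here is a minimal sketch in Python. It assumes the rescue rate applies only to the share of jobs the agent failed on its first attempt, which is an interpretation of the article rather than the study's stated method. The function names are hypothetical, and pairing the 39.8 per cent baseline with both ends of the reported rescue range is purely illustrative, so the printed gains need not match the study's reported 29 to 71 per cent.

```python
def rescued_completion(baseline: float, rescue_rate: float) -> float:
    """Completion rate once human feedback rescues a share of the failed jobs.

    Assumption: rescue_rate is measured against failed first attempts only.
    """
    return baseline + (1.0 - baseline) * rescue_rate


def relative_gain(baseline: float, rescue_rate: float) -> float:
    """Improvement over the AI-only baseline, expressed as a fraction of it."""
    return (rescued_completion(baseline, rescue_rate) - baseline) / baseline


if __name__ == "__main__":
    baseline = 0.398  # best first-attempt completion rate quoted in the article
    for rescue_rate in (0.18, 0.233):  # ends of the reported rescue-rate range
        final = rescued_completion(baseline, rescue_rate)
        gain = relative_gain(baseline, rescue_rate)
        print(f"rescue {rescue_rate:.1%}: completion {final:.1%}, relative gain {gain:.1%}")
```

The same rescue rate produces a larger relative gain for a weaker baseline, because there are more failed jobs to rescue and a smaller starting figure to compare against.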

The study proposes a “breakeven” framework for deploying digital labour. While AI-only approaches yield the highest expected net value for low-stakes tasks due to minimal cost, high-value work still demands human execution where the “cost of failure outweighs automation gains”.

The analysis suggests that for mid-value tasks, collaborative human-AI systems offering “higher success rates that justify their added human cost” are becoming the optimal economic choice.
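The breakeven idea can be sketched as a comparison of expected net value across three ways of doing a job. The sketch below is an illustration only: the formula (expected gain, minus expected loss from failure, minus the cost of the attempt), the `Mode` class, and every success rate, cost, and failure ratio are hypothetical assumptions rather than figures or definitions taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    success_rate: float  # assumed probability of an acceptable delivery
    cost: float          # assumed cost of one attempt, in currency units


# All figures below are hypothetical placeholders, not values from the study.
MODES = (
    Mode("ai_only", success_rate=0.40, cost=1.0),
    Mode("human_ai", success_rate=0.52, cost=20.0),
    Mode("human_only", success_rate=0.95, cost=200.0),
)


def expected_net_value(mode: Mode, task_value: float,
                       failure_cost_ratio: float = 0.5) -> float:
    """Expected gain minus expected failure loss minus the cost of the attempt.

    Assumption: a failed job costs failure_cost_ratio * task_value.
    """
    gain = mode.success_rate * task_value
    loss = (1.0 - mode.success_rate) * failure_cost_ratio * task_value
    return gain - loss - mode.cost


def best_mode(task_value: float, failure_cost_ratio: float = 0.5) -> Mode:
    """Return the mode with the highest expected net value for this task."""
    return max(MODES, key=lambda m: expected_net_value(m, task_value, failure_cost_ratio))


if __name__ == "__main__":
    for value in (50, 200, 1000):
        print(value, best_mode(value).name)
```

With these placeholder numbers, the cheapest mode wins for small jobs, the collaborative mode wins in the middle, and human execution wins once the value at risk is large. The crossover points move with the assumed costs and success rates.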
