A critical flaw in large language models is threatening to “fundamentally compromise” scientific research, according to a new study that found GPT-4o fabricated nearly one in five citations.

The research, published in JMIR Mental Health, found that when testing GPT-4o, 19.9 per cent of all citations in simulated mental health literature reviews were “bibliographic hallucinations” that could not be traced to any real publication.

The study also found that among the seemingly real citations, 45.4 per cent contained bibliographic errors, most commonly incorrect or invalid Digital Object Identifiers (DOIs).

In total, the researchers concluded that nearly two-thirds of all citations generated by the AI were either entirely fabricated or contained significant errors, prompting an urgent call for rigorous human verification.
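The two percentages reported above can be combined with a little arithmetic. The following is a minimal sketch, assuming the 45.4 per cent error rate applies only to the citations that were not fabricated; the function name is illustrative and does not come from the study. Under that reading, the combined share works out to roughly 56 per cent.

```python
def combined_problem_rate(fabricated: float, error_among_real: float) -> float:
    """Share of all citations that are either fabricated or real but erroneous.

    Assumes error_among_real is measured only over the non-fabricated citations.
    """
    real = 1.0 - fabricated  # share of citations traceable to a real publication
    return fabricated + real * error_among_real


# Figures as reported in the article: 19.9% fabricated, 45.4% of the rest with errors.
rate = combined_problem_rate(0.199, 0.454)
print(f"{rate:.1%}")
```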

The authors, including Dr Jake Linardon from Deakin University, warned that these errors “fundamentally compromise the integrity and trustworthiness of scientific results” by breaking the chain of verifiability and misleading readers.

Major depressive disorder

The study systematically tested the reliability of the AI’s output across mental health topics with varying levels of public awareness: major depressive disorder (high familiarity), binge eating disorder (moderate), and body dysmorphic disorder (low).

Researchers found the fabrication risk was significantly higher for less familiar topics. The fabrication rate was 28 per cent for binge eating disorder and 29 per cent for body dysmorphic disorder, compared to only 6 per cent for major depressive disorder.

For certain disorders, fabrication rates were also higher for specialised review prompts, such as those focusing on digital interventions, than for general overviews.

The study’s authors issued a strong warning that researchers and students must subject all LLM-generated references to “careful human verification to validate their accuracy and authenticity”. They also called on journal editors to implement stronger safeguards, such as detection software, and on academic institutions to develop clear policies and training to address the risk.
