AI doctors.
Photo credit: theFreesheet/Google ImageFX

Researchers have created a fully autonomous AI system that acts like a panel of five doctors to detect “whispers” of cognitive decline that human clinicians might miss.

A team at Mass General Brigham has developed one of the first autonomous artificial intelligence systems capable of screening for cognitive impairment by reading routine clinical documentation.

The system, detailed in a study published in npj Digital Medicine, operates without human intervention and achieved 98 per cent specificity in real-world validation testing.

Unlike standard AI models, this system functions as a “digital clinical team”. It employs five specialised AI agents that work collaboratively, critiquing each other’s reasoning and refining their clinical determinations much as human clinicians would in a case conference.
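The paper’s exact agent design is not reproduced in this article, but the general “draft, critique, revise” pattern behind such agentic systems can be sketched in a few lines of Python. The agent roles, prompts and the run_llm stub below are illustrative assumptions, not the study’s implementation.

```python
# Illustrative sketch of an "agentic" review loop: five role-specific agents
# draft, critique and revise a screening determination in turn. The roles,
# prompts and the run_llm() stub are hypothetical, not the study's own design.

from typing import Callable

AGENT_ROLES = [
    "extractor",    # pulls cognition-related evidence from the note
    "assessor",     # proposes an initial impairment determination
    "critic",       # challenges the assessor's reasoning
    "verifier",     # checks the critique against the extracted evidence
    "adjudicator",  # issues the final label with a rationale
]

def screen_note(note: str, run_llm: Callable[[str], str]) -> str:
    """Run a single clinical note through the chained agent roles."""
    context = f"Clinical note:\n{note}\n"
    for role in AGENT_ROLES:
        prompt = (
            f"You are the {role} agent on a cognitive-impairment screening panel.\n"
            f"{context}\n"
            f"Respond with your {role} output."
        )
        # Each agent sees the accumulated discussion, mimicking a case conference.
        context += f"\n[{role}]: {run_llm(prompt)}"
    # The adjudicator's turn (the last one) carries the final determination.
    return context.rsplit("[adjudicator]:", 1)[-1].strip()

if __name__ == "__main__":
    # Stub model call so the sketch runs without any external service.
    result = screen_note("Patient reports word-finding difficulty...", lambda p: "noted.")
    print(result)
```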

Catching the ‘whispers’

Cognitive impairment is often underdiagnosed in routine care, meaning many patients miss the critical window for early treatment of conditions like Alzheimer’s disease.

To address this, the AI was designed to analyse clinical notes generated during regular healthcare visits, turning everyday paperwork into a valuable screening tool.

“Clinical notes contain whispers of cognitive decline that busy clinicians can’t systematically surface,” said co-lead study author Dr Lidia Moura, director of Population Health and the Center for Healthcare Intelligence at Mass General Brigham. “This system listens at scale.”

In a study analysing over 3,300 clinical notes from 200 patients, the system proved surprisingly astute.

When the AI and human reviewers disagreed on a case, an independent expert re-evaluated the data. The expert found that the AI’s reasoning was valid 58 per cent of the time, suggesting the software was identifying sound clinical judgements that the initial human review had overlooked.

“We expected to find AI errors. Instead, we often found the AI was making defensible judgments based on the evidence in the notes,” said corresponding author Hossein Estiri, PhD.

Open source and private

Along with the findings, the team is releasing Pythia, an open-source tool that enables other healthcare systems to deploy similar “agentic” AI frameworks.

Crucially, the system runs on an open-weight large language model that can be deployed locally within a hospital’s IT infrastructure, ensuring no patient data is transmitted to external servers or cloud-based providers.
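Pythia’s interface is not documented in the article, but the privacy argument rests on the model weights and the inference both living inside the hospital’s own infrastructure. A minimal sketch of local open-weight inference using the Hugging Face transformers library is shown below; the model name is a placeholder assumption, not the checkpoint the study used.

```python
# Minimal sketch of local open-weight inference with Hugging Face transformers.
# The model name is a placeholder; Pythia's actual model and prompts are not shown here.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "local-open-weight-model"  # hypothetical locally cached checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

prompt = "Summarise any evidence of cognitive decline in this clinical note:\n..."
inputs = tokenizer(prompt, return_tensors="pt")

# Generation runs entirely on local hardware; no note text leaves the machine.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```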

The researchers were transparent about where the AI struggled. While it excelled at reading comprehensive narratives, it had difficulty interpreting isolated data points that lacked surrounding context.

Additionally, while the system achieved 91 per cent sensitivity in balanced testing, that figure dropped to 62 per cent in real-world conditions where case prevalence was lower. However, the specificity — its ability to correctly identify healthy patients — remained high at 98 per cent.
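For readers unfamiliar with the two metrics, sensitivity and specificity are simple ratios over a confusion matrix. The counts below are invented so the rates match the percentages quoted above, purely to illustrate the definitions; they are not the study’s actual figures.

```python
# Illustrative only: counts invented to match the quoted rates,
# not drawn from the Mass General Brigham study.
tp, fn = 62, 38   # truly impaired patients flagged / missed
tn, fp = 98, 2    # healthy patients correctly cleared / wrongly flagged

sensitivity = tp / (tp + fn)   # share of impaired patients the screen catches
specificity = tn / (tn + fp)   # share of healthy patients left unflagged

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```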

“We’re publishing exactly the areas in which AI struggles,” said Estiri. “The field needs to stop hiding these calibration challenges if we want clinical AI to be trusted.”
