Medical AI.
Photo credit: theFreesheet/Google ImageFX

Despite the massive hype surrounding artificial intelligence in healthcare, a vast gap remains between how models perform on tests and how they function in real hospitals. A new paper from Harvard Medical School argues that “contextual errors” are to blame.

In a study published in Nature Medicine, researchers argue that models often provide technically correct answers that are practically useless because they fail to account for the specific circumstances in which care is delivered.

“This is not a minor fluke,” says Marinka Zitnik, associate professor of biomedical informatics at Harvard. “It is a broad limitation of all the types of medical AI models that we are developing in the field.”

The core issue is that the datasets used to train these systems often lack crucial nuances — such as local treatment availability, a patient’s financial situation, or specific hospital protocols.

“The models then generate recommendations that seem reasonable and sensible but are not actually relevant or actionable for patients,” Zitnik explains.

The geography trap

One major blind spot is location. A treatment plan suitable for a top-tier hospital in the US might be impossible to implement elsewhere.

“If a model is presented with the same question in different geographic locations and gives the same answer, that answer is likely to be incorrect because each place will have specific conditions and constraints,” says Zitnik. “Whether that patient is in South Africa, the United States, or Sweden may make a big difference.”

Models also struggle with the human realities of patients' lives. Zitnik cites the example of a patient who misses an oncology appointment: a standard AI system might simply flag them to reschedule, ignoring the root cause.

“This overlooks potential barriers such as the patient living far from the oncologist, not having reliable childcare, or not being able to miss work,” Zitnik notes. “These types of constraints do not explicitly exist in the patient’s electronic health record, which means they also would not be factored in by an AI model.”

Specialised silos

Furthermore, medical AI is often trained within specific specialities, leading to tunnel vision. If a patient presents with complex symptoms, an AI trained on one organ system might miss the bigger picture.

“An AI model trained mostly on one specialty… might provide answers based on data from the wrong specialty or miss that the combination of symptoms points to a multisystem disease,” says Zitnik.

To fix these issues, the researchers call for new benchmarks and datasets that include these “contextual” layers. They also emphasise the need for transparency to build trust among doctors.

“We think the answer has to do with building models that provide transparent and easily interpretable recommendations and that say ‘I don’t know’ when they are not confident in their conclusions,” says Zitnik.
