Harvard Medical School researchers have developed an artificial intelligence system that goes head-to-head with expert clinicians in diagnosing complex medical cases, with The New England Journal of Medicine publishing an AI-generated diagnosis for the first time.
Dr. CaBot, named after Mass General pathologist Richard Cabot, who formalised clinical case studies in 1900, explains its reasoning step by step as it works through challenging medical cases before reaching a diagnosis, according to Harvard Medicine News. The system distinguishes itself from other AI diagnostic tools by spelling out its thought process rather than focusing solely on accuracy.
Arjun (Raj) Manrai, assistant professor of biomedical informatics in the Blavatnik Institute at HMS, and Thomas Buckley, a Harvard Kenneth C. Griffin Graduate School of Arts and Sciences doctoral student and a member of the Manrai lab, created the system. Its diagnosis appeared in NEJM on 8 October alongside one from Gurpreet Dhaliwal of San Francisco Veterans Affairs Medical Center. The journal invited the AI to analyse one of its famed Case Records of the Massachusetts General Hospital, known for extremely challenging cases filled with distractions and red herrings.
The AI reached a final diagnosis comparable to Dhaliwal’s despite reasoning through the case differently. Dr. CaBot delivers its analysis in two formats: a roughly 5-minute narrated video presentation complete with filler words like “um” and “uh”, and a detailed written version.
Built on OpenAI’s o3 reasoning model, the system can efficiently search millions of clinical abstracts from high-impact journals and draw on several thousand existing clinical case studies. Manrai said: “We wanted to create an AI system that could generate a differential diagnosis and explain its detailed, nuanced reasoning at the level of an expert diagnostician.”
The researchers are demonstrating Dr. CaBot at Boston-area hospitals and have made it available online for users to test on new cases. The system requires further improvement, validation and patient privacy protections before clinical implementation, though physicians are already expressing interest in the tool’s potential to rapidly search vast quantities of medical literature.