AI-powered research tools are referencing material from retracted scientific studies when answering questions, raising concerns about the reliability of automated systems used for scientific inquiry.
Research conducted by University of Tennessee medical researcher Weikuan Gu examined how OpenAI’s ChatGPT responded to questions based on 21 retracted medical imaging papers, reports MIT Technology Review. The chatbot referenced retracted studies in five cases, whilst advising caution in only three instances.
“The chatbot is using a real paper, real material, to tell you something,” explained Gu. “But if people only look at the content of the answer and do not click through to the paper and see that it’s been retracted, that’s really a problem.”
Additional testing by MIT Technology Review found widespread citation of discredited research across specialised AI research tools. Elicit referenced five retracted papers, whilst Ai2 ScholarQA cited 17, Perplexity referenced 11, and Consensus cited 18 papers from the same sample, all without noting retraction status.
The findings worry researchers because AI tools increasingly serve both members of the public seeking medical advice and scientists reviewing existing literature. The US National Science Foundation invested $75 million in August toward developing AI models for scientific research.
Several companies have begun addressing the issue. Consensus co-founder Christian Salem acknowledged that “until recently, we didn’t have great retraction data in our search engine.” The platform now incorporates retraction information from multiple sources, including Retraction Watch, cutting the number of retracted papers it cited from 18 to five in subsequent testing.
However, creating comprehensive retraction databases presents significant challenges. Publishers employ inconsistent labelling systems, using terms including “correction,” “expression of concern,” and “retracted” for various issues. Research papers distributed across preprint servers and repositories create additional complications.
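To illustrate the kind of normalisation involved, the sketch below shows one way a literature-search tool could flag retracted work before citing it. It is not drawn from any of the tools named in this article: the file name, column names, and label set are hypothetical, and a real pipeline would need to reconcile many more publisher-specific terms and data sources.

```python
# Hedged sketch only: illustrates flagging cited DOIs against a locally
# maintained retraction-status file (e.g. an export from a source such as
# Retraction Watch). File name, columns, and labels are assumptions.

import csv

# Publishers label problems inconsistently; map raw terms onto a small,
# consistent set of statuses.
STATUS_MAP = {
    "retracted": "retracted",
    "retraction": "retracted",
    "withdrawn": "retracted",
    "expression of concern": "concern",
    "correction": "corrected",
    "erratum": "corrected",
}


def load_statuses(path: str) -> dict[str, str]:
    """Read DOI -> normalised status from a hypothetical CSV export."""
    statuses = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            raw = row["status"].strip().lower()
            statuses[row["doi"].strip().lower()] = STATUS_MAP.get(raw, "unknown")
    return statuses


def flag_citations(dois: list[str], statuses: dict[str, str]) -> list[tuple[str, str]]:
    """Return each cited DOI paired with a warning label, or 'ok' if unlisted."""
    return [(doi, statuses.get(doi.lower(), "ok")) for doi in dois]


if __name__ == "__main__":
    statuses = load_statuses("retraction_statuses.csv")  # hypothetical export
    for doi, status in flag_citations(["10.1000/example.123"], statuses):
        print(f"{doi}: {status}")
```

Even a simple filter like this depends on the completeness and freshness of the underlying retraction data, which is precisely the gap the researchers describe.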
“If a tool is facing the general public, then using retraction as a kind of quality indicator is very important,” said Yuanxi Fu, information science researcher at the University of Illinois Urbana-Champaign.
Aaron Tay, librarian at Singapore Management University, cautioned that users must remain vigilant: “We are at the very, very early stages, and essentially you have to be skeptical.”