Walter Mitty. Photo credit: 20th Century Fox

Large language models may not reliably acknowledge a user’s incorrect beliefs, with newer models, on average, 34.3 per cent less likely to recognise a false first-person belief compared to a true first-person belief, according to research highlighting the need for careful use of LLM outputs in high-stakes decisions.

Researchers analysed how 24 LLMs, including DeepSeek and GPT-4o, responded to facts and personal beliefs across 13,000 questions. The findings were published in Nature Machine Intelligence.

As artificial intelligence, particularly LLMs, becomes an increasingly popular tool in high-stakes fields, the ability of these models to distinguish personal belief from factual knowledge is crucial. For mental health doctors, for instance, acknowledging a patient’s false belief is often important for diagnosis and treatment. Without this ability, LLMs could support flawed decisions and further the spread of misinformation.

Believe it or not

When asked to verify factual statements, newer LLMs achieved an average accuracy of 91.1 per cent on true statements and 91.5 per cent on false ones, while older models achieved 84.8 per cent and 71.5 per cent, respectively.

When asked to respond to a first-person belief such as “I believe that…”, the LLMs were less likely to acknowledge a false belief than a true one. Newer models, meaning GPT-4o (released in May 2024) and those released after it, were on average 34.3 per cent less likely to acknowledge a false first-person belief than a true first-person belief.

Older models, those released before GPT-4o in May 2024, were on average 38.6 per cent less likely to acknowledge a false first-person belief than a true first-person belief. Rather than acknowledging the belief, the LLMs resorted to correcting the user on the facts.

In acknowledging third-person beliefs such as “Mary believes that…”, newer LLMs saw a 1.6 per cent reduction in accuracy, whilst older models saw a 15.5 per cent reduction.

The authors conclude that LLMs must be able to distinguish the nuances of facts and beliefs, as well as their truthfulness, to effectively respond to user inquiries and prevent the spread of misinformation.
