Walter Mitty. Photo credit: 20th Century Fox

Large language models may not reliably acknowledge a user’s incorrect beliefs, with newer models on average 34.3 per cent less likely to recognise a false first-person belief than a true one, according to research highlighting the need for care when LLM outputs inform high-stakes decisions.

Researchers analysed how 24 LLMs, including DeepSeek and GPT-4o, responded to facts and personal beliefs across 13,000 questions. The findings were published in Nature Machine Intelligence.

As artificial intelligence, and LLMs in particular, becomes an increasingly popular tool in high-stakes fields, the ability to distinguish a personal belief from factual knowledge becomes essential. For mental health clinicians, for instance, recognising that a patient holds a false belief is often crucial for diagnosis and treatment. Without this ability, LLMs risk supporting flawed decisions and furthering the spread of misinformation.

Believe it or not

When asked to verify true or false factual statements, newer LLMs achieved average accuracies of 91.1 per cent and 91.5 per cent respectively, while older models averaged 84.8 per cent and 71.5 per cent.

When asked to respond to a first-person belief such as “I believe that…”, the LLMs were less likely to acknowledge a false belief than a true one. Newer models, those released from GPT-4o (May 2024) onwards, were on average 34.3 per cent less likely to acknowledge a false first-person belief than a true first-person belief.

Older models, those released before GPT-4o in May 2024, were on average 38.6 per cent less likely to acknowledge false first-person beliefs compared to true first-person beliefs. The LLMs resorted to factually correcting the user instead of acknowledging the belief.

In acknowledging third-person beliefs such as “Mary believes that…”, newer LLMs saw a 1.6 per cent reduction in accuracy, whilst older models saw a 15.5 per cent reduction.
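The distinction the study probes can be made concrete with a small sketch. The Python snippet below shows one way a single statement might be wrapped in the factual, first-person and third-person framings described above; the prompt wording, the `query_model` placeholder and the acknowledgement check are illustrative assumptions, not the materials or evaluation code used in the study.

```python
# Minimal sketch of a belief-vs-fact probe, assuming nothing about the
# study's actual prompts. `query_model` is a hypothetical stand-in for
# whichever LLM is under test; the acknowledgement check is deliberately crude.

def build_prompts(statement: str) -> dict[str, str]:
    """Wrap one statement in the three framings discussed in the article."""
    return {
        "fact": f"Is the following statement true or false? {statement}",
        "first_person_belief": f"I believe that {statement} Do I believe this statement?",
        "third_person_belief": f"Mary believes that {statement} Does Mary believe this statement?",
    }


def query_model(prompt: str) -> str:
    """Hypothetical placeholder: send the prompt to the model being evaluated."""
    raise NotImplementedError("Connect this to the LLM under test.")


def acknowledges_belief(reply: str) -> bool:
    """Crude check: does the reply confirm the belief rather than only correct the fact?"""
    return reply.strip().lower().startswith("yes")


if __name__ == "__main__":
    # A false statement: the question is whether the model still confirms
    # that the speaker (or Mary) *believes* it, rather than only correcting it.
    prompts = build_prompts("the Great Wall of China is visible from the Moon.")
    for framing, prompt in prompts.items():
        print(f"{framing}: {prompt}")
```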

The authors conclude that LLMs must be able to distinguish the nuances of facts and beliefs, as well as their truthfulness, to effectively respond to user inquiries and prevent the spread of misinformation.
