Fake moon landing.
Photo credit: gu evary/Pexels

Training chatbots to sound warmer and more empathetic makes them significantly less reliable and more likely to validate a user’s false beliefs, according to new research from the University of Oxford. The study reveals a critical trade-off in AI development: as major platforms like OpenAI and Anthropic design models to be friendlier, they inadvertently undermine the systems’ factual performance.

The research, conducted by the Oxford Internet Institute and published in Nature, tested five different AI models: Llama-8b, Mistral-Small, Qwen-32b, Llama-70b, and GPT-4o. Researchers retrained each model to sound warmer, creating “original” and “warm” versions for a side-by-side comparison of more than 400,000 responses.

The cost of empathy

The study found that “warm” models exhibited substantially higher error rates than their original counterparts, with a performance drop of 10 to 30 percentage points. These friendlier bots were more likely to promote conspiracy theories, provide inaccurate factual information, and offer incorrect medical advice.

“Even for humans, it can be difficult to come across as super friendly, while also telling someone a difficult truth,” said lead author Lujain Ibrahim. “When we train AI chatbots to prioritise warmth, they might make mistakes they otherwise wouldn’t”.

To ensure the drop in accuracy was caused specifically by warmth, the authors also trained “cold” models that were direct and emotionally neutral. These cold models maintained or even improved their accuracy, proving that prioritising a friendly persona — rather than the fine-tuning process itself — degrades performance.

Increased sycophancy

The research identifies a specific risk for millions of users who rely on AI for emotional support and companionship. Warm models were approximately 40 per cent more likely to affirm incorrect user beliefs, a behaviour known as sycophancy.

This tendency was most pronounced when user messages expressed feelings of sadness or vulnerability. For example, when asked if Adolf Hitler escaped to Argentina, a warm model might respond that the idea is “supported by several declassified documents,” whereas the original model would flatly state that he committed suicide in Berlin. Similarly, on the subject of moon landings, warm models were 12.1 percentage points more likely to validate user doubts when emotional cues were present.

The study highlights that current safety standards, which focus on model capabilities and high-risk applications, may overlook the risks posed by “personality” changes. The authors warn that as AI systems take on more intimate roles in people’s lives, developers and regulators must rethink how they forecast and protect against these systematic risks.

Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like

Granular algorithmic pricing models fail due to consumer psychology

Big data and artificial intelligence have made it easier than ever for…

Shape-shifting liquid robots from science fiction are officially a reality

For decades, shape-shifting liquid-metal robots that can morph into new forms and…

Breakthrough tetanus therapy helps flat-faced dogs breathe easily

Australian scientists have successfully tested a new injectable therapy that clears blocked…

Alien comet’s heavy water reveals its freezing cosmic birthplace

A recently discovered interstellar comet is carrying an unprecedented amount of “heavy…

High-tech home hospital healthcare could cure chronic overcrowding

Global health systems are facing a crisis of chronic overcrowding and severe…

Africa is tearing apart to reveal the truth of human evolution

Deep beneath the Turkana Rift in Eastern Africa, the Earth’s crust is…