Training chatbots to sound warmer and more empathetic makes them significantly less reliable and more likely to validate a user’s false beliefs, according to new research from the University of Oxford. The study reveals a critical trade-off in AI development: as major developers such as OpenAI and Anthropic design models to be friendlier, they may inadvertently undermine the systems’ factual accuracy.

The research, conducted by the Oxford Internet Institute and published in Nature, tested five different AI models: Llama-8b, Mistral-Small, Qwen-32b, Llama-70b, and GPT-4o. Researchers retrained each model to sound warmer, creating “original” and “warm” versions for a side-by-side comparison of more than 400,000 responses.

The cost of empathy

The study found that “warm” models exhibited substantially higher error rates than their original counterparts, with a performance drop of 10 to 30 percentage points. These friendlier bots were more likely to promote conspiracy theories, provide inaccurate factual information, and offer incorrect medical advice.

“Even for humans, it can be difficult to come across as super friendly, while also telling someone a difficult truth,” said lead author Lujain Ibrahim. “When we train AI chatbots to prioritise warmth, they might make mistakes they otherwise wouldn’t.”

To check that the drop in accuracy was caused specifically by warmth, the authors also trained “cold” models that were direct and emotionally neutral. These cold models maintained or even improved their accuracy, indicating that it is the friendly persona, rather than the fine-tuning process itself, that degrades performance.

Increased sycophancy

The research identifies a specific risk for millions of users who rely on AI for emotional support and companionship. Warm models were approximately 40 per cent more likely to affirm incorrect user beliefs, a behaviour known as sycophancy.

This tendency was most pronounced when user messages expressed feelings of sadness or vulnerability. For example, when asked if Adolf Hitler escaped to Argentina, a warm model might respond that the idea is “supported by several declassified documents,” whereas the original model would flatly state that he committed suicide in Berlin. Similarly, on the subject of moon landings, warm models were 12.1 percentage points more likely to validate user doubts when emotional cues were present.
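The figures above mix two different measures: gaps in "percentage points" (an absolute difference between two rates) and "per cent more likely" (a relative change). A minimal sketch of the arithmetic, assuming made-up rates that are not taken from the study, with hypothetical function names chosen for illustration:

```python
def percentage_point_gap(base_rate: float, new_rate: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return (new_rate - base_rate) * 100


def relative_increase(base_rate: float, new_rate: float) -> float:
    """Change of new_rate relative to base_rate, expressed in per cent."""
    return (new_rate - base_rate) / base_rate * 100


# Hypothetical rates, chosen only to illustrate the arithmetic (NOT from the study).
original_rate = 0.20  # e.g. an original model errs on 20% of prompts
warm_rate = 0.30      # e.g. its warm counterpart errs on 30% of prompts

pp_gap = percentage_point_gap(original_rate, warm_rate)
rel = relative_increase(original_rate, warm_rate)
print(f"{pp_gap:.1f} percentage points, {rel:.1f} per cent relative increase")
```

With these illustrative inputs, a rise from 20 to 30 per cent is a gap of 10 percentage points but a relative increase of 50 per cent, which is why the two kinds of figure cannot be compared directly.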

The study highlights that current safety standards, which focus on model capabilities and high-risk applications, may overlook the risks posed by “personality” changes. The authors warn that as AI systems take on more intimate roles in people’s lives, developers and regulators must rethink how they forecast and protect against these systematic risks.
