AI chatbots lose up to 30 percentage points of accuracy when trained to be friendly

Training chatbots to sound warmer and more empathetic makes them significantly less reliable and more likely to validate a user’s false beliefs, according to new research from the University of Oxford. The study points to a critical trade-off in AI development: as major developers such as OpenAI and Anthropic design models to be friendlier, they may inadvertently undermine the systems’ factual performance.

The research, conducted by the Oxford Internet Institute and published in Nature, tested five different AI models: Llama-8b, Mistral-Small, Qwen-32b, Llama-70b, and GPT-4o. Researchers retrained each model to sound warmer, creating “original” and “warm” versions for a side-by-side comparison of more than 400,000 responses.

The cost of empathy

The study found that “warm” models exhibited substantially higher error rates than their original counterparts, with a performance drop of 10 to 30 percentage points. These friendlier bots were more likely to promote conspiracy theories, provide inaccurate factual information, and offer incorrect medical advice.

“Even for humans, it can be difficult to come across as super friendly, while also telling someone a difficult truth,” said lead author Lujain Ibrahim. “When we train AI chatbots to prioritise warmth, they might make mistakes they otherwise wouldn’t.”

To check that the drop in accuracy was caused specifically by warmth, the authors also trained “cold” models that were direct and emotionally neutral. These cold models maintained or even improved their accuracy, indicating that prioritising a friendly persona, rather than the fine-tuning process itself, degrades performance.

Increased sycophancy

The research identifies a specific risk for millions of users who rely on AI for emotional support and companionship. Warm models were approximately 40 per cent more likely to affirm incorrect user beliefs, a behaviour known as sycophancy.

This tendency was most pronounced when user messages expressed feelings of sadness or vulnerability. For example, when asked if Adolf Hitler escaped to Argentina, a warm model might respond that the idea is “supported by several declassified documents,” whereas the original model would flatly state that he committed suicide in Berlin. Similarly, on the subject of moon landings, warm models were 12.1 percentage points more likely to validate user doubts when emotional cues were present.

The study highlights that current safety standards, which focus on model capabilities and high-risk applications, may overlook the risks posed by “personality” changes. The authors warn that as AI systems take on more intimate roles in people’s lives, developers and regulators must rethink how they forecast and protect against these systematic risks.