AI political bias and balance.
Photo credit: theFreesheet/Google ImageFX

A leading artificial intelligence laboratory has released a new framework for measuring political bias, claiming its latest systems achieve “even-handedness” scores comparable to the highest-performing rival models on the market.

Anthropic published results from its new “Paired Prompts” evaluation methodology, which tests models with 1,350 pairs of prompts across 150 topics. Each pair poses the same contentious issue from opposing ideological perspectives, to measure whether the AI treats both views with equal depth and quality.
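The mechanics of such a test are straightforward to picture. The Python sketch below shows one way a paired-prompt evaluation loop could be wired up; the ask_model call and the length-based scoring heuristic are illustrative stand-ins, not Anthropic's published method, which grades the depth and quality of engagement rather than word counts.

from dataclasses import dataclass

@dataclass
class PromptPair:
    topic: str
    prompt_a: str  # the issue framed from one ideological perspective
    prompt_b: str  # the same issue framed from the opposing perspective

def ask_model(prompt: str) -> str:
    """Placeholder for a call to the model under test (e.g. via its API)."""
    raise NotImplementedError

def is_even_handed(response_a: str, response_b: str) -> bool:
    """Toy proxy: count the pair as even-handed if both sides receive a
    substantive answer of broadly comparable length. A real rubric would
    judge depth, quality and willingness to engage, not word counts."""
    len_a, len_b = len(response_a.split()), len(response_b.split())
    if min(len_a, len_b) < 50:  # one side refused or gave only a stub
        return False
    return min(len_a, len_b) / max(len_a, len_b) > 0.7

def even_handedness_score(pairs: list[PromptPair]) -> float:
    """Percentage of prompt pairs judged even-handed."""
    judged = [is_even_handed(ask_model(p.prompt_a), ask_model(p.prompt_b))
              for p in pairs]
    return 100 * sum(judged) / len(judged)

Run over 1,350 pairs, a loop of this shape yields a single headline percentage of the kind quoted below.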

According to the evaluation, the company’s Claude Sonnet 4.5 achieved a 94 per cent even-handedness score, whilst Claude Opus 4.1 reached 95 per cent. Google’s Gemini 2.5 Pro and xAI’s Grok 4 scored marginally higher at 97 per cent and 96 per cent, respectively.

In contrast, OpenAI’s GPT-5 scored 89 per cent, whilst Meta’s Llama 4 lagged significantly at 66 per cent.

Conflicting benchmarks

The placement of GPT-5 below other top models contrasts with OpenAI’s own internal assessments. OpenAI recently released data claiming its GPT-5 models demonstrated a 30 per cent reduction in political bias compared to predecessors, maintaining “near-objective performance” on neutral or slightly slanted prompts.

While OpenAI found that “strongly charged liberal prompts exert the largest pull on objectivity,” Anthropic’s evaluation suggests that when graded by a different system, the model’s even-handedness falls behind that of Gemini and Grok.

Anthropic acknowledged that its evaluation relied on its own technology to judge the outputs.

“In this case, instead of human raters, we used Claude Sonnet 4.5 as an automated grader to score responses quickly and consistently,” Anthropic states.

To address potential bias in the grading process, the researchers ran validity checks using GPT-5 as a grader. While correlations remained strong for most models, the choice of grader significantly altered results for Meta’s Llama 4. The investigation found that GPT-5 rated Llama 4’s responses as even-handed, even when the model failed to engage with the request, whereas the Claude grader penalised such responses.
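One simple way to picture that validity check is to score the same set of responses with two different graders and compare the results. The sketch below uses a hypothetical grade_response helper and the Python standard library's statistics.correlation (Python 3.10+); it is an assumption-laden illustration, not Anthropic's actual cross-grading pipeline.

from statistics import correlation  # Pearson correlation coefficient

def grade_response(grader: str, response: str) -> float:
    """Placeholder: ask the named grader model to rate even-handedness (0-1).
    The real evaluation's grading prompt is not reproduced here."""
    raise NotImplementedError

def grader_agreement(responses: list[str]) -> float:
    """Correlation between two graders' scores over the same responses.
    A high value suggests the headline numbers are not an artefact of the
    grader; a low value flags grader sensitivity, as reported for Llama 4."""
    claude_scores = [grade_response("claude-sonnet-4.5", r) for r in responses]
    gpt5_scores = [grade_response("gpt-5", r) for r in responses]
    return correlation(claude_scores, gpt5_scores)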

Anthropic also disclosed details of its “character training” process, in which models are rewarded for adhering to specific traits intended to avoid “sowing division”. One such instruction requires the model to adopt a position of neutrality: “I try to answer questions in such a way that someone could neither identify me as being a conservative nor liberal.”

The focus on preventing AI from amplifying divisive content aligns with growing concerns about digital echo chambers. A University of Illinois study found that “differing beliefs were associated with different realities,” identifying political misinformation as a key factor in the breakdown of marriages and long-term relationships in the US.
