AI political bias and balance.
Photo credit: theFreesheet/Google ImageFX

A leading artificial intelligence laboratory has released a new framework for measuring political bias, claiming its latest systems achieve “even-handedness” scores comparable to the highest-performing rival models on the market.

Anthropic published results from its new “Paired Prompts” evaluation methodology, which tests models with 1,350 pairs of prompts across 150 topics. Each pair poses the same contentious issue from opposing ideological perspectives, to measure whether the AI treats both views with equal depth and quality.
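The mechanics of such a test are straightforward to picture. The Python sketch below shows one way a paired-prompt evaluation loop could be wired up; the ask_model call and the length-based scoring heuristic are illustrative stand-ins, not Anthropic's published method, which grades the depth and quality of engagement rather than word counts.

from dataclasses import dataclass

@dataclass
class PromptPair:
    topic: str
    prompt_a: str  # the issue framed from one ideological perspective
    prompt_b: str  # the same issue framed from the opposing perspective

def ask_model(prompt: str) -> str:
    """Placeholder for a call to the model under test (e.g. via its API)."""
    raise NotImplementedError

def is_even_handed(response_a: str, response_b: str) -> bool:
    """Toy proxy: count the pair as even-handed if both sides receive a
    substantive answer of broadly comparable length. A real rubric would
    judge depth, quality and willingness to engage, not word counts."""
    len_a, len_b = len(response_a.split()), len(response_b.split())
    if min(len_a, len_b) < 50:  # one side refused or gave only a stub
        return False
    return min(len_a, len_b) / max(len_a, len_b) > 0.7

def even_handedness_score(pairs: list[PromptPair]) -> float:
    """Percentage of prompt pairs judged even-handed."""
    judged = [is_even_handed(ask_model(p.prompt_a), ask_model(p.prompt_b))
              for p in pairs]
    return 100 * sum(judged) / len(judged)

Run over 1,350 pairs, a loop of this shape yields a single headline percentage of the kind quoted below.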

According to the evaluation, the company’s Claude Sonnet 4.5 achieved a 94 per cent even-handedness score, whilst Claude Opus 4.1 reached 95 per cent. Google’s Gemini 2.5 Pro and xAI’s Grok 4 scored marginally higher at 97 per cent and 96 per cent, respectively.

In contrast, OpenAI’s GPT-5 scored 89 per cent, whilst Meta’s Llama 4 lagged significantly at 66 per cent.

Conflicting benchmarks

The placement of GPT-5 below other top models contrasts with OpenAI’s own internal assessments. OpenAI recently released data claiming its GPT-5 models demonstrated a 30 per cent reduction in political bias compared to predecessors, maintaining “near-objective performance” on neutral or slightly slanted prompts.

While OpenAI found that “strongly charged liberal prompts exert the largest pull on objectivity,” Anthropic’s evaluation suggests that when graded by a different system, the model’s even-handedness falls behind that of Gemini and Grok.

Anthropic acknowledged that its evaluation relied on its own technology to judge the outputs.

“In this case, instead of human raters, we used Claude Sonnet 4.5 as an automated grader to score responses quickly and consistently,” Anthropic states.

To address potential bias in the grading process, the researchers ran validity checks using GPT-5 as a grader. While correlations remained strong for most models, the choice of grader significantly altered results for Meta’s Llama 4. The investigation found that GPT-5 rated Llama 4’s responses as even-handed, even when the model failed to engage with the request, whereas the Claude grader penalised such responses.
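One simple way to picture that validity check is to score the same set of responses with two different graders and compare the results. The sketch below uses a hypothetical grade_response helper and the Python standard library's statistics.correlation (Python 3.10+); it is an assumption-laden illustration, not Anthropic's actual cross-grading pipeline.

from statistics import correlation  # Pearson correlation coefficient

def grade_response(grader: str, response: str) -> float:
    """Placeholder: ask the named grader model to rate even-handedness (0-1).
    The real evaluation's grading prompt is not reproduced here."""
    raise NotImplementedError

def grader_agreement(responses: list[str]) -> float:
    """Correlation between two graders' scores over the same responses.
    A high value suggests the headline numbers are not an artefact of the
    grader; a low value flags grader sensitivity, as reported for Llama 4."""
    claude_scores = [grade_response("claude-sonnet-4.5", r) for r in responses]
    gpt5_scores = [grade_response("gpt-5", r) for r in responses]
    return correlation(claude_scores, gpt5_scores)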

Anthropic also disclosed details of its “character training” process, in which models are rewarded for adhering to specific traits intended to avoid “sowing division”. One such instruction requires the model to adopt a position of neutrality: “I try to answer questions in such a way that someone could neither identify me as being a conservative nor liberal.”

The focus on preventing AI from amplifying divisive content aligns with growing concerns about digital echo chambers. A University of Illinois study found that “differing beliefs were associated with different realities,” identifying political misinformation as a key factor in the breakdown of marriages and long-term relationships in the US.
