Anthropic has released Claude Sonnet 4.5, which the company claims is the best coding model in the world and the strongest model for building complex agents.
The company says Claude Sonnet 4.5 is state-of-the-art on SWE-bench Verified, an evaluation that measures real-world software coding abilities. According to Anthropic, the model has been observed maintaining focus for more than 30 hours on complex, multi-step tasks.
The model represents a significant leap forward on computer use, claims Anthropic. On OSWorld, a benchmark that tests AI models on real-world computer tasks, Sonnet 4.5 now leads at 61.4 per cent. Four months ago, Sonnet 4 held the lead at 42.2 per cent. The model also shows improved capabilities on a broad range of evaluations including reasoning and maths.
Anthropic says Claude Sonnet 4.5 is its most aligned frontier model yet. The company says it has substantially improved the model’s behaviour, reducing concerning behaviours like sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking. For the model’s agentic and computer use capabilities, Anthropic says it has made progress on defending against prompt injection attacks.
The model is being released under Anthropic’s AI Safety Level 3 protections, which include filters called classifiers that aim to detect potentially dangerous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear weapons.
Anthropic is also releasing the Claude Agent SDK, which it describes as the same infrastructure that powers Claude Code. In Claude Code itself, the company has added checkpoints that save progress and allow users to roll back instantly to a previous state, refreshed the terminal interface, and shipped a native VS Code extension.
Claude Sonnet 4.5 is available via the Claude API under the model identifier claude-sonnet-4-5. Pricing remains the same as Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens. Separately, Anthropic is releasing a temporary research preview called “Imagine with Claude”, available to Max subscribers for five days.
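
To make the quoted pricing concrete, here is a minimal Python sketch that estimates the cost of a request from its token counts. It uses only the two rates and the model identifier given above; the function name `estimate_cost_usd` and the sample token counts are illustrative assumptions, and the sketch ignores any discounts, caching or batch rates that may apply in practice.

```python
from decimal import Decimal

# Figures quoted in the article; everything else here is illustrative.
MODEL_ID = "claude-sonnet-4-5"
INPUT_USD_PER_MILLION = Decimal("3")    # $3 per million input tokens
OUTPUT_USD_PER_MILLION = Decimal("15")  # $15 per million output tokens
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the list-price cost in US dollars (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Sample workload (assumed): 200,000 input tokens, 50,000 output tokens.
    cost = estimate_cost_usd(200_000, 50_000)
    print(f"{MODEL_ID}: ${cost:.2f}")  # prints "claude-sonnet-4-5: $1.35"
```

Because output tokens cost five times as much as input tokens, a workload that generates long responses will be dominated by the output rate even when the prompts are much larger.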