
Google DeepMind has released the Gemini 2.5 Computer Use model, a specialised system built on Gemini 2.5 Pro that lets AI agents interact with user interfaces by clicking, typing and scrolling through web pages and applications. According to the company, it outperforms leading alternatives on multiple benchmarks, with lower latency.

The model is available in public preview via the Gemini API in Google AI Studio and Vertex AI. It operates within a loop: at each step it analyses a screenshot, the user request and the action history, then generates a response representing a UI action. The system requires end-user confirmation for certain actions, including purchases. It demonstrates strong performance on web browsers and shows promise for mobile UI control tasks, though it is not yet optimised for desktop operating system-level control.
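The loop described above can be sketched in a few lines of Python. This is an illustrative outline only, not the Gemini API: `UIAction`, `run_agent_loop`, `propose_action` and the other callables are hypothetical names, and the model is replaced by a scripted stand-in so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UIAction:
    """Hypothetical representation of one UI action proposed by the model."""
    name: str                          # e.g. "click", "type", "scroll", "done"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False   # assumed flag for actions such as purchases


def run_agent_loop(
    request: str,
    propose_action: Callable[[str, bytes, List[UIAction]], UIAction],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[UIAction], None],
    confirm: Callable[[UIAction], bool],
    max_steps: int = 20,
) -> List[UIAction]:
    """Repeat screenshot -> propose -> (confirm) -> execute until done."""
    history: List[UIAction] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        # The model sees the request, the current screen and the past actions.
        action = propose_action(request, screenshot, history)
        if action.name == "done":
            break
        # Certain actions must be approved by the end user before they run.
        if action.needs_confirmation and not confirm(action):
            break
        execute(action)
        history.append(action)
    return history


def demo(confirm_purchase: bool) -> List[str]:
    """Drive the loop with a scripted stand-in for the model."""
    script = [
        UIAction("click", {"x": 10, "y": 20}),
        UIAction("type", {"text": "running shoes"}),
        UIAction("click", {"target": "buy"}, needs_confirmation=True),
        UIAction("done"),
    ]
    executed: List[str] = []

    def propose(request: str, screenshot: bytes, history: List[UIAction]) -> UIAction:
        return script[len(history)]

    run_agent_loop(
        "buy running shoes",
        propose,
        take_screenshot=lambda: b"",
        execute=lambda action: executed.append(action.name),
        confirm=lambda action: confirm_purchase,
    )
    return executed
```

Calling `demo(False)` stops before the purchase click, while `demo(True)` lets it through. In a real integration the client code, not the model, would capture the screenshot and perform each action.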

Google teams have deployed the model to production for UI testing, with versions powering Project Mariner, the Firebase Testing Agent and some agentic capabilities in AI Mode in Search. The model’s core capabilities are exposed through the new computer_use tool in the Gemini API.

Google DeepMind has built safety features directly into the model to address three key risks: intentional misuse by users, unexpected model behaviour and prompt injections in web environments. Developers also receive two safety controls: a per-step safety service that assesses each proposed action before it is executed, and system instructions that let them specify that the agent should refuse, or request user confirmation for, high-stakes actions.
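As a rough illustration of the per-step gating pattern, the sketch below checks every proposed action before it runs. It does not reproduce Google's safety service: the `Decision` values, the `HIGH_STAKES` and `FORBIDDEN` sets and the action names are all assumptions made for the example, and a simple rule lookup stands in for the real assessment.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_CONFIRMATION = "require_confirmation"
    BLOCK = "block"


# Hypothetical policy lists; a real deployment would define its own.
HIGH_STAKES = {"purchase", "delete_account"}
FORBIDDEN = {"bypass_captcha"}


def assess(action_name: str) -> Decision:
    """Stand-in for a per-step safety check on one proposed action."""
    if action_name in FORBIDDEN:
        return Decision.BLOCK
    if action_name in HIGH_STAKES:
        return Decision.REQUIRE_CONFIRMATION
    return Decision.ALLOW


def gated_execute(
    action_name: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run the action only if the check allows it; return True if it ran."""
    decision = assess(action_name)
    if decision is Decision.BLOCK:
        return False
    if decision is Decision.REQUIRE_CONFIRMATION and not confirm(action_name):
        return False
    execute(action_name)
    return True
```

The point of the pattern is that the check sits between proposal and execution, so a refused or unconfirmed action never reaches the browser.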

Early access programme users have tested the model for personal assistants, workflow automation and UI testing. Poke.com, a proactive AI assistant in iMessage, WhatsApp and SMS, reported: “A lot of our workflows require interacting with interfaces meant for humans where speed is especially important. Gemini 2.5 Computer Use is far ahead of the competition, often being 50% faster and better than the next best solutions we’ve considered.”

Google’s payments platform team implemented the Computer Use model as a contingency mechanism for fragile end-to-end UI tests that contributed to 25% of all test failures. The team stated: “When conventional scripts encounter failures, the model assesses the current screen state and autonomously ascertains the required actions to complete the workflow. This implementation now successfully rehabilitates over 60% of executions (which used to take multiple days to fix).”
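The fallback arrangement the team describes, in which a conventional script runs first and the model steps in only on failure, might look something like the following sketch. The function `run_with_fallback` and the `recover` callback are hypothetical; in practice `recover` would hand the current screen state to the model, which is stubbed out here.

```python
from typing import Callable, Dict, List


def run_with_fallback(
    steps: List[Callable[[], None]],
    recover: Callable[[int, Exception], bool],
) -> Dict[str, int]:
    """Run scripted test steps, calling a recovery hook when one fails."""
    report = {"passed": 0, "recovered": 0, "failed": 0}
    for index, step in enumerate(steps):
        try:
            step()
            report["passed"] += 1
        except Exception as error:
            # Hypothetical hook: ask the agent to finish this step from the
            # current screen state. It returns True if the step was completed.
            if recover(index, error):
                report["recovered"] += 1
            else:
                report["failed"] += 1
                break  # stop the workflow if recovery also fails
    return report


def flaky_step() -> None:
    """Simulates a scripted step that breaks, e.g. after a UI change."""
    raise RuntimeError("selector not found")
```

Tracking recovered steps separately from passed ones keeps the underlying script failures visible, so the fragile tests can still be repaired later.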

The model demonstrates leading quality for browser control at the lowest latency as measured by performance on the Browserbase harness for Online-Mind2Web. Autotab, a drop-in AI agent, reported the model increased performance by up to 18% on their hardest evaluations.
