Google DeepMind has released the Gemini 2.5 Computer Use model, a specialised system built on Gemini 2.5 Pro that enables AI agents to interact with user interfaces by clicking, typing, and scrolling through web pages and applications. The company reports that it outperforms leading alternatives on multiple benchmarks while operating at lower latency.
The model, available in public preview via the Gemini API in Google AI Studio and Vertex AI, operates within a loop that analyses screenshots, user requests, and action histories to generate responses representing UI actions. The system requires end-user confirmation for certain actions, such as purchases. It performs strongly in web browsers and shows promise for mobile UI control tasks, though it is not yet optimised for desktop operating-system-level control.
Google teams have deployed the model to production for UI testing, with versions powering Project Mariner, the Firebase Testing Agent and some agentic capabilities in AI Mode in Search. The model’s core capabilities are exposed through the new computer_use tool in the Gemini API.
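The loop described above, a screenshot and goal in, a proposed UI action out, the executed action and a fresh screenshot fed back, can be sketched roughly as follows. This is a minimal illustration rather than official sample code: the model identifier, the exact shape of the `computer_use` tool declaration, and the helpers `capture_screenshot` and `execute_ui_action` are assumptions or placeholders for whatever browser-automation layer (such as Playwright) a developer wires in.

```python
# Minimal sketch of the screenshot -> proposed action -> execute -> screenshot loop.
# The model name, the computer_use tool declaration, and the helper functions are
# placeholders/assumptions, not verbatim from the Gemini API reference.
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

MODEL = "gemini-2.5-computer-use-preview"  # assumed preview model identifier


def capture_screenshot() -> bytes:
    """Placeholder: grab a screenshot of the controlled browser (e.g. via Playwright)."""
    raise NotImplementedError


def execute_ui_action(action) -> None:
    """Placeholder: translate the proposed action (click, type, scroll) into browser commands."""
    raise NotImplementedError


def run_agent(goal: str, max_steps: int = 20) -> None:
    history = []  # prior actions fed back so the model can reason over its own trajectory
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        response = client.models.generate_content(
            model=MODEL,
            contents=[
                goal,
                types.Part.from_bytes(data=screenshot, mime_type="image/png"),
                f"Previous actions: {history}",
            ],
            config=types.GenerateContentConfig(
                # Assumed shape of the computer_use tool declaration in the public preview.
                tools=[types.Tool(computer_use=types.ComputerUse(
                    environment=types.Environment.ENVIRONMENT_BROWSER))],
            ),
        )
        action = response.candidates[0].content.parts[0].function_call
        if action is None:
            break  # no further action proposed: the model considers the task complete
        execute_ui_action(action)
        history.append(action.name)
```

In this pattern the client, not the model, owns execution: each proposed action is carried out by the developer's automation layer, and the resulting screen state is returned to the model for the next step.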
Google DeepMind has built safety features directly into the model to address three key risks: intentional misuse by users, unexpected model behaviour, and prompt injections in web environments. Developers also receive safety controls, including a per-step safety service that assesses each proposed action before execution, and system instructions that can require agents to refuse high-stakes actions or ask for confirmation before performing them.
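Because the client executes every action, the confirmation requirement lands in developer code. The sketch below shows one way a client loop might gate execution on such a signal; the `safety_decision` attribute and the `"require_confirmation"` value are assumed names for illustration, not confirmed API fields.

```python
# Illustrative only: safety_decision and "require_confirmation" are assumed names
# standing in for whatever per-step safety signal the API returns with a proposed action.
def maybe_execute(function_call, execute_ui_action) -> bool:
    """Execute a proposed UI action unless the per-step safety check demands
    human confirmation (e.g. before a purchase) and the end user declines."""
    decision = getattr(function_call, "safety_decision", None)
    if decision == "require_confirmation":
        answer = input(f"The agent wants to perform '{function_call.name}'. Proceed? [y/N] ")
        if answer.strip().lower() != "y":
            return False  # report the refusal back to the model instead of acting
    execute_ui_action(function_call)
    return True
```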
Early access programme users have tested the model for personal assistants, workflow automation and UI testing. Poke.com, a proactive AI assistant in iMessage, WhatsApp and SMS, reported: “A lot of our workflows require interacting with interfaces meant for humans where speed is especially important. Gemini 2.5 Computer Use is far ahead of the competition, often being 50% faster and better than the next best solutions we’ve considered.”
Google’s payments platform team implemented the Computer Use model as a contingency mechanism for fragile end-to-end UI tests that contributed to 25% of all test failures. The team stated: “When conventional scripts encounter failures, the model assesses the current screen state and autonomously ascertains the required actions to complete the workflow. This implementation now successfully rehabilitates over 60% of executions (which used to take multiple days to fix).”
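The pattern the payments team describes, a deterministic script first with the agent as a fallback, might look roughly like the following. `run_scripted_test` and `run_computer_use_agent` are hypothetical stand-ins for a team's existing test harness and an agent loop like the one sketched earlier.

```python
# Hypothetical self-healing test wrapper: fall back to the Computer Use agent
# when the deterministic script breaks, e.g. after a UI layout change.
def run_with_agent_fallback(test_case, run_scripted_test, run_computer_use_agent) -> bool:
    try:
        run_scripted_test(test_case)           # fast, deterministic path
        return True
    except Exception as script_error:          # selector drift, layout changes, etc.
        # The agent inspects the current screen state and works out the remaining
        # steps needed to finish the workflow described by the test case.
        return run_computer_use_agent(
            goal=test_case.description,
            context=f"Scripted run failed with: {script_error}",
        )
```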
The model demonstrates leading quality for browser control at the lowest latency, as measured on the Browserbase harness for Online-Mind2Web. Autotab, a drop-in AI agent, reported that the model increased performance by up to 18% on its hardest evaluations.