
Google DeepMind has released the Gemini 2.5 Computer Use model, a specialised system built on Gemini 2.5 Pro that enables AI agents to interact with user interfaces by clicking, typing and scrolling through web pages and applications. The company says it outperforms leading alternatives on multiple benchmarks while operating at lower latency.

The model, available in public preview via the Gemini API in Google AI Studio and Vertex AI, operates in a loop: it analyses a screenshot of the environment, the user's request and a history of recent actions, then generates a response representing a UI action such as a click or keystroke. After the action is executed, a new screenshot is fed back to the model and the cycle repeats until the task is complete. The system requires end-user confirmation for certain actions, including purchases, and demonstrates strong performance in web browsers, with promise for mobile UI control tasks, though it is not yet optimised for desktop operating-system-level control.
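The loop described above can be sketched in a few lines. This is a minimal illustration only: the helper functions (`capture_screenshot`, `query_model`, `execute_action`) and the action dictionary format are hypothetical stand-ins, not the actual Gemini API surface.

```python
# Minimal sketch of a screenshot -> model -> action agent loop.
# All names here (capture_screenshot, query_model, execute_action,
# the action dict shape) are illustrative assumptions, not the real
# Gemini API.

def capture_screenshot() -> bytes:
    """Stub: grab the current browser viewport as an image."""
    return b"<png bytes>"

def query_model(goal: str, screenshot: bytes, history: list) -> dict:
    """Stub: a real call would send the screenshot, the user's
    request and the action history to the model and receive the
    next UI action; here we report the task as finished."""
    return {"type": "done"}

def execute_action(action: dict) -> None:
    """Stub: perform the click/type/scroll in the browser."""
    pass

def run_agent(goal: str, max_steps: int = 10) -> list:
    """Drive the loop: observe, decide, act, then observe again."""
    history = []
    for _ in range(max_steps):
        action = query_model(goal, capture_screenshot(), history)
        if action["type"] == "done":
            break
        execute_action(action)   # e.g. click, type, scroll
        history.append(action)   # fed back on the next turn
    return history
```

In a real deployment the loop terminates either when the model signals completion or when a step budget is exhausted, which is why `max_steps` bounds the iteration here.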

Google teams have deployed the model to production for UI testing, with versions powering Project Mariner, the Firebase Testing Agent and some agentic capabilities in AI Mode in Search. The model’s core capabilities are exposed through the new computer_use tool in the Gemini API.

Google DeepMind has built safety features directly into the model to address three key risks: intentional misuse by users, unexpected model behaviour and prompt injection attacks in web environments. Developers also receive safety controls, including a per-step safety service that assesses each proposed action before execution, and system instructions that direct agents to refuse or request end-user confirmation for high-stakes actions.
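The per-step pattern described above, in which every proposed action is checked before execution and high-stakes actions are routed to the end user, can be sketched as follows. The risk categories, verdict strings and helper names are illustrative assumptions, not the actual safety service.

```python
# Sketch of a per-step safety gate: each proposed action is
# assessed before execution, and high-stakes actions (such as
# purchases) require explicit end-user confirmation. The action
# types and verdict labels below are hypothetical.

HIGH_STAKES = {"purchase", "submit_payment", "delete_account"}

def assess_action(action: dict) -> str:
    """Classify a proposed action before it is executed."""
    if action.get("type") in HIGH_STAKES:
        return "needs_confirmation"
    return "allow"

def gate(action: dict, confirm) -> bool:
    """Return True if the action may run.

    `confirm` is a callback that hands control back to the end
    user for high-stakes actions and returns their decision.
    """
    verdict = assess_action(action)
    if verdict == "needs_confirmation":
        return confirm(action)
    return verdict == "allow"
```

The design point is that the gate sits between the model's proposal and the executor, so a refusal or an unanswered confirmation simply means the action never runs.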

Early access programme users have tested the model for personal assistants, workflow automation and UI testing. Poke.com, a proactive AI assistant in iMessage, WhatsApp and SMS, reported: “A lot of our workflows require interacting with interfaces meant for humans where speed is especially important. Gemini 2.5 Computer Use is far ahead of the competition, often being 50% faster and better than the next best solutions we’ve considered.”

Google’s payments platform team implemented the Computer Use model as a contingency mechanism for fragile end-to-end UI tests that contributed to 25% of all test failures. The team stated: “When conventional scripts encounter failures, the model assesses the current screen state and autonomously ascertains the required actions to complete the workflow. This implementation now successfully rehabilitates over 60% of executions (which used to take multiple days to fix).”
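The "agent as fallback" pattern the payments team describes, running the conventional script first and handing control to the model only on failure, can be sketched like this. The function names are hypothetical; the recovery callback stands in for a computer-use agent that inspects the live screen state.

```python
# Sketch of a self-healing test wrapper: try the brittle scripted
# test first, and only on failure hand the current screen state to
# a computer-use agent to complete the workflow. All names are
# illustrative assumptions.

def run_with_agent_fallback(scripted_test, agent_recover, screen_state):
    """Run the conventional script; let the agent rehabilitate failures."""
    try:
        return scripted_test()
    except Exception:
        # The agent assesses the current screen state and works out
        # the remaining steps itself, instead of replaying the
        # hard-coded selectors that just broke.
        return agent_recover(screen_state)
```

The appeal of this shape is that the fast, deterministic script stays on the happy path, while the slower agent is invoked only for the minority of runs that would otherwise fail outright.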

The model demonstrates leading quality for browser control at the lowest latency, as measured on the Browserbase harness for Online-Mind2Web. Autotab, a drop-in AI agent, reported that the model increased performance by up to 18% on its hardest evaluations.
