Google DeepMind has officially launched the Gemini 2.5 Computer Use model, a specialized AI tool built on the powerful Gemini 2.5 Pro framework. This model enables developers to create intelligent agents that can interact with user interfaces just like humans—clicking, typing, and scrolling through web and mobile applications. Available now in public preview via the Gemini API on Google AI Studio and Vertex AI, it outperforms leading alternatives in web and mobile control benchmarks with significantly lower latency, making it a game-changer for automation tasks.

While many AI models rely on structured APIs, numerous digital tasks—such as filling out forms or navigating dropdown menus—require direct interaction with graphical interfaces. The Gemini 2.5 Computer Use model addresses this gap by allowing agents to handle these actions natively, including operating behind logins and manipulating interactive elements. This capability is a critical step toward building versatile, general-purpose AI agents that can streamline workflows and enhance productivity.
How It Works
The model operates through the new computer_use tool in the Gemini API, functioning within a continuous loop. Inputs include the user’s request, a screenshot of the current environment, and a history of recent actions. Developers can customize the tool by excluding certain UI actions or adding custom functions. The model analyzes these inputs and generates responses—typically function calls for actions like clicking or typing—and may request user confirmation for sensitive tasks, such as purchases. After execution, a new screenshot and URL are fed back into the loop, repeating until the task is complete or terminated.
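As a rough illustration, a minimal agent loop could look like the sketch below. It assumes the google-genai Python SDK and Playwright as the browser environment; the model identifier, the computer_use tool configuration fields, and the action names (click_at, type_text_at) are assumptions drawn from the preview and should be checked against the official documentation.

```python
# Minimal sketch of the Computer Use agent loop (not the official sample).
# Assumptions: google-genai and playwright are installed, GEMINI_API_KEY is set,
# and the model id, tool config fields, and action names match the preview docs.
from google import genai
from google.genai import types
from playwright.sync_api import sync_playwright

MODEL_ID = "gemini-2.5-computer-use-preview-10-2025"  # assumed preview identifier
client = genai.Client()


def run_task(user_request: str, start_url: str, max_steps: int = 20) -> None:
    """Drive a browser until the model stops proposing actions or max_steps is hit."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        history = []  # recent actions, fed back to the model each turn

        for _ in range(max_steps):
            screenshot = page.screenshot()  # current state of the environment
            response = client.models.generate_content(
                model=MODEL_ID,
                contents=[
                    user_request,
                    types.Part.from_bytes(data=screenshot, mime_type="image/png"),
                    f"Current URL: {page.url}",
                    f"Previous actions: {history}",
                ],
                config=types.GenerateContentConfig(
                    # The computer_use tool can also exclude UI actions or add
                    # custom functions; field names here are assumptions.
                    tools=[types.Tool(computer_use=types.ComputerUse(
                        environment=types.Environment.ENVIRONMENT_BROWSER))],
                ),
            )
            calls = response.function_calls or []
            if not calls:
                break  # plain-text reply: the task is finished or needs user input
            name, args = calls[0].name, dict(calls[0].args)
            # Execute the proposed UI action against the live page.
            if name == "click_at":
                page.mouse.click(args["x"], args["y"])
            elif name == "type_text_at":
                page.mouse.click(args["x"], args["y"])
                page.keyboard.type(args["text"])
            history.append((name, args))
        browser.close()
```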

Primarily optimized for web browsers, the model also shows strong promise for mobile UI control, though it is not yet designed for desktop OS-level tasks. To see it in action, consider two demo prompts: one extracts pet details from a signup form and schedules a spa appointment, while another organizes sticky notes into categories on a collaborative board; both demos are shown at accelerated speed.
Performance and Benchmarks
The Gemini 2.5 Computer Use model outperforms leading alternatives on multiple web and mobile control benchmarks, based on self-reported results, evaluations run by Browserbase, and internal testing. It delivers the highest quality for browser control at the lowest latency, combining strong accuracy with speed.


Safety and Responsibility
Building safe AI agents is paramount, especially when they control computers. The model incorporates built-in safety features to mitigate risks like misuse, unexpected behavior, and prompt injections. Developers are equipped with additional controls, including a per-step safety service that evaluates each proposed action before execution and system instructions that can require user confirmation for high-stakes tasks. Comprehensive documentation offers best practices, though thorough testing is encouraged before deployment.
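For example, a developer-side confirmation gate might look like the hedged sketch below; the safety_decision field and its keys are purely illustrative placeholders, not the documented response schema.

```python
# Hedged sketch of a per-step confirmation gate for sensitive actions.
# `safety_decision` and its keys are hypothetical placeholders for illustration;
# the real per-step safety response schema is described in the Computer Use docs.
def confirm_if_needed(action_name: str, action_args: dict) -> bool:
    """Ask the end user before executing an action the model flags as sensitive."""
    decision = action_args.get("safety_decision")  # hypothetical field name
    if decision and decision.get("decision") == "require_confirmation":
        prompt = (
            f"The agent wants to perform '{action_name}': "
            f"{decision.get('explanation', 'no details provided')}. Proceed? [y/N] "
        )
        return input(prompt).strip().lower() == "y"
    return True  # no confirmation required for this step
```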
Early Adoption and Use Cases
Google teams have already integrated the model into production for UI testing, accelerating software development. It also powers projects like Project Mariner, the Firebase Testing Agent, and AI Mode in Search. Early access users have successfully applied it to personal assistants, workflow automation, and UI testing, reporting strong results and enhanced efficiency.
Getting Started
The Gemini 2.5 Computer Use model is now accessible in public preview. Developers can try it in a demo environment hosted by Browserbase, explore reference materials and documentation, and build custom agent loops using tools like Playwright or cloud VMs. Join the community in the Developer Forum to share feedback and help shape the future of this innovative technology.
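To tie it together, the loop sketched earlier could be invoked as a quick start; the prompt mirrors the sticky-note demo above, and the URL is only a placeholder.

```python
# Usage example for the run_task() sketch above; install dependencies first:
#   pip install google-genai playwright && playwright install chromium
if __name__ == "__main__":
    run_task(
        user_request="Organize the sticky notes on the board into themed categories.",
        start_url="https://example.com/board",  # placeholder for your own environment
        max_steps=15,
    )
```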