We’re entering an era where AI agents can truly interact with the physical world, enabling robots to perceive, plan, think, use tools, and act to solve complex, multi-step tasks. Today marks a significant step forward in advancing intelligent, general-purpose robots with the introduction of two groundbreaking models that unlock agentic experiences through advanced thinking capabilities.
Gemini Robotics 1.5 is our most capable vision-language-action (VLA) model, transforming visual information and instructions into precise motor commands for task execution. What sets this model apart is its ability to think before taking action and to show its reasoning process, which helps robots assess and complete complex tasks more effectively and accelerates skill learning across different embodiments.
Gemini Robotics-ER 1.5 is our most advanced vision-language model (VLM), specializing in reasoning about the physical world. It natively calls digital tools and creates detailed, multi-step plans to complete a mission, achieving state-of-the-art performance across spatial understanding benchmarks.
These innovations empower developers to build more capable and versatile robots that actively understand their environment to handle complex, multi-step tasks in a generalized manner. Starting today, Gemini Robotics-ER 1.5 is available to developers via the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is currently accessible to select partners.
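Getting started requires only a standard Gemini API call. Here is a minimal sketch using the google-genai Python SDK; the model ID string and the image file are illustrative assumptions, so check Google AI Studio for the exact identifier available to your account.

```python
from google import genai
from google.genai import types

# Create a client; it reads the GOOGLE_API_KEY environment variable if no key is passed.
client = genai.Client()

with open("workbench.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed model ID; verify in AI Studio
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "List the objects on the table and suggest an order for packing them.",
    ],
)
print(response.text)
```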
Most daily tasks require contextual information and multiple steps to complete, presenting significant challenges for current robotic systems. Consider a scenario where a robot needs to sort objects into compost, recycling, and trash bins based on local guidelines. This requires searching for relevant information online, identifying objects, applying sorting rules, and executing all necessary steps—exactly the type of complex task these new models are designed to handle.
The two models work together within an agentic framework. Gemini Robotics-ER 1.5 acts as the high-level brain, orchestrating robot activities with exceptional planning and logical decision-making capabilities. It offers state-of-the-art spatial understanding, natural language interaction, and progress estimation, and it can natively call tools like Google Search or third-party functions.
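As a concrete illustration of tool use, the Gemini API exposes Google Search as a built-in tool. The sketch below, with an assumed model ID and prompt, shows how an orchestrator might fetch local guidelines before planning:

```python
from google import genai
from google.genai import types

client = genai.Client()

# Enable the built-in Google Search tool so the model can ground its plan
# in up-to-date information (e.g., local waste-sorting guidelines).
response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed model ID
    contents="Look up the local waste-sorting guidelines and summarize the "
             "rules for compost, recycling, and trash.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```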
Gemini Robotics-ER 1.5 then provides natural language instructions to Gemini Robotics 1.5 for each step, which uses its vision and language understanding to perform specific actions. This collaboration enables robots to think about their actions, solve semantically complex tasks, and even explain their reasoning processes in natural language—making their decisions more transparent and trustworthy.
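To make the division of labor concrete, here is a schematic sketch of that loop. Every name in it is a hypothetical placeholder rather than a published API: in a real system, er_plan_next_step would call Gemini Robotics-ER 1.5 and vla_execute would invoke the on-robot Gemini Robotics 1.5 model.

```python
from dataclasses import dataclass


@dataclass
class Step:
    instruction: str    # natural-language instruction for the VLA model
    done: bool = False  # planner signals mission completion


def er_plan_next_step(mission: str, history: list[str]) -> Step:
    """Hypothetical stand-in for a call to the ER 1.5 planner."""
    if len(history) >= 2:  # toy stopping rule for this sketch
        return Step(instruction="", done=True)
    return Step(instruction=f"step {len(history) + 1} of: {mission}")


def vla_execute(instruction: str) -> str:
    """Hypothetical stand-in for the VLA turning an instruction into motion."""
    return f"executed '{instruction}'"


def run_mission(mission: str, max_steps: int = 20) -> None:
    history: list[str] = []
    for _ in range(max_steps):
        step = er_plan_next_step(mission, history)  # high-level brain plans
        if step.done:
            break
        history.append(vla_execute(step.instruction))  # action model executes


run_mission("sort the objects into compost, recycling, and trash")
```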
Gemini Robotics-ER 1.5 represents the first thinking model optimized for embodied reasoning, achieving state-of-the-art performance on both academic and internal benchmarks. Evaluation across 15 academic benchmarks—including Embodied Reasoning Question Answering (ERQA) and Point-Bench—demonstrates superior performance in pointing, image question answering, and video question answering tasks.
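Pointing, one of the benchmarked capabilities, can be exercised directly through the API. In this sketch the model ID, prompt, and output schema are assumptions to verify against the current spatial-understanding documentation, which describes points as [y, x] coordinates normalized to 0-1000.

```python
import json

from google import genai
from google.genai import types

client = genai.Client()

with open("desk.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        'Point to each mug. Reply with a JSON list of {"point": [y, x], '
        '"label": <name>}, with coordinates normalized to 0-1000.',
    ],
)
# Expected shape (assumed): [{"point": [412, 633], "label": "blue mug"}, ...]
points = json.loads(response.text)
print(points)
```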
Unlike traditional vision-language-action models that directly translate instructions into robot movements, Gemini Robotics 1.5 introduces the capability to think before acting. It generates internal reasoning sequences in natural language to handle tasks requiring multiple steps or deeper semantic understanding.
For instance, when asked to “sort my laundry by color,” the robot engages in multi-level thinking: understanding that white clothes go in the white bin and colored items in the black bin, planning the sequence of picking up each garment, and considering detailed motions like positioning itself optimally for easier grasping.
During this multi-level thinking process, the model can break longer tasks into manageable segments, helping robots generalize to new tasks and adapt to environmental changes more effectively.
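As a hypothetical illustration (not actual model output), the laundry example's multi-level thinking could be represented as a mission broken into subtasks, each refined into concrete motions:

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    goal: str                                      # mid-level step in natural language
    motions: list[str] = field(default_factory=list)  # low-level motion considerations


# Illustrative decomposition of "sort my laundry by color".
plan = [
    Subtask(
        goal="move the white shirt to the white bin",
        motions=["approach shirt", "grasp near collar", "lift", "place in white bin"],
    ),
    Subtask(
        goal="move the red sock to the black bin",
        motions=["reposition base for easier reach", "pinch grasp", "place in black bin"],
    ),
]

for sub in plan:
    print(sub.goal)
    for motion in sub.motions:
        print("  -", motion)
```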
A standout capability of Gemini Robotics 1.5 is its ability to learn across different embodiments. Robots come in various shapes and sizes with different sensing capabilities and degrees of freedom, which has traditionally made it difficult to transfer motions from one robot to another. Gemini Robotics 1.5 can transfer motions learned on one robot platform to another without requiring specialization for each new embodiment.
For example, tasks presented only to the ALOHA 2 robot during training work seamlessly on Apptronik’s humanoid robot Apollo and the bi-arm Franka robot, and vice versa. This cross-embodiment learning accelerates behavior acquisition, making robots smarter and more useful across different platforms.
As we unlock the full potential of embodied AI, responsible development remains paramount. We’re proactively creating novel safety and alignment approaches to ensure agentic AI robots can be deployed responsibly in human-centric environments. Gemini Robotics 1.5 implements a comprehensive safety approach through high-level semantic reasoning, including safety considerations before action, respectful human interaction aligned with existing safety policies, and activation of low-level safety subsystems when needed.
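A toy sketch of what the consider-safety-before-acting layer could look like in code follows; both functions are hypothetical stand-ins, and a real deployment would keep hardware safeguards such as force limits and emergency stop independent of the model.

```python
def semantic_safety_check(instruction: str) -> bool:
    """Hypothetical stand-in: ask the reasoning model whether a step is safe."""
    banned = ("knife", "hot")  # toy rule standing in for model-based review
    return not any(word in instruction.lower() for word in banned)


def safe_execute(instruction: str) -> str:
    """Gate every action behind the semantic check before it reaches the robot."""
    if not semantic_safety_check(instruction):
        return "refused: failed semantic safety review"
    return f"executed '{instruction}'"


print(safe_execute("pick up the ceramic mug"))
print(safe_execute("hand over the kitchen knife blade-first"))
```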
To guide safe development, we’re releasing an upgraded ASIMOV benchmark with enhanced coverage, improved annotations, new safety question types, and additional video modalities. In safety evaluations, Gemini Robotics-ER 1.5 demonstrates state-of-the-art performance, with its thinking ability significantly contributing to better understanding of semantic safety and adherence to physical safety constraints.
Gemini Robotics 1.5 marks a crucial milestone toward solving Artificial General Intelligence in the physical world. By introducing agentic capabilities, we’re moving beyond reactive command systems to create robots that can truly reason, plan, actively use tools, and generalize across tasks. This represents a foundational step toward building robots that navigate physical world complexities with intelligence and dexterity, ultimately becoming more helpful and integrated into our daily lives.
We’re excited to continue this journey with the broader research community and eagerly anticipate the innovations the robotics community will create using our latest Gemini Robotics-ER model.