Last year marked the introduction of SIMA (Scalable Instructable Multiworld Agent), a generalist AI capable of following basic instructions across diverse virtual environments. It represented a crucial first step in teaching AI to translate language into meaningful actions within rich 3D worlds. Today, we’re excited to present SIMA 2, the next milestone in creating general and helpful AI agents. By integrating the advanced capabilities of Gemini models, SIMA has evolved from a simple instruction-follower into an interactive gaming companion. SIMA 2 not only follows human-language instructions in virtual worlds but also thinks about its goals, converses with users, and improves over time—a significant stride toward Artificial General Intelligence (AGI) with implications for robotics and AI embodiment.
The first version of SIMA mastered over 600 language-following skills, such as “turn left,” “climb the ladder,” and “open the map,” across various commercial video games. It operated by “looking” at the screen and using virtual keyboard and mouse inputs, without accessing underlying game mechanics. With SIMA 2, we’ve moved beyond mere instruction-following. With a Gemini model embedded at its core, SIMA 2 can reason about the instructions it receives, which lets it understand high-level goals and skillfully carry out goal-oriented actions within games.
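The screen-in, keyboard-and-mouse-out interface described above can be pictured with a small sketch. This is purely illustrative and not SIMA's actual code: the names `Observation`, `Action`, `KeywordPolicy`, and `step`, the pixel-grid frame format, and the key binding `"m"` for the map are all assumptions made for the example. The point it shows is that the policy receives only pixels and an instruction, never the game's internal state.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


@dataclass
class Observation:
    """What the agent sees: screen pixels plus the instruction text (assumed format)."""
    frame: List[List[int]]  # toy grayscale pixel grid; no internal game state
    instruction: str


@dataclass
class Action:
    """What the agent emits: virtual keyboard and mouse input (assumed format)."""
    keys: List[str] = field(default_factory=list)
    mouse_delta: Tuple[int, int] = (0, 0)


class Policy(Protocol):
    def act(self, obs: Observation) -> Action: ...


class KeywordPolicy:
    """Toy stand-in for a learned policy: matches instruction phrases to inputs."""

    def act(self, obs: Observation) -> Action:
        text = obs.instruction.lower()
        if "turn left" in text:
            return Action(mouse_delta=(-40, 0))  # hypothetical camera turn
        if "open the map" in text:
            return Action(keys=["m"])  # hypothetical key binding
        return Action()  # unknown instruction: do nothing


def step(policy: Policy, frame: List[List[int]], instruction: str) -> Action:
    """One perception-action step: pixels and text in, virtual input out."""
    return policy.act(Observation(frame=frame, instruction=instruction))
```

A real agent would replace `KeywordPolicy` with a learned model that actually reads the frame; the interface boundary is the part that matches the description in the text.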
Training SIMA 2 involved a mix of human demonstration videos with language labels and Gemini-generated labels. As a result, the agent can now describe what it intends to do and detail the steps it’s taking to achieve its goals. In testing, interacting with SIMA 2 feels less like issuing commands and more like collaborating with a reasoning companion. Thanks to partnerships with game developers, SIMA 2 has been trained and evaluated on a wider array of games. This showcases what Gemini brings to embodied AI: a world-class reasoning engine that can perceive, understand, and act in complex 3D environments.
Integrating Gemini has also produced a leap in generalization performance. SIMA 2 understands more complex and nuanced instructions than its predecessor and is far more successful at carrying them out, even in scenarios and games it was never trained on, such as ASKA (a Viking survival game) and MineDojo (a research implementation of Minecraft). Its ability to transfer learned concepts, for example applying its understanding of “mining” in one game to “harvesting” in another, mirrors the broad generalization seen in human cognition and brings its performance significantly closer to that of human players.
To test the limits of SIMA 2’s generalization, we combined it with Genie 3, a world model that generates new 3D worlds from an image or text prompt. In these newly generated environments, which it had never seen before, SIMA 2 was able to orient itself, understand user instructions, and take meaningful actions toward its goals.
One of SIMA 2’s most exciting capabilities is self-improvement. Through trial and error and Gemini-based feedback, the agent learns to perform increasingly complex tasks. After initial learning from human demonstrations, it transitions to self-directed play in new games, developing skills without additional human data. The experience data it collects is then used to train more capable versions of the agent. This works even in Genie environments, a milestone for training general agents in diverse, generated worlds.
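The cycle described above (self-directed play, model-based feedback, then training the next agent on the collected experience) can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the method used for SIMA 2: `ToyAgent`, `score_episode`, `train_next`, and the task names are hypothetical, the "feedback model" is a simple counting function rather than a Gemini call, and "training" is reduced to updating a single skill number from the best-rated episodes.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    task: str
    actions: List[str]
    score: float  # stand-in for model-based feedback, not a real Gemini call


class ToyAgent:
    """Hypothetical agent: `skill` is the chance of choosing a helpful action."""

    def __init__(self, skill: float):
        self.skill = skill

    def act(self, rng: random.Random) -> str:
        return "good" if rng.random() < self.skill else "bad"


def score_episode(actions: List[str]) -> float:
    """Placeholder feedback: fraction of helpful actions in the episode."""
    return actions.count("good") / len(actions)


def self_play(agent: ToyAgent, rng: random.Random,
              n_episodes: int = 50, steps: int = 10) -> List[Episode]:
    """Collect experience on self-chosen tasks with no human data involved."""
    tasks = ["chop tree", "light campfire", "find water"]  # made-up tasks
    episodes = []
    for _ in range(n_episodes):
        actions = [agent.act(rng) for _ in range(steps)]
        episodes.append(Episode(rng.choice(tasks), actions, score_episode(actions)))
    return episodes


def train_next(agent: ToyAgent, episodes: List[Episode],
               keep_fraction: float = 0.3) -> ToyAgent:
    """Toy 'training': the next agent's skill is the mean score of the
    best-rated episodes, never lower than the current agent's skill."""
    ranked = sorted(episodes, key=lambda e: e.score, reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    new_skill = sum(e.score for e in kept) / len(kept)
    return ToyAgent(max(agent.skill, new_skill))


def run_generations(initial_skill: float = 0.3, generations: int = 4,
                    seed: int = 0) -> List[float]:
    """Run several play-then-train rounds and return the skill after each."""
    rng = random.Random(seed)
    agent = ToyAgent(initial_skill)
    history = [agent.skill]
    for _ in range(generations):
        agent = train_next(agent, self_play(agent, rng))
        history.append(agent.skill)
    return history
```

Calling `run_generations()` returns the skill value after each round. The design choice worth noting is that each new agent is trained only on experience its predecessor generated and a scorer rated, which is the property the text highlights.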
This virtuous cycle of iterative improvement paves the way for agents to learn and grow with minimal human intervention, becoming open-ended learners in embodied AI. SIMA 2’s ability to operate across diverse gaming environments serves as a proving ground for general intelligence, enabling skill mastery, complex reasoning, and continuous learning through self-directed play.
While SIMA 2 is a significant step toward generalist, interactive, embodied intelligence, it remains a research endeavor with limitations. The agent still struggles with very long-horizon tasks that require extensive multi-step reasoning. It keeps only a short memory of its interactions, a trade-off made to keep interaction latency low. Executing precise, low-level actions through the keyboard and mouse interface also remains difficult. Nonetheless, this research validates a new path in action-oriented AI, unifying specialized capabilities into a coherent generalist agent and offering strong potential for robotics applications, from navigation to collaborative task execution.
Responsible development is at the forefront of SIMA 2’s journey. As an interactive, human-centered agent, it engages users by explaining its reasoning in an entertaining way. We’re committed to developing it responsibly, particularly regarding self-improvement innovations. SIMA 2 is available as a limited research preview to academics and game developers, allowing us to gather feedback and address risks collaboratively.