DeepMind's Genie 2: Generating Interactive 3D Worlds Like Games
2024-12-04
DeepMind, Google's AI research arm, has unveiled a model capable of creating an "endless" variety of playable 3D worlds. The model, named Genie 2, is the successor to DeepMind's Genie, released earlier this year. It can generate an interactive, real-time scene from a single image and a text description, such as "A cute humanoid robot in the woods". In this respect it resembles models under development at Fei-Fei Li's company, World Labs, and at the Israeli startup Decart.
DeepMind's Genie 2: Unleashing the Potential of 3D Worlds
Model's Training and Capabilities
Trained on videos, Genie 2 can simulate object interactions, animations, lighting, physics, reflections, and the behavior of "NPCs". Many of its simulations resemble AAA video games, possibly because its training data includes playthroughs of popular titles. However, like many AI labs, DeepMind has disclosed few details about how it sources its data, for competitive or other reasons.
The model can generate a "vast diversity of rich 3D worlds", and users can take actions such as jumping and swimming with a mouse or keyboard. It can generate consistent worlds from different perspectives, such as first-person and isometric views, for up to a minute, though most last 10-20 seconds. Genie 2 responds intelligently to keyboard input, identifying the character and moving it correctly. For example, it can figure out that the arrow keys should move a robot and not trees or clouds.
Comparison with Other Models
Most world models like Genie 2 can simulate games and 3D environments, but they often suffer from artifacts, inconsistency, and hallucinations. For instance, Decart's Minecraft simulator, Oasis, has a low resolution and quickly "forgets" the layout of levels. Genie 2, by contrast, can remember parts of a simulated scene that are out of view and render them accurately when they become visible again. World Labs' models also have this ability.
Games created with Genie 2 might not be much fun, since progress is erased every minute or so, but DeepMind positions the model as a research and creative tool. It can turn concept art and drawings into fully interactive environments and help researchers generate evaluation tasks that agents have not seen during training.
Future Implications and Research Focus
While Genie 2 is still in its early stages, DeepMind believes it will be a key component in developing the AI agents of the future. Google has been pouring increasing resources into world model research, which is expected to be the next big thing in generative AI. In October, DeepMind hired Tim Brooks, who had been leading development of OpenAI's Sora video generator, to work on video generation technologies and world simulators. Two years ago, the lab also poached Tim Rocktäschel, known for his "open-endedness" experiments with video games like NetHack, from Meta.
This research holds potential for fields ranging from game development to artificial intelligence. It opens up new possibilities for creating immersive, interactive experiences and for evaluating AI agents in diverse environments. As DeepMind continues to advance the technology, it will likely have a significant impact on both the gaming and AI industries.