Google SIMA (Scalable Instructable Multiworld Agent) is a generative AI agent developed by Google DeepMind. Google SIMA is trained on virtual 3D environments to accomplish basic gaming tasks by following user-given natural language instructions.
According to Google, the end goal is to make SIMA agents more advanced so that they can safely perform complicated tasks inside video games and, potentially, in the real world.
The sections below explain how SIMA works and how it could be used:
- How Does SIMA Work?
- How was SIMA Trained?
- Using SIMA as an AI Agent in Games
- Can SIMA Play Games with You?
- Future Prospects of SIMA
How Does SIMA Work?
SIMA works from two simple inputs: on-screen game images and real-time natural language commands from the player. It does not need access to a game’s source code or APIs, or any special privileges, to function as a gaming assistant.
The current version of SIMA is evaluated across 600 basic skills, spanning navigation (e.g. “turn left”), object interaction (“climb the ladder”), and menu use (“open the map”). We’ve trained SIMA to perform simple tasks that can be completed within about 10 seconds.
Google DeepMind
At its core, SIMA has two models: one for image-language mapping and the other for video prediction. The former AI model primarily assists SIMA in understanding natural language commands and their relation to the on-screen visual content. The video model helps SIMA predict future events so that it can plan its gaming actions upfront.
With both these AI models, SIMA connects visual observations and language instructions to perform short gaming tasks (<10 seconds) with keyboard and mouse actions.
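To make the input/output contract above concrete, here is a minimal Python sketch of an agent that receives a screen frame plus a text instruction and returns keyboard and mouse actions. Every name in it (Observation, Action, toy_policy, run_episode) is hypothetical, and the keyword-matching policy is only a stand-in for SIMA’s learned models, whose real interfaces are not public.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names here are hypothetical illustrations, not SIMA's actual API.

@dataclass
class Observation:
    frame: List[List[int]]  # stand-in for one screen image (rows of pixel values)
    instruction: str        # natural-language command typed by the player

@dataclass
class Action:
    keys: List[str] = field(default_factory=list)  # keyboard keys to press
    mouse_dx: int = 0                              # horizontal mouse movement
    mouse_dy: int = 0                              # vertical mouse movement

def toy_policy(obs: Observation) -> Action:
    """Keyword-based placeholder for a learned policy (assumption, not SIMA)."""
    text = obs.instruction.lower()
    if "left" in text:
        return Action(mouse_dx=-50)
    if "right" in text:
        return Action(mouse_dx=50)
    if "forward" in text:
        return Action(keys=["w"])
    return Action()  # no-op when the instruction is not understood

def run_episode(policy: Callable[[Observation], Action],
                frames: List[List[List[int]]],
                instruction: str) -> List[Action]:
    """Call the policy once per frame; only pixels and text go in."""
    return [policy(Observation(frame=f, instruction=instruction)) for f in frames]

actions = run_episode(toy_policy, [[[0]], [[1]]], "turn left")
```

The point of the sketch is the shape of the loop: the agent sees only what a human player would see and acts only through the controls a human player would use.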
Having SIMA by your side will be much like having a human play on your behalf. However, this AI gaming tool isn’t about “achieving high game scores” as of now. The intent is to introduce artificial intelligence to gaming gradually and to take on more complex virtual environments at later stages.
How was SIMA Trained?
SIMA was trained on nine different commercial video games, including No Man’s Sky by Hello Games, Teardown by Tuxedo Labs, and Valheim by Iron Gate. The developers chose open-world and sandbox games to help the AI learn a wide range of elementary gaming skills, such as navigating, shooting, digging, driving, and crafting.
SIMA’s training focused on first- or third-person gameplay and avoided games featuring extreme violence. The chosen commercial games featured varied environments, and each had distinct, in-depth mechanics of its own.
In addition to game-based training, Google used four AI gaming environments with different procedurally generated challenges to test SIMA’s object-handling skills and its general perception of the physical world in a controlled setting.
A vital piece of SIMA’s training is behavioral cloning, where AI agents learn by observing expert gamer-generated data. This dataset included gameplay videos, instructions, annotations, and more.
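Behavioral cloning is, at heart, supervised learning on expert demonstrations: the agent is fitted to reproduce the action an expert took in each situation. The Python sketch below shows the idea in its simplest tabular form, picking the most frequent expert action for each (situation, instruction) pair. The demonstration data and action names are made up for illustration, and a system like SIMA would use neural networks on raw pixels rather than a lookup table.

```python
from collections import Counter, defaultdict

def fit_behavioral_cloning(demonstrations):
    """Fit a tabular policy that imitates the majority expert action.

    demonstrations: iterable of (state, action) pairs, where state is any
    hashable description of what the expert saw and was asked to do.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each state, keep the action the experts chose most often.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

# Hypothetical expert data: ((what is on screen, instruction), action taken).
demos = [
    (("ladder_ahead", "climb the ladder"), "press_w"),
    (("ladder_ahead", "climb the ladder"), "press_w"),
    (("ladder_ahead", "climb the ladder"), "press_space"),
    (("menu_closed", "open the map"), "press_m"),
]

policy = fit_behavioral_cloning(demos)
```

A lookup table cannot handle situations it has never seen, which is one reason real imitation-learning systems replace it with a model that generalizes across similar images and instructions.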
Using SIMA as an AI Agent in Games
SIMA is still a research project, and it is not meant for climbing gaming leaderboards: its skills remain elementary, such as moving, picking up tools, and boarding a vehicle. It is not yet available for public beta testing, and there is no option to deploy Google SIMA at an individual or commercial level.
However, integrating SIMA should require little technical skill once it is released. This gaming AI takes only two inputs, visuals and language instructions, and needs no root-level access. That is much like how humans interact with games, which suggests SIMA could be widely useful to gamers.
Can SIMA Play Games with You?
Yes, in theory, SIMA should be able to play games with you once it is commercially available.
While it might not play on par with expert-level human players, gamers should benefit from SIMA’s ability to perform basic gaming tasks or replay portions of a game whenever necessary.
Future Prospects of SIMA
Developers evaluated SIMA on 1,485 unique gaming tasks across nine skill categories, including movement, simple navigation, resource gathering, object manipulation, and more.
The results of this early-stage research suggest promising success rates for such instructable multiworld agents on simpler tasks in these virtual worlds. For example, SIMA demonstrated a broad range of skills, such as basic navigation and object interaction, even when the target was not in immediate view. This signals an intuitive understanding of these training environments, which arguably puts SIMA ahead of large language model chatbots such as ChatGPT in learning and matching human performance in such environments.
However, further training is needed for these gaming AI agents to complete complex interactions.
Compared to humans, SIMA performed reasonably well. For instance, on the same subset of tasks from the game No Man’s Sky, SIMA succeeded 34% of the time, whereas humans succeeded 60% of the time.
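To put those two success rates in perspective, the short Python snippet below computes how far SIMA is from the human baseline. It uses only the 34% and 60% figures quoted above; the variable names are illustrative.

```python
# Success rates on the same subset of No Man's Sky tasks, as quoted above.
sima_success = 0.34
human_success = 0.60

# SIMA's success rate expressed as a fraction of the human rate.
relative_to_human = sima_success / human_success

# Absolute gap in percentage points.
gap_points = (human_success - sima_success) * 100

print(f"SIMA reaches {relative_to_human:.0%} of the human success rate")
print(f"Gap: {gap_points:.0f} percentage points")
```

In other words, SIMA completed a little more than half as many of these tasks as the human players did.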
The evaluation also suggests that SIMA can perform well as a generalized AI agent, meaning developers may not need to train SIMA on every individual game. This is supported by the technical report, in which agents trained on multiple games outperformed agents trained on a single game. Even agents tested on unseen games (i.e., games they had not been trained on) performed nearly as well as the environment-specialized agents.
In our evaluations, SIMA agents trained on a set of nine 3D games from our portfolio significantly outperformed all specialized agents trained solely on each individual one.
SIMA Developers
Consequently, this could translate into a future where Google releases a single, universal SIMA agent for all sorts of games or, at the very least, SIMA agents for specific gaming genres. In that case, it could be a subscription-based service that users simply integrate with their games.
In addition, Google could partner with game studios to bundle SIMA and launch special, AI-assisted versions of popular games.
Beyond gaming, Google has conveyed its intent to develop AI systems that are helpful not only in virtual 3D environments, such as games, but also in real, physical life.
As a wild guess, I predict the ultimate SIMA application would be assistive robots that can help us with our daily chores.