Google DeepMind has been making steady progress in artificial intelligence, with regular upgrades to Gemini, Imagen, Veo, Gemma, and AlphaFold. It is now entering the robotics business with two new models built on Gemini 2.0: Gemini Robotics and Gemini Robotics-ER.
Gemini Robotics is an advanced vision-language-action (VLA) model built on Gemini 2.0 that adds physical actions as a new output modality for controlling robots. Google says the model can handle scenarios it never encountered during training. A minimal sketch of the VLA control-loop idea follows.
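To make the "actions as an output modality" idea concrete, here is a minimal Python sketch of a VLA-style control loop: a camera image and a natural-language instruction go in, a low-level action comes out, and the loop repeats. The `Action` schema, `VlaPolicy` class, and helper functions are illustrative placeholders, not the actual Gemini Robotics API.

```python
# Hypothetical sketch of a vision-language-action (VLA) control loop.
# All names below (Action, VlaPolicy, capture_camera_frame, apply_to_robot)
# are placeholders, not an official Gemini Robotics interface.

from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    """One control step's command (placeholder schema)."""
    joint_deltas: List[float]   # change to apply to each joint, in radians
    gripper_closed: bool        # whether the gripper should be closed


class VlaPolicy:
    """Stands in for a VLA model: image + instruction in, action out."""

    def predict(self, image: bytes, instruction: str) -> Action:
        # A real model would run inference here; this stub just holds still.
        return Action(joint_deltas=[0.0] * 7, gripper_closed=False)


def capture_camera_frame() -> bytes:
    return b""  # placeholder for a wrist- or scene-camera image


def apply_to_robot(action: Action) -> None:
    pass  # placeholder for the robot's joint/gripper interface


def control_loop(policy: VlaPolicy, instruction: str, steps: int = 100) -> None:
    """Closed-loop control: re-query the policy after every executed action."""
    for _ in range(steps):
        image = capture_camera_frame()            # current observation
        action = policy.predict(image, instruction)
        apply_to_robot(action)                    # hand off to the controller


if __name__ == "__main__":
    control_loop(VlaPolicy(), "fold the paper in half", steps=5)
```

The key point the sketch illustrates is that the language model itself emits the actions at every step, rather than producing a one-off plan that a separate system executes.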
On a comprehensive generalisation benchmark, Gemini Robotics more than doubles the performance of other state-of-the-art vision-language-action models. Because it is built on Gemini 2.0, it inherits natural language understanding across multiple languages, allowing it to follow people's commands more accurately.
On dexterity, Google says Gemini Robotics can carry out highly complex, multi-step tasks that require precise manipulation, such as folding origami or packing a snack into a Ziploc bag.
Gemini Robotics-ER is an enhanced vision-language model focused on spatial reasoning that can be integrated with roboticists' existing low-level controllers. Out of the box, it covers the steps needed to control a robot, including perception, state estimation, spatial understanding, planning, and code generation, as in the sketch below.
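The division of labour described above, where the model perceives the scene and plans while an existing controller executes, can be illustrated with a short Python sketch. The `PlannerModel` interface, `Waypoint` type, and `LowLevelController` are assumptions made for illustration, not an official API.

```python
# Hypothetical sketch of pairing a spatial-reasoning model (in the spirit of
# Gemini Robotics-ER) with a roboticist's existing low-level controller.
# PlannerModel, Waypoint, and LowLevelController are illustrative placeholders.

from dataclasses import dataclass
from typing import List


@dataclass
class Waypoint:
    """A target end-effector pose (placeholder schema)."""
    x: float
    y: float
    z: float
    grip: bool  # close the gripper when this waypoint is reached


class PlannerModel:
    """Stands in for the reasoning model: it perceives and plans,
    leaving execution to the existing controller stack."""

    def plan(self, scene_image: bytes, task: str) -> List[Waypoint]:
        # A real model would localize objects and produce a grasp plan here.
        return [
            Waypoint(0.40, 0.00, 0.20, grip=False),  # approach above the object
            Waypoint(0.40, 0.00, 0.05, grip=True),   # descend and grasp
        ]


class LowLevelController:
    """The roboticist's own controller that turns waypoints into motor commands."""

    def move_to(self, wp: Waypoint) -> None:
        print(f"moving to ({wp.x:.2f}, {wp.y:.2f}, {wp.z:.2f}), grip={wp.grip}")


def run(task: str) -> None:
    model = PlannerModel()
    controller = LowLevelController()
    for wp in model.plan(scene_image=b"", task=task):
        controller.move_to(wp)


if __name__ == "__main__":
    run("pick up the banana")
```

The design choice this reflects is that Gemini Robotics-ER does not replace the control stack a team already trusts; it supplies the higher-level perception and planning that feed into it.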
Google is partnering with Apptronik to develop humanoid robots based on the Gemini 2.0 models, and is also working with trusted testers such as Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools on the future of Gemini Robotics-ER.
Google DeepMind is paving the way for a future in which robots can blend effortlessly into every facet of our lives.