xAI adds Vision and Voice to Grok AI chatbot

24 April 2025

xAI, the artificial intelligence company founded by Elon Musk, has announced a significant update to its Grok chatbot, introducing a new Voice Mode alongside a Grok Vision feature. The update makes Grok a more interactive and context-aware assistant and places it among the growing list of AI models that blend text, voice, and visual understanding, such as OpenAI’s ChatGPT and Google’s Gemini.

With Grok Vision, users can now engage with their surroundings in a more intuitive way. iPhone users, for example, can point their camera at an object and ask, “What am I looking at?”—prompting Grok to deliver a real-time, voice-based explanation. The feature leverages computer vision to interpret scenes through the phone’s camera and respond with contextual, spoken insights.

Grok Vision is currently available only through the Grok app on iOS; Android users will need to wait for a future update. xAI has not specified a release date for Android, but it is expected to follow soon.

Beyond image recognition, Voice Mode introduces multilingual audio support, enabling users to hold conversations with Grok in multiple languages. This expansion caters to a more global audience and reflects the increasing demand for AI tools that offer seamless, natural interaction across language barriers.

A standout component of this update is the real-time search integration. With it, Grok can access and deliver up-to-date information almost instantly, enhancing its ability to answer queries with current and relevant data. This addition gives Grok a more dynamic edge compared to AI models that rely solely on pre-trained knowledge or static datasets.

The launch follows closely on the heels of another notable enhancement—memory capability. Introduced just a week prior, this feature allows Grok to remember past interactions, including user preferences, previous questions, and personal context. The aim is to create more personalized and adaptive responses, offering suggestions and content that align more closely with the user’s history and interests.

In tandem with the Voice and Vision updates, xAI also unveiled Studio, the first iteration of a dedicated content creation space for users. Similar in concept to ChatGPT’s Canvas, Studio offers a clean, focused interface for generating documents, code, and other content. It opens in a separate window and is designed to be a more structured environment for users working on extended tasks or complex projects.

Altogether, these updates signal xAI’s broader push to transform Grok into a fully multimodal AI assistant—capable of seeing, speaking, remembering, and understanding complex user needs across a variety of formats. With the introduction of Grok Vision and expanded Voice Mode features, xAI positions itself more competitively in the rapidly evolving AI assistant landscape, appealing to users who demand real-time, intelligent interaction that bridges the digital and physical worlds.

As the AI arms race intensifies, innovations like Grok’s new voice and visual functionalities are setting the stage for smarter, more interactive personal assistants. Whether this will be enough for Grok to compete head-to-head with titans like ChatGPT or Gemini remains to be seen—but one thing is clear: xAI is not holding back.
