Elon Musk’s artificial intelligence venture, xAI, has officially launched Grok 4, its newest flagship AI model that aims to set a new benchmark in the industry. Announced on July 10, 2025, Grok 4 is being positioned as a top-performing multimodal AI system, capable of handling a wide range of academic, mathematical, and reasoning tasks with a high degree of precision. xAI claims that this model not only rivals but surpasses leading competitors such as Google’s Gemini 2.5 Pro and OpenAI’s o3 series.
At the core of Grok 4’s capabilities lies its exceptional performance on standardized tests and reasoning benchmarks. In the challenging Humanity’s Last Exam (HLE), Grok 4, without the assistance of additional tools, scored 25.4%. This result places it ahead of Google Gemini 2.5 Pro at 21.6% and OpenAI’s o3-high at 21%. However, when enhanced with multi-agent and tool-based capabilities, the advanced Grok 4 Heavy variant achieved a significant leap, reaching 44.4% on the same benchmark. For comparison, Gemini 2.5 Pro with tool support managed just 26.9%.
On the ARC-AGI-2 benchmark, which tests abstract reasoning and general intelligence, Grok 4 earned a score of 16.2%, nearly double that of its nearest competitor, Claude Opus 4. The model also performed strongly on MMLU-style evaluations, scoring 86.6%, and earned an overall Intelligence Index of 73, currently the highest in the industry and ahead of the top offerings from both OpenAI and Google.
In STEM-related and coding assessments, Grok 4 continued to impress. It posted an 87.5% result on the GPQA benchmark, while Grok 4 Heavy pushed that figure to 88.9%. On the AIME (American Invitational Mathematics Examination), the Heavy variant reached a flawless 100% score, with the base version trailing closely at 98.8%. Additionally, a specialized coding version, Grok 4 Code, is set to launch in August 2025 and has already scored between 72% and 75% on SWE-Bench, a benchmark built specifically around software engineering tasks.
According to independent metric provider Artificial Analysis, Grok 4 has attained an Intelligence Index score of 73, which places it ahead of OpenAI’s o3 and Google’s Gemini 2.5 Pro, both at 70. Musk emphasized the reliability of Grok 4 in complex scientific fields like mathematics and physics, stating that the model now virtually never makes errors on exam-style questions unless those questions are deliberately designed to be adversarial. He added that Grok 4 can detect and resolve ambiguities or mistakes in exam questions and provide accurate answers even when the queries are flawed.
In terms of accessibility, xAI is maintaining the same pricing structure for Grok 4’s API as it did for Grok 3. This includes a rate of $3 per million input tokens and $15 per million output tokens, with cached input tokens priced at $0.75 per million. This competitive pricing suggests xAI is looking to attract developers and businesses that require high-performance AI without a drastic increase in cost.
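To make those rates concrete, here is a minimal Python sketch of how a developer might estimate the cost of a single API request. Only the three per-million-token rates come from the figures quoted above. The function name `estimate_cost`, the token counts in the example, and the assumption that cached input tokens are billed at the cached rate in place of the standard input rate are illustrative and are not taken from xAI's documentation.

```python
# Rates quoted in the article, in US dollars per one million tokens.
INPUT_RATE = 3.00          # standard (uncached) input tokens
CACHED_INPUT_RATE = 0.75   # cached input tokens
OUTPUT_RATE = 15.00        # output tokens

TOKENS_PER_UNIT = 1_000_000


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the dollar cost of one request (hypothetical helper).

    Assumption: cached input tokens are counted separately from
    `input_tokens` and billed only at the cached rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")

    total = (
        input_tokens * INPUT_RATE
        + cached_input_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Made-up workload: 200k fresh input, 50k cached input, 20k output tokens.
    cost = estimate_cost(200_000, 20_000, cached_input_tokens=50_000)
    print(f"Estimated cost: ${cost:.4f}")  # prints "Estimated cost: $0.9375"
```

In this made-up workload the output tokens account for about a third of the bill despite being under a tenth of the tokens processed, which reflects the fivefold gap between the input and output rates.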
For individual users, xAI has introduced new subscription models under the Grok brand. The basic free tier offers limited access to Grok 3, catering to casual users and those just beginning to explore generative AI. The “SuperGrok” subscription, priced at $30 per month, provides extended access to both Grok 3 and Grok 4. Meanwhile, professionals and power users can opt for the $300 per month “SuperGrok Heavy” plan, which unlocks full capabilities of Grok 4 Heavy, along with access to the standard Grok 4 and Grok 3.
This release represents a major milestone for xAI as it continues its mission to develop competitive alternatives to AI models produced by Google, OpenAI, and Anthropic. The company’s push into multi-agent systems, advanced reasoning, and academic benchmark optimization reflects its ambition to lead not just in performance, but in the future direction of artificial intelligence.
As xAI rolls out Grok 4 to more users, it remains to be seen how the model will perform in real-world scenarios beyond benchmark testing. However, with Musk’s public endorsements and the technical specifications provided so far, Grok 4 is poised to challenge the status quo in AI development and offer developers, researchers, and businesses a compelling new option in the generative AI landscape.