Google unveils Gemini 3 claiming dominance over GPT-5.1 in major AI benchmarks

Google unveils Gemini 3 claiming dominance over GPT-5.1 in major AI benchmarks

SHARE IT

21 November 2025


Google has officially pulled back the curtain on Gemini 3, the company’s latest flagship AI model and one of its most ambitious releases to date. After several weeks of teasers and speculation, Google DeepMind introduced the model publicly, presenting it as a significant leap forward in reasoning, multimodality, and overall performance across a broad range of industry-standard evaluations.

According to Google, Gemini 3 sets new records on multiple benchmark tests, placing it ahead of OpenAI’s GPT-5.1 and Anthropic’s latest Claude variant. The company emphasized that the model reached a top score of 1501 on the LMArena Leaderboard, a competitive metric used to gauge real-world conversational ability. It also delivered a score of 37.5 percent on Humanity’s Last Exam, a challenging assessment designed to evaluate complex reasoning, and secured 91.9 percent on the GPQA Diamond benchmark, which tests high-level problem solving and specialized knowledge.

In mathematical reasoning, Gemini 3 continues to push boundaries. On the MathArena Apex benchmark, it achieved a score of 23.4 percent, setting a new state-of-the-art result. Its multimodal strengths are even more pronounced. The Pro tier of Gemini 3 reached 81 percent on MMMU-Pro, a test that evaluates the model’s ability to analyze images and text in tandem, and 87.6 percent on Video-MMMU, where models must interpret moving visuals. The model also excelled on SimpleQA Verified, which focuses on factual accuracy, recording a leading score of 72.1 percent.

Google released a detailed comparison chart to illustrate how Gemini 3 Pro stacks up against GPT-5.1 and Claude Sonnet 4.5, underscoring its claim that the new model is now at the frontier of performance. But the company wasn’t done there.

Alongside the Pro version, Google introduced Gemini 3 Deep Think, a variant engineered for more intensive analytical tasks. Deep Think delivers even stronger results across the toughest benchmarks: 41 percent on Humanity’s Last Exam, 93.8 percent on GPQA Diamond, and 45.1 percent on ARC-AGI-2, a demanding evaluation that includes code execution and has been verified as part of the ARC Prize challenge. All the while, the Gemini 3 family maintains support for a one-million-token context window, allowing it to process vast amounts of information in a single session.

Perhaps just as notable as the model’s capabilities is the speed at which Google plans to make Gemini 3 available. In previous years, top-tier Gemini releases were slow to reach everyday users, often limited to research previews for extended periods. This time, Google is pushing for a rapid rollout. AI Mode in Google Search has already been upgraded to use Gemini 3, enabling new types of interactive and generative interfaces directly within search results. These include visual layouts, dynamic simulations, and tools that respond instantly to user queries.

Despite its strengths, Gemini 3 does not dominate every category. On SWE-bench Verified, a widely followed benchmark that evaluates a model’s programming and debugging skills, Gemini 3 Pro scored 76.2 percent. That places it just below GPT-5.1 and Claude Sonnet 4.5, both of which continue to lead in code-related tasks. Even so, Google notes that developers now have an unusually broad range of access points for working with Gemini 3. The model is already integrated into Google AI Studio, Vertex AI, Gemini CLI, Cursor, GitHub, JetBrains, Manus, Replit, and the company’s new agentic development platform, Google Antigravity.

General consumers will also encounter the model much sooner than in past releases. Gemini 3 is now available directly through the Gemini app, offering upgraded conversational and multimodal abilities for everyday use. Subscribers to Google AI Pro and Ultra can now access Gemini 3 inside AI Mode in Search, while enterprise customers will find the model deployed across Vertex AI and Gemini Enterprise. Google says that Gemini 3 Deep Think will roll out to AI Ultra subscribers in the coming weeks, completing what is likely the company’s most aggressive release schedule yet.

View them all