On Tuesday, Google introduced Gemini 2.5, a new family of AI reasoning models that pause to “think” before responding to a question.
The first model in the family is Gemini 2.5 Pro Experimental, a multimodal reasoning model that Google calls its most intelligent model yet. Starting Tuesday, it is available in Google AI Studio and in the Gemini app for subscribers to Gemini Advanced, the company’s $20-per-month AI plan.
Looking ahead, Google says that all of its future AI models will have reasoning capabilities built in.
The Competitive Landscape of AI Reasoning
Since OpenAI released the first AI reasoning model, “o1,” in September 2024, the race among tech companies to match it has intensified. Anthropic, DeepSeek, Google, and xAI now all offer reasoning models of their own, which spend extra computing power and time working through a problem before answering in order to improve accuracy.
These techniques have significantly improved AI performance in complex fields such as mathematics and programming. Many experts believe reasoning models will be crucial for the next generation of AI agents: autonomous systems that can carry out tasks with minimal human involvement. The trade-off is cost, since the extra computation makes these models more expensive to run.
Google’s Push for AI Superiority
Google has been experimenting with reasoning-based AI for some time, releasing its first “thinking” model, Gemini 2.0 Flash Thinking, in December 2024. Gemini 2.5, however, is the company’s most serious attempt yet to rival OpenAI’s “o” series.
According to Google, Gemini 2.5 Pro outperforms the company’s previous AI models, as well as some competing models, on several benchmarks. Google says it designed the model to excel at building visually compelling web apps and at agentic coding applications.
On Aider Polyglot, which measures code-editing ability, Gemini 2.5 Pro scored 68.6%, ahead of leading models from OpenAI, Anthropic, and DeepSeek. Its results on SWE-bench Verified, which assesses software development skills, were more mixed: at 63.8%, it beat OpenAI’s “o3-mini” and DeepSeek’s “R1” but trailed Anthropic’s “Claude 3.7 Sonnet,” which scored 70.3%.
On Humanity’s Last Exam, a multimodal test spanning mathematics, the humanities, and the natural sciences, Gemini 2.5 Pro scored 18.8%, ahead of most rival flagship models.
A Million-Token Context Window and Future Expansion
One of Gemini 2.5 Pro’s standout features is its 1 million token context window, which lets it take in roughly 750,000 words at once, more than the entire The Lord of the Rings trilogy. Google also says it plans to expand this to 2 million tokens soon.
Google has not yet disclosed API pricing for Gemini 2.5 Pro but says it will share more details in the coming weeks.
With Gemini 2.5, Google is positioning itself as a leading contender in AI reasoning. Given how quickly rivals have shipped reasoning models since “o1,” the competition among AI labs is likely to keep innovation in this field moving at a rapid pace.