Google Unveils Gemini 2.5, Its “Best Yet” Family of AI Reasoning Models

On Tuesday, Google unveiled Gemini 2.5, a new family of AI reasoning models designed to pause and “think” before answering. The company also introduced Gemini 2.5 Pro Experimental, a multimodal reasoning model that it claims is its most advanced to date.

This model is now accessible on the Google AI Studio developer platform and through the Gemini app for subscribers of the $20-per-month Gemini Advanced plan.

Benchmark Performance

Google asserts that Gemini 2.5 Pro surpasses its previous frontier models and many competing systems across various benchmarks. For instance, in a code editing evaluation (Aider Polyglot), Gemini 2.5 Pro achieved a score of 68.6%, outperforming top models from OpenAI, Anthropic, and DeepSeek.

On SWE-bench Verified, a benchmark that evaluates software development skills, Gemini 2.5 Pro scored 63.8%. That is higher than the scores of OpenAI’s o3-mini and DeepSeek’s R1, but below Anthropic’s Claude 3.7 Sonnet, which scored 70.3%.

In a multimodal test known as Humanity’s Last Exam, which features thousands of crowdsourced questions across mathematics, humanities, and natural sciences, Gemini 2.5 Pro scored 18.8%, outperforming most rival flagship models.

Expanded Context

Google emphasizes that Gemini 2.5 Pro has a context window of 1 million tokens, allowing it to take in roughly 750,000 words in a single prompt, more than the entire “Lord of the Rings” series. The company plans to increase this capacity to 2 million tokens in the future.
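
For readers who want to check the arithmetic, the figures above imply a ratio of about 0.75 words per token. The short Python sketch below applies that ratio to a context window size. The ratio is only a rough rule of thumb for English text, and the function names are illustrative assumptions, not part of any Google API or tokenizer.

```python
# Rough words-per-token ratio implied by the article's figures
# (750,000 words / 1,000,000 tokens). This is an approximation for
# English text, not a property of any specific tokenizer.
WORDS_PER_TOKEN = 0.75


def estimated_words(context_tokens: int,
                    words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in a context window of the given size."""
    return int(context_tokens * words_per_token)


def fits_in_context(word_count: int, context_tokens: int,
                    words_per_token: float = WORDS_PER_TOKEN) -> bool:
    """Return True if a text of word_count words is within the estimate."""
    return word_count <= estimated_words(context_tokens, words_per_token)


if __name__ == "__main__":
    # Current window (1 million tokens) and the planned 2 million.
    for tokens in (1_000_000, 2_000_000):
        print(f"{tokens:,} tokens ~ {estimated_words(tokens):,} words")
```

Under the same assumption, the planned 2-million-token window would hold roughly 1.5 million words. Actual capacity depends on the tokenizer and on the language and type of content.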

Google has not yet announced API pricing for Gemini 2.5 Pro, but it promises to share more information in the coming weeks.

