
Gemini 2.5 Pro (06-05): The New Frontier for Agentic AI

Google DeepMind releases Gemini 2.5 Pro Preview 06-05 with 1M token context, enhanced reasoning, and agentic capabilities. A technical breakdown for developers.

June 5, 2025

Introduction

Google DeepMind has officially released the Gemini 2.5 Pro (06-05) preview, marking a significant leap in multimodal AI capabilities. Released on June 5, 2025, this model represents the current state-of-the-art in reasoning and coding performance. Unlike previous iterations, Gemini 2.5 Pro is explicitly designed for the agentic era, allowing it to plan, execute, and refine complex tasks autonomously.

The release comes amidst a competitive landscape where reasoning benchmarks are the primary metric for success. While Google has noted that this is currently a 'preview' version, the performance metrics suggest it outperforms earlier models by meaningful margins. However, developers should note recent concerns regarding the safety report released weeks after launch, which an AI governance expert described as 'meager' and 'worrisome' due to the lack of key safety evaluation results in the initial model card.

For engineering teams, this release signals a shift towards more robust, long-context interactions. The model is built to handle intricate workflows where context retention and logical deduction are critical. It is not open source, but access via API and Vertex AI is available for immediate integration.

  • Release Date: 2025-06-05
  • Status: Preview (Non-Open Source)
  • Focus: Agentic Reasoning & Multimodal

Key Features & Architecture

The architecture of Gemini 2.5 Pro is optimized for efficiency and depth. It features a massive 1M token context window, enabling the processing of entire codebases, long-form video, or extensive documentation in a single pass. This capability is crucial for RAG (Retrieval-Augmented Generation) systems that require high-fidelity memory retention.

A standout feature is the 'thinking' capability, which allows the model to reason through responses with enhanced accuracy. This internal monologue process improves nuanced context handling, reducing hallucinations in complex mathematical or logical tasks. The model supports multimodal understanding across text, image, video, and audio, making it a versatile tool for enterprise applications.

Developers have specifically noted improved coding performance in this preview, which builds on the earlier 05-06 update often referred to as the 'I/O edition'. The focus remains on stronger coding capabilities: more effective code generation, debugging, and refactoring within the model's context window.

  • Context Window: 1,000,000 tokens
  • Multimodal: Text, Image, Video, Audio
  • Thinking Mode: Enabled for reasoning tasks
  • Architecture: MoE (Mixture of Experts)

Performance & Benchmarks

Gemini 2.5 Pro achieves state-of-the-art performance on frontier coding and reasoning benchmarks. While specific ARC-AGI-2 scores vary by version, this model reportedly scores significantly higher than its predecessors. With 'thinking' enabled, the preview leads common benchmarks by meaningful margins, showing reasoning and coding capabilities that rival top-tier competitors.

Beyond raw benchmark scores, the model is designed for the agentic era: it can sustain long chains of thought without degradation in quality. Benchmarks indicate superior performance on HumanEval and SWE-bench compared to the 2.0 Flash models. The model's ability to execute code within its context window further positions it as a development tool rather than just a chatbot.

Safety evaluations remain a point of contention. While the model performs well on technical tasks, the lack of detailed technical reports for key safety evaluations in the initial release is a concern for enterprise adoption. Google states a more detailed technical report will be published when the model is 'fully released'.

  • Reasoning: SoTA on frontier benchmarks
  • Coding: Enhanced execution and debugging
  • Context Retention: High fidelity over 1M tokens
  • Safety: Pending full release report
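The in-context code execution mentioned above is exposed through the Gemini API's code-execution tool. The sketch below assumes the `google-genai` Python SDK and a `GEMINI_API_KEY` environment variable; the SDK imports are deferred inside the function so the file still loads where the package is not installed. Treat it as a sketch of the tool configuration, not an official recipe:

```python
MODEL_ID = "gemini-2.5-pro-preview-06-05"  # pin the exact preview version

def solve_with_code(prompt: str) -> str:
    """Ask the model to write and run code while answering `prompt`.

    Imports are deferred: the google-genai SDK is only needed at call time.
    """
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=prompt,
        config=types.GenerateContentConfig(
            tools=[types.Tool(code_execution=types.ToolCodeExecution())],
        ),
    )
    return response.text

if __name__ == "__main__":
    print(solve_with_code("Use code to find the 1000th prime number."))
```

Because the tool runs inside the model's sandbox, the returned text interleaves generated code, its execution output, and the final answer.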

API Pricing

Access to Gemini 2.5 Pro (06-05) is available through Google Cloud Vertex AI. As a preview model, pricing may change, but the listed rates are competitive for this tier: approximately $1.25 per million input tokens and $10.00 per million output tokens, with 'thinking' tokens billed as output. These rates reflect the high computational cost of the 1M token context window.

For developers, the value proposition lies in the efficiency of the 1M token window, which reduces the need for complex chunking strategies compared to smaller context models. Free tier availability is limited to Vertex AI trial credits for new accounts, after which standard billing applies. The pricing is per million tokens, making it cost-effective for high-volume, long-context applications.

Comparing this to other high-end models, the input price is generally lower than specialized reasoning competitors, though the output price is higher due to the compute intensity required for the 'thinking' mode. This pricing structure is ideal for batch processing and long-context analysis tasks.

  • Input Price: $1.25 / 1M tokens
  • Output Price: $10.00 / 1M tokens
  • Free Tier: Vertex AI Trial Credits
  • Billing: Per million tokens
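At these rates, per-request cost is easy to estimate. A minimal sketch using the $1.25 input / $10.00 output list prices above (actual billing may differ as the preview evolves):

```python
INPUT_PRICE_PER_M = 1.25    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens (includes thinking tokens)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single generate_content request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a full 1M-token context with an 8K-token answer
print(round(request_cost(1_000_000, 8_000), 4))  # prints 1.33
```

In other words, even a maxed-out context costs on the order of a dollar per request, which is what makes batch long-context analysis economically viable.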

Comparison Table

When evaluating Gemini 2.5 Pro against competitors, the context window and reasoning capabilities are the primary differentiators. The comparison below covers Gemini 2.5 Pro, Gemini 2.0 Pro, GPT-4o, and Claude 3.5 Sonnet, and highlights where Gemini 2.5 Pro excels, particularly in long-context and agentic workflows.

Developers should choose based on their specific needs. If the priority is raw reasoning over a massive context window, other models might suffice. However, for agentic tasks requiring deep context retention, Gemini 2.5 Pro is currently the strongest option available in the preview phase.

  • Gemini 2.5 Pro: best for long-context reasoning
  • Gemini 2.0 Pro: best for general chat
  • Claude 3.5 Sonnet: best for speed and coding
  • GPT-4o: best for multimodal analysis

Use Cases

Gemini 2.5 Pro is best suited for applications requiring deep understanding of large datasets. In the coding domain, it can serve as a primary pair programmer, capable of understanding entire project structures within its 1M token window. This makes it ideal for refactoring legacy codebases or generating documentation for complex systems.

For enterprise RAG systems, the model's long context window eliminates the need for aggressive summarization. Legal and financial teams can upload entire contracts or reports for analysis without losing critical details. The multimodal capabilities also extend to video analysis, allowing for automated content moderation or technical video review.

Agentic workflows are the primary use case. The model can be tasked with multi-step reasoning problems, such as debugging a distributed system or analyzing financial trends across multiple data sources. The 'thinking' mode ensures that the steps taken are logical and verifiable.

  • Coding: Legacy refactoring and debugging
  • RAG: Long document analysis
  • Agents: Multi-step reasoning tasks
  • Multimodal: Video and audio analysis
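For the contract-analysis case above, the whole document can go into a single request instead of a chunked RAG pipeline. A sketch assuming the `google-genai` SDK; `build_review_prompt` and `review_contract` are illustrative helpers, not an official API, and the SDK import is deferred so the sketch loads without the package:

```python
MODEL_ID = "gemini-2.5-pro-preview-06-05"

def build_review_prompt(contract_text: str, question: str) -> str:
    """Combine the full contract and the analyst's question into one prompt."""
    return (
        "You are reviewing the following contract in full.\n\n"
        f"--- CONTRACT START ---\n{contract_text}\n--- CONTRACT END ---\n\n"
        f"Question: {question}"
    )

def review_contract(contract_text: str, question: str) -> str:
    """Single-pass analysis: no chunking, no retrieval index."""
    from google import genai  # deferred so the sketch loads without the SDK

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=build_review_prompt(contract_text, question),
    )
    return response.text
```

The design choice here is deliberate: because the full text fits in context, the model can cross-reference clauses that a chunked retriever might never surface together.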

Getting Started

To access Gemini 2.5 Pro (06-05), developers should use the Google Cloud Vertex AI API. The model endpoint is available for preview, and SDK support is provided for Python, Node.js, and Go. Authentication is handled via standard Google Cloud credentials, ensuring secure integration with existing infrastructure.

For immediate experimentation, the Google AI Studio allows for direct API calls without setting up a full Vertex AI project. Documentation is available on the Google Developers Blog and the DeepMind research page. It is recommended to start with small context loads to test the 'thinking' mode before scaling to full 1M token requests.
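A minimal first call might look like the following, assuming the `google-genai` Python SDK with a `GEMINI_API_KEY` from Google AI Studio; the `thinking_budget` field is how the SDK exposes a cap on reasoning tokens, and a small prompt keeps the first experiment cheap:

```python
MODEL_ID = "gemini-2.5-pro-preview-06-05"  # pin the exact preview version

def ask(prompt: str, thinking_budget: int = 1024) -> str:
    """Single-turn call with an explicit reasoning-token budget."""
    from google import genai  # deferred so the sketch loads without the SDK
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=thinking_budget),
        ),
    )
    return response.text

if __name__ == "__main__":
    print(ask("Summarize the tradeoffs of a 1M-token context window."))
```

Raising `thinking_budget` generally trades latency and output cost for deeper reasoning, so start low and increase only when task quality demands it.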

Integrating the model into a production environment requires careful monitoring of safety metrics. Given the recent safety report concerns, implementing guardrails and filtering layers is advised before exposing the model to end-users. The preview status means features may change, so version pinning is essential.

  • Platform: Google Cloud Vertex AI
  • SDKs: Python, Node.js, Go
  • Endpoint: Preview API
  • Docs: Google Developers Blog
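Guardrails need not be elaborate to start. A minimal output-filtering sketch in the spirit of the advice above; the blocklist patterns are placeholders, not a recommended production policy, and a real deployment would add a dedicated safety service:

```python
import re

# Placeholder patterns; extend or replace these for a real policy.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # US SSN-shaped strings
    re.compile(r"(?i)\bapi[_-]?key\s*[:=]\s*\S+"),  # leaked credential shapes
]

def filter_output(text: str) -> tuple[bool, str]:
    """Return (allowed, text), redacting matches instead of failing the request."""
    allowed = True
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            allowed = False
            text = pattern.sub("[REDACTED]", text)
    return allowed, text
```

Running model responses through a filter like this before they reach end-users gives a simple audit point while the preview's safety reporting remains incomplete.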

