Gemini 3.1 Pro Is Here: Benchmarks, Pricing, and How It Stacks Up Against Claude and GPT

Google released Gemini 3.1 Pro on February 19, 2026, and its benchmark numbers are hard to ignore. The model scored 77.1% on ARC-AGI-2, a test specifically designed to prevent AI from relying on memorised answers — it forces genuine reasoning on problems the model has never encountered before. That is more than double the 31.1% scored by Gemini 3 Pro when it launched just three months ago. Google has since launched Gemini 3.5, whose Flash model now beats Gemini 3.1 Pro on coding and agentic benchmarks.

This article covers everything you need to know about Gemini 3.1 Pro, including what changed from the previous version, how it performs against Claude Opus 4.6 and GPT-5.2, what it costs, and where you can access it right now. If you’re deciding whether it belongs in your AI workflow, the data below makes the comparison clear.

Table of Contents

What Is Gemini 3.1 Pro?

What Changed from Gemini 3 Pro?

Three Thinking Levels: Low, Medium, High

Gemini 3.1 Pro Benchmarks: How It Compares to Claude and GPT

Gemini 3.1 Pro Pricing

Where to Access Gemini 3.1 Pro

Music Generation: Lyria 3 Comes to the Gemini App

Is Gemini 3.1 Pro Worth Using?

Conclusion

FAQ

The Key Takeaways

Google released Gemini 3.1 Pro on February 19, 2026, currently available as a preview

ARC-AGI-2 score: 77.1%, a 2.5x improvement over Gemini 3 Pro’s 31.1%

Pricing starts at $2 per million input tokens, 2.5 times cheaper than Claude Opus 4.6’s $5

Supports a 1 million token context window with up to 75% cost savings via context caching

Available now in the Gemini app, Google AI Studio, Vertex AI, and GitHub Copilot

What Is Gemini 3.1 Pro?

Gemini 3.1 Pro is Google’s updated flagship AI model and the most capable model in the Gemini 3 family. The .1 increment is new for Google; previous generations moved straight from the base model to a .5 update, which signals this is a meaningful capability jump rather than a minor patch.

Google targets it at complex tasks, not everyday chat. It handles text, images, audio, video, and entire code repositories in a single session, with a 1 million token input context window and up to 64,000 tokens of output per call. Three configurable thinking levels (Low, Medium, and High) let you control the trade-off between reasoning depth, speed, and cost depending on what the task demands.

Gemini 3.1 Pro is here. Hitting 77.1% on ARC-AGI-2, it’s a step forward in core reasoning (more than 2x 3 Pro).

With a more capable baseline, it’s great for super complex tasks like visualizing difficult concepts, synthesizing data into a single view, or bringing creative… pic.twitter.com/aEs0LiylQZ
— Sundar Pichai (@sundarpichai) February 19, 2026

What Changed from Gemini 3 Pro?

The reasoning improvement is the headline. Gemini 3.1 Pro scored 77.1% on ARC-AGI-2 versus 31.1% for Gemini 3 Pro. ARC-AGI-2 is designed to test whether a model can solve logic patterns it has genuinely never seen — it deliberately avoids anything a model could memorise from training data. A 2.5x gain on that benchmark in one update is significant.

Coding performance improved too. The model scored 80.6% on SWE-Bench Verified and 68.5% on Terminal-Bench 2.0. The Terminal-Bench result is the highest among the models compared in this article, while the SWE-Bench Verified score sits within 0.2 points of Claude Opus 4.6’s 80.8%. Google describes it as an “agentic coding model” built for edit-then-test workflows that require fewer tool calls per completed task.

Three Thinking Levels: Low, Medium, High

Gemini 3.1 Pro introduces selectable thinking levels, a feature designed for cost optimisation at scale. You choose how much reasoning budget the model applies based on what you’re doing. A quick document summary doesn’t need the same processing depth as designing a data pipeline or debugging a complex codebase. Google has not yet published detailed performance breakdowns per level, but the system is built to prevent you from paying for heavy reasoning when you don’t need it.
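To make the trade-off concrete, here is a minimal sketch of how an application might pick a level per task before calling the model. Everything in it is an assumption for illustration: the `Task` fields, the thresholds, and the `choose_thinking_level` helper are hypothetical, and the sketch does not show how the level is actually passed to Google's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a job sent to the model."""
    description: str
    steps: int          # rough number of reasoning steps you expect
    touches_code: bool  # True for debugging or pipeline-design work


def choose_thinking_level(task: Task) -> str:
    """Map a task to "low", "medium", or "high".

    The thresholds are illustrative guesses, not Google guidance.
    """
    if task.touches_code or task.steps >= 5:
        return "high"
    if task.steps >= 2:
        return "medium"
    return "low"


summary = Task("Summarise a two-page memo", steps=1, touches_code=False)
debugging = Task("Debug a failing data pipeline", steps=8, touches_code=True)

print(choose_thinking_level(summary))    # low
print(choose_thinking_level(debugging))  # high
```

A rule like this keeps routine requests on the cheapest setting and reserves the deepest reasoning for work that needs it. Tune the thresholds against your own latency and quality measurements.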

Gemini 3.1 Pro Benchmarks: How It Compares to Claude and GPT

Gemini 3.1 Pro leads on most major benchmarks, though the picture is nuanced depending on task type. According to Digital Applied’s benchmark tracker, the model currently holds first place on 12 out of 18 tracked metrics.

Benchmark	Gemini 3.1 Pro	Claude Opus 4.6	Claude Sonnet 4.6	GPT-5.2
ARC-AGI-2	77.1%	68.8%	60.4%	52.9%
GPQA Diamond	94.3%	91.3%	74.1%	93.2%
SWE-Bench Verified	80.6%	80.8%	79.6%	80.0%
Terminal-Bench 2.0	68.5%	65.4%	59.1%	60.0%
GDPval-AA Elo	1,317	1,606	1,633	1,462
Context window	1M tokens	1M tokens (beta)	1M tokens (beta)	400K tokens
Input price (per 1M tokens)	$2	$5	$3	$1.75
Output price (per 1M tokens)	$12	$25	$15	$14

Note: the ARC-AGI-2 score for Claude Opus 4.6 is estimated; Gemini 3.1 Pro leads by over 8 percentage points per Google. GPT-5.2 benchmark data was not publicly confirmed at time of writing.
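If you want to check who leads each row yourself, the short sketch below copies the five benchmark rows from the table above and picks the top score in each. It assumes higher is better on every benchmark; the variable and function names are ours.

```python
from collections import Counter

MODELS = ("Gemini 3.1 Pro", "Claude Opus 4.6", "Claude Sonnet 4.6", "GPT-5.2")

# Scores copied from the table above, in the same column order as MODELS.
SCORES = {
    "ARC-AGI-2": (77.1, 68.8, 60.4, 52.9),
    "GPQA Diamond": (94.3, 91.3, 74.1, 93.2),
    "SWE-Bench Verified": (80.6, 80.8, 79.6, 80.0),
    "Terminal-Bench 2.0": (68.5, 65.4, 59.1, 60.0),
    "GDPval-AA Elo": (1317, 1606, 1633, 1462),
}


def leaders(scores: dict) -> dict:
    """Return the best-scoring model for each benchmark."""
    result = {}
    for benchmark, row in scores.items():
        best_score, best_model = max(zip(row, MODELS))
        result[benchmark] = best_model
    return result


for benchmark, model in leaders(SCORES).items():
    print(f"{benchmark}: {model}")

wins = Counter(leaders(SCORES).values())
print(wins["Gemini 3.1 Pro"], "of", len(SCORES))  # 3 of 5
```

On these five rows, Gemini 3.1 Pro takes ARC-AGI-2, GPQA Diamond, and Terminal-Bench 2.0, Claude Opus 4.6 edges SWE-Bench Verified, and Claude Sonnet 4.6 tops GDPval-AA.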

Gemini 3.1 Pro leads Claude Opus 4.6 on ARC-AGI-2 by more than 8 percentage points and on GPQA Diamond by 3 points, at 2.5 times lower cost on input tokens ($2 versus $5 per million).

The one area where Claude holds a clear lead is GDPval-AA, a benchmark measuring performance on complex expert-level office work. Claude Opus 4.6 scores 1,606 Elo and Claude Sonnet 4.6 scores 1,633, versus Gemini 3.1 Pro’s 1,317. If your work involves high-stakes research synthesis, multi-step legal analysis, or complex long-form writing, that gap is real and worth factoring in.

Gemini 3.1 Pro Pricing

Gemini 3.1 Pro pricing is the same as Gemini 3 Pro. For prompts under 200,000 tokens, you pay $2 per million input tokens and $12 per million output tokens. For longer sessions in the 200,000 to 1,000,000 token range, that moves to $4 per million input and $18 per million output.

Context caching cuts those costs by up to 75% for repeated content across long sessions, useful for anyone running the model against large documents or codebases regularly.

Tier	Input (per 1M tokens)	Output (per 1M tokens)
Under 200K tokens	$2	$12
200K to 1M tokens	$4	$18
With context caching	up to 75% off	up to 75% off
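For budgeting, the tiers above can be turned into a rough per-request estimate. The sketch below is a back-of-the-envelope calculator, not an official billing formula. It assumes the prompt size selects the tier for the whole request, that a prompt of exactly 200,000 tokens falls in the higher tier, and that the caching discount applies only to the cached part of the input at the best-case 75% rate. The function and parameter names are ours.

```python
# Prices in USD per 1M tokens, taken from the pricing table above.
SHORT_PROMPT = {"input": 2.00, "output": 12.00}  # prompts under 200K tokens
LONG_PROMPT = {"input": 4.00, "output": 18.00}   # prompts from 200K to 1M tokens
TIER_BOUNDARY = 200_000
MAX_CONTEXT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0,
                  cache_discount: float = 0.75) -> float:
    """Estimate the USD cost of one request.

    Assumptions (not confirmed by the article):
    - the prompt size picks the tier for the whole request;
    - cache_discount applies only to cached input tokens, and
      0.75 is the best case ("up to 75%").
    """
    if not 0 <= cached_input_tokens <= input_tokens <= MAX_CONTEXT:
        raise ValueError("token counts out of range")
    rates = SHORT_PROMPT if input_tokens < TIER_BOUNDARY else LONG_PROMPT
    fresh_tokens = input_tokens - cached_input_tokens
    input_cost = fresh_tokens * rates["input"]
    input_cost += cached_input_tokens * rates["input"] * (1 - cache_discount)
    output_cost = output_tokens * rates["output"]
    return round((input_cost + output_cost) / 1_000_000, 4)


print(estimate_cost(100_000, 10_000))                              # 0.32
print(estimate_cost(500_000, 20_000))                              # 2.36
print(estimate_cost(100_000, 10_000, cached_input_tokens=80_000))  # 0.2
```

In the third call, caching 80% of a 100,000-token prompt trims the estimate from $0.32 to $0.20, which shows why caching matters most when the same large document or codebase is sent repeatedly.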

Claude Opus 4.6 costs $5 per million input tokens and $25 per million output tokens, 2.5 times the input cost and roughly twice the output cost for the same workload. For most use cases, Gemini 3.1 Pro is the more cost-effective choice, unless your tasks fall into the expert-level category where Claude still leads.

Where to Access Gemini 3.1 Pro

Gemini 3.1 Pro launched in preview on February 19, 2026. It is available across all of these platforms today.

Gemini app, available for Google AI Pro and Ultra subscribers
NotebookLM, for Google AI Pro and Ultra subscribers
Google AI Studio, free developer access via the gemini-3.1-pro-preview model ID
Vertex AI, for enterprise deployment
GitHub Copilot, available to Pro, Pro+, Business, and Enterprise plan users
Gemini CLI and Android Studio, for developer tooling

A second variant, gemini-3.1-pro-preview-customtools, is available for agentic workflows where precise function-call performance matters. Expect some latency during the preview: in Simon Willison’s first-day testing, one query took over 100 seconds. Google is expected to resolve this as the preview scales.
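For developers who want to try the model ID from a script, here is a minimal sketch that builds a request using only the Python standard library. The endpoint path, the `x-goog-api-key` header, and the request body follow the Gemini API's REST conventions as we understand them. Treat them as assumptions and check the official reference, since the article only confirms the model ID itself. The `build_request` helper is ours, and it deliberately does not send anything.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"  # assumed REST root
MODEL_ID = "gemini-3.1-pro-preview"  # model ID quoted in the list above


def build_request(prompt: str, api_key: str,
                  model: str = MODEL_ID) -> urllib.request.Request:
    """Build, but do not send, a generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


request = build_request("Explain ARC-AGI-2 in two sentences.", "YOUR_API_KEY")
print(request.full_url)

# To send it, use a generous timeout while the preview is slow:
# with urllib.request.urlopen(request, timeout=180) as response:
#     reply = json.load(response)
```

Passing `model="gemini-3.1-pro-preview-customtools"` targets the second variant under the same assumptions.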

General availability is expected soon. For a broader look at where Gemini 3.1 Pro sits in the current AI field, see our February 2026 AI model rankings.

Music Generation: Lyria 3 Comes to the Gemini App

Alongside the 3.1 Pro launch, Google added music generation to the Gemini app via Lyria 3, its most capable music model to date. You describe a track in plain text or upload an image, and Gemini generates a 30-second song complete with AI-written lyrics and cover art. The rollout began on February 18, 2026 on desktop, with mobile support following shortly after.

Lyria 3 is a meaningfully better model than its predecessor. It handles more complex track structures, gives you greater control over musical style and vocals, and generates lyrics as part of the output rather than requiring a separate step. Every track is watermarked using Google’s SynthID technology, so AI-generated music stays identifiable even after editing or re-encoding.

The feature supports eight languages at launch: English, German, Spanish, French, Hindi, Japanese, Korean, and Portuguese, with more planned. It is available to all Gemini users aged 18 and over. The current limit is 30-second tracks, suited for social media clips, demos, and short-form content. Google has not confirmed when longer track support will arrive.

This is a Gemini app feature powered by Lyria 3, not a capability of the 3.1 Pro model itself. You do not need a paid Gemini plan to access it, though availability may vary by region during the initial rollout.

Is Gemini 3.1 Pro Worth Using?

For developers and technically focused users, the case is strong. Gemini 3.1 Pro leads most benchmarks at a price that makes Claude Opus 4.6 hard to justify for the same tasks. The GitHub Copilot integration means you can access it inside your existing editor without any workflow changes.

For non-technical users, the answer depends on what you actually do. On GDPval-AA expert office tasks, Claude Opus 4.6 still holds the lead with an Elo of 1,606 versus 1,317. For sophisticated analytical writing, research synthesis, and high-context reasoning tasks, that gap may matter.

The clearest cases for choosing Gemini 3.1 Pro are coding, novel reasoning problems, high-volume API usage where cost matters, and any workflow that benefits from a 1 million token context window. For context on where Claude sits right now, see our Claude Sonnet 4.6 release breakdown.

Conclusion

Gemini 3.1 Pro is a genuine step up from Gemini 3 Pro. Its ARC-AGI-2 score and coding benchmarks place it at the top of the current leaderboard for most tasks, and at $2 per million input tokens, it is the most cost-effective flagship model available right now. The only caveat is expert-level tasks, where Claude Opus 4.6 still leads by a clear margin.

If you haven’t tried it, Google AI Studio gives you free access today via the gemini-3.1-pro-preview model ID. If you’re on a GitHub Copilot plan, it’s already available in your editor.

FAQ

Is Gemini 3.1 Pro better than Claude Opus 4.6?

On most benchmarks, yes. Gemini 3.1 Pro leads Claude Opus 4.6 on ARC-AGI-2 and GPQA Diamond at 2.5 times lower input cost. Claude Opus 4.6 still leads on GDPval-AA, a benchmark for expert-level office tasks. The better choice depends on your specific use case.

How much does Gemini 3.1 Pro cost?

$2 per million input tokens and $12 per million output tokens for prompts under 200,000 tokens. Context caching can reduce costs by up to 75% for repeated content within long sessions.

How do I access Gemini 3.1 Pro?

It is available in the Gemini app, Google AI Studio (free, via gemini-3.1-pro-preview), Vertex AI, NotebookLM, Gemini CLI, and GitHub Copilot (Pro, Pro+, Business, and Enterprise plans). For what each NotebookLM tier costs, see our NotebookLM pricing tiers guide. Students can also check whether Google AI Pro is still free for students, since the 12-month student offer has now closed.

What is the difference between Gemini 3 Pro and Gemini 3.1 Pro?

The main improvement is reasoning. Gemini 3.1 Pro scored 77.1% on ARC-AGI-2 versus 31.1% for Gemini 3 Pro, a 2.5x jump. Coding performance also improved, with the model leading on Terminal-Bench 2.0 and scoring within 0.2 points of Claude Opus 4.6 on SWE-Bench Verified.