OpenAI dropped GPT-5.4 on March 5, 2026, and it is the most capable model the company has ever shipped. It scores 75% on OSWorld-Verified, above the human baseline of 72.4%, can natively control software like a human would, and handles up to 1 million tokens of context in a single API request. In under seven months, OpenAI has shipped five distinct versions of GPT-5, each with its own identity and price point.
If you’ve lost track of what separates GPT-5.0 from GPT-5.4, this guide covers every model in the family, release dates, context windows, benchmarks, pricing, and the key differences that actually matter. We also look ahead at what ChatGPT 6 is shaping up to be. If you’re newer to ChatGPT, our ChatGPT for Beginners guide is a good place to start before diving into model differences.
The Key Takeaways
- GPT-5.4 (March 5, 2026) is the current frontier model, with a 1M-token context window (API only), native computer use, and 57.7% on SWE-Bench Pro.
- GPT-5.3 Instant (March 3, 2026) is the cheapest model at ~$0.30/$1.20 per 1M tokens and delivers 26.8% fewer hallucinations than GPT-5.2 with web search.
- GPT-5.2 was the first model to score 90%+ on ARC-AGI-1 (Pro) and hit a perfect 100% on AIME 2025 math benchmarks.
- GPT-5.1 introduced adaptive reasoning — 2–3x faster on simple tasks — while keeping the same $1.25/$10 price as GPT-5.0.
- ChatGPT 6 has not been released; a mid-to-late 2026 launch is expected, with a shift toward persistent memory and autonomous agents.
All ChatGPT 5 Models at a Glance
| Model | Released | Context Window | API Pricing (Input / Output) | Best For |
|---|---|---|---|---|
| GPT-5.0 | Aug 7, 2025 | 400K / 128K out | $1.25 / $10 per 1M | General use, launch baseline |
| GPT-5.1 | Nov 13, 2025 | 400K (272K in) | $1.25 / $10 per 1M | Adaptive speed, conversational tasks |
| GPT-5.2 | Dec 11, 2025 | 400K (272K in) | $1.75 / $14 per 1M | Deep reasoning, coding, research |
| GPT-5.3 Instant | Mar 3, 2026 | 400K | ~$0.30 / ~$1.20 per 1M | Everyday writing, cost-sensitive use |
| GPT-5.4 | Mar 5, 2026 | 1M (API only) | $2.50 / $15 per 1M | Agentic work, professional tasks |
GPT-5.0: The Foundation
Released: August 7, 2025
The original GPT-5 arrived as a meaningful jump over GPT-4o, not just in raw benchmarks, but in architecture. According to OpenAI’s launch announcement, it was built as a unified system with a fast base model for everyday queries and a deeper reasoning layer (GPT-5 Thinking) that activates automatically when the query demands it. A real-time router decides which to use based on complexity, tool needs, and context, so you don’t have to manage it manually.
GPT-5.0 Benchmarks
- AIME 2025 (math): 94.6% without tools; 100% with Python tools (Pro)
- GPQA Diamond (PhD-level science): 89.4%
- SWE-bench Verified (coding): 74.9%
- Aider Polyglot (real-world coding): 88%
- Humanity’s Last Exam: 42%
- Hallucinations: under 1% on open-source prompts; 1.6% on hard medical cases
GPT-5.0 shipped with a 400K input / 128K output context window and was available to all ChatGPT users, with Pro subscribers getting extended reasoning access.
API pricing: $1.25 per 1M input tokens / $10 per 1M output tokens
What Changed vs. GPT-4o
GPT-5 was roughly 45% less likely to hallucinate than GPT-4o with web search enabled. The unified routing eliminated the need to manually switch between chat and reasoning modes — a friction point many users had with the o-series models.
GPT-5.1: Faster Without Being Dumber
Released: November 13, 2025
GPT-5.1 was not a capability leap — it was an efficiency upgrade. The headline feature was adaptive reasoning: the model dynamically allocates compute based on query complexity. Ask it something simple and it answers 2–3x faster than the standard model. Ask it something complex and it switches to full reasoning mode. OpenAI also tuned the tone to be warmer and more conversational, dropping some of the rigid formality that made GPT-5.0 occasionally feel stiff.
It used 30% fewer thinking tokens than its Codex variant while maintaining near-identical benchmark scores on most tasks.
GPT-5.1 Benchmarks
- AIME 2025: 94% (marginally lower than GPT-5.0’s 94.6%)
- GPQA Diamond: 87%
- MMMU (multimodal): 85.4%
- SWE-Bench Verified: 76.3% (Codex-Max variant: 77.9%)
Context window: 400K tokens (272K input / 128K output)
API pricing: $1.25 per 1M input / $10 per 1M output — same as GPT-5.0
GPT-5.1 Variants
OpenAI shipped a full family with GPT-5.1: Instant (fast), Thinking (deep reasoning), Auto (routing), Pro (research-grade), plus three Codex variants — standard Codex, Codex-Mini for lightweight tasks, and Codex-Max for agentic coding tasks lasting 24+ hours.
What GPT-5.1 Got Right
The reasoning_effort parameter (none / low / medium / high) gave developers fine-grained control over how much compute to spend per request — a practical tool for balancing cost and quality in production.
GPT-5.2: The Reasoning Milestone
Released: December 11, 2025
GPT-5.2 was the model that set new goalposts. It was the first AI to score 90%+ on ARC-AGI-1 — a benchmark designed specifically to resist pattern-matching and test genuine reasoning. It also hit a perfect 100% on AIME 2025 math problems. These weren’t marginal gains; they crossed thresholds that had defined the frontier for years.
OpenAI introduced a three-tier architecture with GPT-5.2:
- GPT-5.2 Instant — optimized for throughput: customer support, content generation, translation
- GPT-5.2 Thinking — configurable reasoning depth with Light/Medium/Heavy/xhigh settings, letting you trade latency for accuracy on a per-request basis
- GPT-5.2 Pro — maximum compute, up to 30 minutes of sustained processing for the most demanding tasks
GPT-5.2 Benchmarks
| Benchmark | Score |
|---|---|
| AIME 2025 | 100% |
| GPQA Diamond (Pro) | 93.2% |
| GPQA Diamond (Thinking) | 92.4% |
| SWE-Bench Verified | 80.0% |
| SWE-Bench Pro | 55.6% |
| ARC-AGI-1 (Pro) | 90%+ |
| ARC-AGI-2 (Pro) | 54.2% |
| FrontierMath | 40.3% |
GPT-5.2 delivered 38% fewer errors than GPT-5.1, the largest reliability jump in the GPT-5 family so far.
Context window: 400K tokens (272K input / 128K output)
API pricing: $1.75 per 1M input / $14 per 1M output — a 40% price increase over GPT-5.1
An agentic coding variant, GPT-5.2-Codex, followed on January 14, 2026, purpose-built for planning and executing multi-step engineering tasks autonomously.
Note: GPT-5.2 Thinking is being retired on June 3, 2026. If you rely on it for analytical work, GPT-5.4 Thinking is the intended upgrade path.
GPT-5.3: The Everyday Upgrade
Released: March 3, 2026
GPT-5.3 Instant fixed something earlier models quietly got wrong: the tone. Previous GPT-5 versions had a tendency toward excessive caveats, unnecessary hedging, and what users called “cringe”, AI overcaution that made every answer feel like a legal disclaimer. GPT-5.3 dialed that back, producing more direct and natural responses.
It also delivered a meaningful accuracy improvement. With web search enabled, GPT-5.3 produces 26.8% fewer hallucinations than GPT-5.2 Instant. Without search, the improvement is still 19.7%. User-flagged errors dropped by 22.5%.
GPT-5.3 Specs
- Context window: 400K tokens
- HealthBench: 54.1% (slight dip from GPT-5.2’s 55.4%)
- Hallucinations: 26.8% fewer (with web search vs. GPT-5.2)
- API pricing: ~$0.30 per 1M input / ~$1.20 per 1M output
That pricing is a dramatic drop from GPT-5.2’s $1.75/$14 — GPT-5.3 Instant is positioned as a high-quality, low-cost everyday model, not a reasoning powerhouse.
GPT-5.3 Trade-offs
The anti-cringe update came with a trade-off: safety compliance on some categories declined. Graphic violence content filtering dropped from 85.2% to 78.1% compared to GPT-5.2 — a decision OpenAI appears to have made deliberately, moving some safety controls to the product layer rather than the model level.
The GPT-5.3-Codex variant (February 5, 2026) is worth noting separately. It carries a 1 million-token context window — the same as GPT-5.4, runs 25% faster than GPT-5.2-Codex, and is the best option for large-scale agentic coding before upgrading to GPT-5.4.
GPT-5.4: The Professional Frontier
Released: March 5, 2026
GPT-5.4 is the most capable ChatGPT model OpenAI has ever shipped. According to the official GPT-5.4 announcement, the two genuinely new capabilities that set it apart from every prior model are native computer use and a 1 million-token context window (via the API).
Computer use means GPT-5.4 can interact directly with software interfaces — navigating desktops, clicking UI elements, running commands, verifying output, and looping back to fix errors in a build-run-verify-fix cycle. On OSWorld-Verified, the benchmark for desktop computer use, it scores 75.0%, exceeding the measured human baseline of 72.4%.
GPT-5.4 Benchmarks
| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| OSWorld-Verified (computer use) | 75.0% | 47.3% |
| GDPval (knowledge work, 44 professions) | 83.0% | 70.9% |
| SWE-Bench Pro (coding) | 57.7% | 55.6% |
| GPQA (science) | 92.8% | 93.2% |
| ARC-AGI v2 | 73.3% | 54.2% |
| FrontierMath | 47.6% | 40.3% |
| MMMU-Pro (multimodal) | 81.2% | — |
| BrowseComp (web research) | 82.7% | — |
| Spreadsheet tasks | 87.3% | — |
| Hallucinations vs. GPT-5.2 | 33% fewer (individual claim errors) | — |
GPT-5.4 Key Features
1 million-token context (API only). The full 1M window is available via the API. ChatGPT Plus, Team, and Pro subscribers in the chat interface do not get the expanded context. Worth noting: OpenAI charges double the standard rate once input exceeds 272K tokens in a single request.
Tool search. Instead of loading all tool definitions upfront, GPT-5.4 can retrieve tool definitions on demand. This cuts token usage by 47% in tool-heavy workflows — a practical cost saving for anyone building agents.
Full-resolution vision. GPT-5.4 processes images up to 10.24 million pixels, making it viable for detailed medical imaging, architectural plans, and high-res document analysis.
Compaction training. The model is trained to compress long agent trajectories while preserving key context — useful for multi-day autonomous workflows.
GPT-5.4 Pricing
| Tier | Input per 1M | Output per 1M |
|---|---|---|
| Standard | $2.50 | $15.00 |
| Batch / async | $1.25 | $7.50 |
| Priority | $5.00 | $30.00 |
| GPT-5.4 Pro | $30.00 | $180.00 |
GPT-5.4 Pro is priced at a significant premium. For context, Anthropic’s Claude Opus 4.6 costs $5 per 1M input / $25 per 1M output — GPT-5.4 Pro is six times the input price.
GPT-5.4 ships in three variants: Standard (everyday professional use), Thinking (deep multi-step reasoning, available to Plus/Team/Pro subscribers), and Pro (maximum performance).
Which ChatGPT Model Should You Use?
| Task | Best Model | Why |
|---|---|---|
| Everyday writing, email, summaries | GPT-5.3 Instant | Cheapest, fewer hallucinations, clean tone |
| High-volume API (cost matters) | GPT-5.3 Instant | ~$0.30/$1.20 per 1M — lowest in the family |
| Deep analytical reasoning | GPT-5.4 Thinking | Best reasoning in the family |
| Complex coding, bug fixing | GPT-5.4 or GPT-5.3-Codex | Highest SWE-Bench scores |
| Agentic, multi-step automation | GPT-5.4 | Computer use + 1M context |
| Computer use / desktop automation | GPT-5.4 only | No other model has this |
| Long document analysis (1M+ tokens) | GPT-5.4 only (API) | Only model with 1M context |
| Enterprise / maximum accuracy | GPT-5.4 Pro | Highest performance, premium price |
| Affordable reasoning (until June 2026) | GPT-5.2 Thinking | Retires June 3, 2026 |
For a broader comparison including Claude, Gemini, and other leading models, see our best AI models of 2026 guide. You can also check our February 2026 AI rankings to see how GPT-5.2 stacked up before GPT-5.3 and 5.4 arrived. And if you want to get more out of whichever model you land on, our top ChatGPT hacks for 2026 covers the features most people miss.
Looking Ahead: What Is ChatGPT 6?
As of March 2026, ChatGPT 6 has not been released. What we know is based on confirmed statements from Sam Altman and corroborating signals from OpenAI’s research and infrastructure investments.
Altman has confirmed that GPT-6 will arrive sooner than the 2.5-year gap between GPT-4 and GPT-5. A mid-to-late 2026 launch is the most widely cited estimate.
What GPT-6 Is Expected to Bring
Persistent memory. Rather than starting fresh with every conversation, GPT-6 is expected to maintain context across weeks or months. It will reference past discussions, preferences, and your working style, making it behave less like a search engine and more like a collaborator that actually knows you.
True autonomy. GPT-5 improved reasoning and reduced errors. GPT-6’s defining shift is expected to be agency, executing multi-step real-world tasks across browsers, apps, and enterprise systems without constant human check-ins. GPT-5.4 introduced computer use as a foundation; GPT-6 is expected to make that the core mode of operation.
Customizable behavior. Altman has indicated that users will get more control over how GPT-6 responds, including on opinion-sensitive topics. Disagreeing with the model’s default stance should become something you can actually adjust.
Potential self-learning. There is unconfirmed speculation around persistent weight updates, where the model learns across sessions rather than only within them. OpenAI has not confirmed this is part of GPT-6’s design.
We cover all confirmed details, expected features, and the latest leaks in our full guide on everything we know about ChatGPT 6.
Benchmark Progression: GPT-5.0 to GPT-5.4
| Benchmark | GPT-5.0 | GPT-5.1 | GPT-5.2 | GPT-5.3 | GPT-5.4 |
|---|---|---|---|---|---|
| AIME 2025 (math) | 94.6% | 94% | 100% | — | 100% |
| GPQA Diamond | 89.4% | ~87% | 93.2% | — | 92.8% |
| SWE-Bench Verified | 74.9% | 76.3% | 80.0% | — | ~52.8%* |
| SWE-Bench Pro | — | — | 55.6% | — | 57.7% |
| ARC-AGI-1 | — | — | 86.2% | — | 93.7% |
| ARC-AGI-2 | — | 17.6% | 54.2% | — | 73.3% |
| FrontierMath (T1–3) | 26.6% | — | 40.3% | — | 47.6% |
| FrontierMath (T4) | — | — | — | — | 27.1% |
| MMMU | 84.2% | 85.4% | — | — | 84.2% |
| MMMU-Pro | — | — | 86.5% | — | 81.2% |
| HealthBench Hard | 46.2% | — | 55.4% | 54.1% | 62.6% |
| GDPval (knowledge work) | — | — | 70.9% | — | 83.0% |
| OSWorld (computer use) | — | — | 47.3% | — | 75.0% |
| BrowseComp | — | — | 65.8% | — | 82.7% |
| Terminal-Bench 2.0 | — | 58.1% | 64.9% | — | 75.1% |
| Spreadsheet modeling | — | — | 68.4% | — | 87.3% |
| HLE (Humanity’s Last Exam) | 42% | — | — | — | 52.1% |
| LiveCodeBench | — | — | — | — | 72.5% |
| Aider Polyglot | 88% | — | — | — | — |
| Hallucination reduction | -45% vs GPT-4o | — | -38% vs 5.1 | -26.8% vs 5.2 | -33% vs 5.2 |
* GPT-5.4 SWE-Bench Verified score is from a non-thinking variant; OpenAI has not published an official figure for this benchmark.
Conclusion
The GPT-5 family has moved fast. In seven months, OpenAI went from a capable general-purpose model to one that operates software like a human, holds an entire codebase in context, and outperforms industry professionals on knowledge work benchmarks across 44 occupations.
For most users, GPT-5.3 Instant is the clear everyday choice, it is the cheapest model in the family, hallucinates less than any prior version, and no longer feels like it was written by a compliance department. For professional workflows that need computer use, 1M-token context, or maximum reasoning depth, GPT-5.4 justifies the price. And if you are building anything that will still be running in 12 months, keep an eye on ChatGPT 6, the shift from tool to autonomous collaborator is coming.
FAQ
What is the latest ChatGPT model in 2026?
GPT-5.4, released March 5, 2026. It features a 1 million-token context window (via API), native computer use, and the highest benchmark scores of any OpenAI model to date.
What is the difference between GPT-5.3 and GPT-5.4?
GPT-5.4 adds native computer use, a 1 million-token context window, and improved reasoning efficiency. GPT-5.3 Instant is faster and much cheaper (~$0.30/$1.20 vs $2.50/$15 per 1M tokens), making it better for everyday tasks where you don’t need those advanced features.
Is GPT-5.4 worth the price?
For agentic workflows, computer use, or large-context tasks — yes. For standard writing, chat, and reasoning, GPT-5.3 Instant delivers strong results at a fraction of the cost. The best approach is to use GPT-5.3 Instant as your default and switch to GPT-5.4 when the task genuinely needs it.
When is ChatGPT 6 coming out?
No official date. Mid-to-late 2026 is the most common estimate. Sam Altman has confirmed it will come sooner than the gap between GPT-4 and GPT-5. See our ChatGPT 6 guide for the latest confirmed details.
Which ChatGPT model is best for coding?
GPT-5.4 for agentic, multi-step engineering tasks. GPT-5.3-Codex for large-codebase work (1M context) at a lower cost. GPT-5.2-Codex remains a solid option if you need the previous generation’s pricing through mid-2026.




