OpenAI dropped GPT-5.5 on April 23, 2026, and it is now the most capable ChatGPT model the company ships. GPT-5.5 scores 93.6% on GPQA Diamond, 82.7% on Terminal-Bench 2.0, and 78.7% on OSWorld-Verified, all ahead of GPT-5.4, while its Pro variant pushes BrowseComp to 90.1% and FrontierMath Tier 4 to 39.6%.
In under nine months, OpenAI has shipped six distinct versions of GPT-5, each with its own identity and price point. If you have lost track of what separates GPT-5.0 from GPT-5.5, this guide covers every model in the family, release dates, context windows, benchmarks, pricing, and the key differences that actually matter. We also look ahead at what ChatGPT 6 is shaping up to be. If you are newer to ChatGPT, our ChatGPT for Beginners guide is a good place to start before you compare model differences.
The Key Takeaways
- GPT-5.5 (April 23, 2026) is the current frontier model. It scores 93.6% on GPQA Diamond and 78.7% on OSWorld-Verified, both ahead of GPT-5.4, and ships in two variants, GPT-5.5 standard and GPT-5.5 Pro.
- GPT-5.4 (March 5, 2026) is the previous frontier model. It introduced a 1M-token context window (API only) and native computer use, scoring 75.0% on OSWorld-Verified and 57.7% on SWE-Bench Pro.
- GPT-5.3 Instant (March 3, 2026) is still the cheapest capable model at ~$0.30 / $1.20 per 1M tokens and delivers 26.8% fewer hallucinations than GPT-5.2 with web search on.
- GPT-5.2 was the first model to score 90%+ on ARC-AGI-1 (Pro) and hit a perfect 100% on AIME 2025 math.
- GPT-5.1 introduced adaptive reasoning, 2 to 3x faster on simple tasks, while keeping the same $1.25 / $10 price as GPT-5.0.
- Pricing doubled at the standard tier. GPT-5.5 API is $5 / $30 per 1M tokens, up from GPT-5.4’s $2.50 / $15 and GPT-5.0’s $1.25 / $10.
- ChatGPT 6 has not shipped. A mid-to-late 2026 launch is still the expected window, with a shift toward persistent memory and autonomous agents.
All ChatGPT 5 Models at a Glance
| Model | Released | Context Window | API Pricing (Input / Output) | Best For |
|---|---|---|---|---|
| GPT-5.0 | Aug 7, 2025 | 400K / 128K out | $1.25 / $10 per 1M | General use, launch baseline |
| GPT-5.1 | Nov 13, 2025 | 400K (272K in) | $1.25 / $10 per 1M | Adaptive speed, conversational tasks |
| GPT-5.2 | Dec 11, 2025 | 400K (272K in) | $1.75 / $14 per 1M | Deep reasoning, coding, research |
| GPT-5.3 Instant | Mar 3, 2026 | 400K | ~$0.30 / ~$1.20 per 1M | Everyday writing, cost-sensitive use |
| GPT-5.4 | Mar 5, 2026 | 1M (API only) | $2.50 / $15 per 1M | Agentic work, computer use |
| GPT-5.5 | Apr 23, 2026 | 1M (API only) | $5 / $30 per 1M (Pro: $30 / $180) | Agentic coding, knowledge work |
For how these API prices map to ChatGPT subscription tiers (Free, Go, Plus, Pro, Business, Enterprise), see our ChatGPT pricing guide.
GPT-5.0: The Foundation
Released: August 7, 2025
The original GPT-5 arrived as a meaningful jump over GPT-4o, not just in raw benchmarks, but in architecture. According to OpenAI’s launch announcement, it was built as a unified system with a fast base model for everyday queries and a deeper reasoning layer (GPT-5 Thinking) that activates automatically when the query demands it. A real-time router decides which to use based on complexity, tool needs, and context, so you do not have to manage it manually.
GPT-5.0 Benchmarks
- AIME 2025 (math): 94.6% without tools; 100% with Python tools (Pro)
- GPQA Diamond (PhD-level science): 89.4%
- SWE-bench Verified (coding): 74.9%
- Aider Polyglot (real-world coding): 88%
- Humanity’s Last Exam: 42%
- Hallucinations: under 1% on open-source prompts; 1.6% on hard medical cases
GPT-5.0 shipped with a 400K input / 128K output context window and was available to all ChatGPT users, with Pro subscribers getting extended reasoning access.
API pricing: $1.25 per 1M input tokens / $10 per 1M output tokens
What Changed vs. GPT-4o
GPT-5 was roughly 45% less likely to hallucinate than GPT-4o with web search enabled. The unified routing eliminated the need to manually switch between chat and reasoning modes, a friction point many users had with the o-series models.
GPT-5.1: Faster Without Being Dumber
Released: November 13, 2025
Per OpenAI’s GPT-5.1 announcement, GPT-5.1 was not a capability leap, it was an efficiency upgrade. The headline feature was adaptive reasoning, where the model dynamically allocates compute based on query complexity. Ask it something simple and it answers 2 to 3x faster than the standard model. Ask it something complex and it switches to full reasoning mode.
OpenAI also tuned the tone to be warmer and more conversational, dropping some of the rigid formality that made GPT-5.0 occasionally feel stiff. It used 30% fewer thinking tokens than its Codex variant while maintaining near-identical benchmark scores on most tasks.
GPT-5.1 Benchmarks
- AIME 2025: 94% (marginally lower than GPT-5.0’s 94.6%)
- GPQA Diamond: 87%
- MMMU (multimodal): 85.4%
- SWE-Bench Verified: 76.3% (Codex-Max variant: 77.9%)
- Context window: 400K tokens (272K input / 128K output)
- API pricing: $1.25 per 1M input / $10 per 1M output, same as GPT-5.0
GPT-5.1 Variants
OpenAI shipped a full family with GPT-5.1, Instant (fast), Thinking (deep reasoning), Auto (routing), and Pro (research-grade), plus three Codex variants, standard Codex, Codex-Mini for lightweight tasks, and Codex-Max for agentic coding tasks lasting 24+ hours.
What GPT-5.1 Got Right
The reasoning_effort parameter (none / low / medium / high) gave developers fine-grained control over how much compute to spend per request, a practical tool for balancing cost and quality in production.
GPT-5.2: The Reasoning Milestone
Released: December 11, 2025
GPT-5.2 was the model that set new goalposts. It was the first AI to score 90%+ on ARC-AGI-1, a benchmark designed specifically to resist pattern-matching and test genuine reasoning. It also hit a perfect 100% on AIME 2025 math problems. These were not marginal gains; they crossed thresholds that had defined the frontier for years.
OpenAI introduced a three-tier architecture with GPT-5.2.
- GPT-5.2 Instant, optimized for throughput: customer support, content generation, translation
- GPT-5.2 Thinking, configurable reasoning depth with Light/Medium/Heavy/xhigh settings, letting you trade latency for accuracy on a per-request basis
- GPT-5.2 Pro, maximum compute, up to 30 minutes of sustained processing for the most demanding tasks
GPT-5.2 Benchmarks
| Benchmark | Score |
|---|---|
| AIME 2025 | 100% |
| GPQA Diamond (Pro) | 93.2% |
| GPQA Diamond (Thinking) | 92.4% |
| SWE-Bench Verified | 80.0% |
| SWE-Bench Pro | 55.6% |
| ARC-AGI-1 (Pro) | 90%+ |
| ARC-AGI-2 (Pro) | 54.2% |
| FrontierMath | 40.3% |
GPT-5.2 delivered 38% fewer errors than GPT-5.1, the largest reliability jump in the GPT-5 family so far.
- Context window: 400K tokens (272K input / 128K output)
- API pricing: $1.75 per 1M input / $14 per 1M output, a 40% price increase over GPT-5.1
An agentic coding variant, GPT-5.2-Codex, followed on January 14, 2026, purpose-built for planning and executing multi-step engineering tasks autonomously.
Note: GPT-5.2 Thinking is being retired on June 3, 2026. If you rely on it for analytical work, GPT-5.4 Thinking or GPT-5.5 is the intended upgrade path.
GPT-5.3: The Everyday Upgrade
Released: March 3, 2026
GPT-5.3 Instant fixed something earlier models quietly got wrong, the tone. Previous GPT-5 versions had a tendency toward excessive caveats, unnecessary hedging, and what users called “cringe,” AI overcaution that made every answer feel like a legal disclaimer. GPT-5.3 dialed that back, producing more direct and natural responses.
It also delivered a meaningful accuracy improvement. With web search enabled, GPT-5.3 produces 26.8% fewer hallucinations than GPT-5.2 Instant. Without search, the improvement is still 19.7%. User-flagged errors dropped by 22.5%.
GPT-5.3 Specs
- Context window: 400K tokens
- HealthBench: 54.1% (slight dip from GPT-5.2’s 55.4%)
- Hallucinations: 26.8% fewer (with web search vs. GPT-5.2)
- API pricing: ~$0.30 per 1M input / ~$1.20 per 1M output
That pricing is a dramatic drop from GPT-5.2’s $1.75 / $14. GPT-5.3 Instant is positioned as a high-quality, low-cost everyday model, not a reasoning powerhouse.
GPT-5.3 Trade-offs
The anti-cringe update came with a trade-off: safety compliance on some categories declined. Graphic violence content filtering dropped from 85.2% to 78.1% compared to GPT-5.2, a decision OpenAI appears to have made deliberately, moving some safety controls to the product layer rather than the model level.
The GPT-5.3-Codex variant (February 5, 2026) is worth noting separately. It carries a 1 million-token context window, the same as GPT-5.4, runs 25% faster than GPT-5.2-Codex, and is the best option for large-scale agentic coding at a lower cost than 5.4 or 5.5.
GPT-5.4: The Professional Frontier
Released: March 5, 2026
GPT-5.4 held the “most capable model OpenAI has ever shipped” title for six weeks until GPT-5.5 arrived in late April 2026. According to OpenAI’s official GPT-5.4 announcement, the two genuinely new capabilities that set it apart from every prior model were native computer use and a 1 million-token context window (via the API).
Computer use means GPT-5.4 can interact directly with software interfaces, navigating desktops, clicking UI elements, running commands, verifying output, and looping back to fix errors in a build-run-verify-fix cycle. On OSWorld-Verified, the benchmark for desktop computer use, it scores 75.0%, exceeding the measured human baseline of 72.4%.
GPT-5.4 Benchmarks
| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| OSWorld-Verified (computer use) | 75.0% | 47.3% |
| GDPval (knowledge work, 44 professions) | 83.0% | 70.9% |
| SWE-Bench Pro (coding) | 57.7% | 55.6% |
| GPQA (science) | 92.8% | 93.2% |
| ARC-AGI v2 | 73.3% | 54.2% |
| FrontierMath | 47.6% | 40.3% |
| MMMU-Pro (multimodal) | 81.2% | – |
| BrowseComp (web research) | 82.7% | – |
| Spreadsheet tasks | 87.3% | – |
| Hallucinations vs. GPT-5.2 | 33% fewer (individual claim errors) | – |
GPT-5.4 Key Features
1 million-token context (API only). The full 1M window is available via the API. ChatGPT Plus, Team, and Pro subscribers in the chat interface do not get the expanded context. Worth noting, OpenAI charges double the standard rate once input exceeds 272K tokens in a single request.
Tool search. Instead of loading all tool definitions upfront, GPT-5.4 can retrieve tool definitions on demand. This cuts token usage by 47% in tool-heavy workflows, a practical cost saving for anyone building agents.
Full-resolution vision. GPT-5.4 processes images up to 10.24 million pixels, making it viable for detailed medical imaging, architectural plans, and high-res document analysis.
Compaction training. The model is trained to compress long agent trajectories while preserving key context, useful for multi-day autonomous workflows.
GPT-5.4 Pricing
| Tier | Input per 1M | Output per 1M |
|---|---|---|
| Standard | $2.50 | $15.00 |
| Batch / async | $1.25 | $7.50 |
| Priority | $5.00 | $30.00 |
| GPT-5.4 Pro | $30.00 | $180.00 |
GPT-5.4 Pro is priced at a significant premium. For context, Anthropic’s Claude Opus 4.6 costs $5 per 1M input / $25 per 1M output, so GPT-5.4 Pro is six times the input price.
GPT-5.4 ships in three variants: Standard (everyday professional use), Thinking (deep multi-step reasoning, available to Plus/Team/Pro subscribers), and Pro (maximum performance).
GPT-5.5: The Current Frontier
Released: April 23, 2026
GPT-5.5 is the model OpenAI ships today as the new default for ChatGPT Plus, Pro, Business, and Enterprise. It arrived 49 days after GPT-5.4, making it the fastest flagship turnover in the GPT-5 family. Read OpenAI’s official GPT-5.5 announcement for the full technical brief, and our own GPT-5.5 launch coverage for the complete benchmark tables and real-world testing.
Two variants ship. GPT-5.5 standard is available to all paid ChatGPT tiers and Codex, and GPT-5.5 Pro is restricted to Pro, Business, and Enterprise tiers. OpenAI positions GPT-5.5 as a model “built specifically for real work and for powering agents,” with the largest efficiency gains going to agentic coding and long-horizon knowledge work.
GPT-5.5 Benchmarks
| Benchmark | GPT-5.5 | GPT-5.5 Pro |
|---|---|---|
| Terminal-Bench 2.0 | 82.7% | – |
| GDPval (knowledge work) | 84.9% | – |
| OSWorld-Verified (computer use) | 78.7% | – |
| SWE-Bench Pro (coding) | 58.6% | – |
| GPQA Diamond (science) | 93.6% | – |
| BrowseComp (web research) | 84.4% | 90.1% |
| FrontierMath Tier 4 | 35.4% | 39.6% |
| Humanity’s Last Exam (no tools) | 41.4% | 43.1% |
| CyberGym | 81.8% | – |
Versus GPT-5.4, the standout gains are OSWorld-Verified (+3.7 points), GPQA Diamond (+0.8), SWE-Bench Pro (+0.9), and Terminal-Bench 2.0 (+7.6). The numbers understate how the model feels to use, though. OpenAI reports that at NVIDIA, debugging cycles “dropped from days to hours” using GPT-5.5’s agentic workflow, which is the kind of real-world win that does not show up in a benchmark table.
GPT-5.5 Key Features
Agentic coding by default. GPT-5.5 ships tuned for multi-step engineering workflows. The same Codex tool that used to need supervision can now chain apply_patch, terminal commands, and verification steps over longer horizons without losing the thread.
Computer use, now production-grade. 78.7% on OSWorld-Verified puts GPT-5.5 firmly above the 72.4% human baseline. Desktop automation workflows that were flaky on 5.4 start to look viable on 5.5.
Lower hallucination rate. OpenAI reports meaningfully fewer hallucinations versus GPT-5.4 across standard knowledge-work tasks, though the company has not released a single headline figure.
API rolling out soon. At launch, GPT-5.5 went live inside ChatGPT and Codex. API access was announced as “coming very soon” with different safeguards than the ChatGPT rollout.
GPT-5.5 Pricing
| Tier | Input per 1M | Output per 1M |
|---|---|---|
| GPT-5.5 Standard | $5.00 | $30.00 |
| GPT-5.5 Pro | $30.00 | $180.00 |
Standard is exactly 2x the price of GPT-5.4 standard. GPT-5.5 Pro matches GPT-5.4 Pro’s $30 / $180, so if you already pay for Pro, the jump to 5.5 Pro is effectively free at the API level.
Which ChatGPT Model Should You Use?
| Task | Best Model | Why |
|---|---|---|
| Everyday writing, email, summaries | GPT-5.3 Instant | Cheapest, fewer hallucinations, clean tone |
| High-volume API (cost matters) | GPT-5.3 Instant | ~$0.30 / $1.20 per 1M, lowest in the family |
| Deep analytical reasoning | GPT-5.5 or GPT-5.5 Pro | Highest GPQA Diamond and FrontierMath in the family |
| Complex coding, bug fixing | GPT-5.5 (or GPT-5.5 Pro) | Highest SWE-Bench Pro and Terminal-Bench 2.0 scores |
| Agentic, multi-step automation | GPT-5.5 | Best computer use + agentic coding performance |
| Computer use / desktop automation | GPT-5.5 | 78.7% OSWorld-Verified, highest in the family |
| Long document analysis (1M+ tokens) | GPT-5.4 or GPT-5.5 (API) | Both offer the 1M context window |
| Enterprise / maximum accuracy | GPT-5.5 Pro | Highest BrowseComp and FrontierMath Tier 4 |
| Affordable reasoning (until June 2026) | GPT-5.2 Thinking | Retires June 3, 2026 |
For a broader comparison including Claude, Gemini, and other leading models, see our Best AI Models hub. For a head-to-head on OpenAI versus Anthropic specifically, check our Claude vs ChatGPT comparison. And if you want to get more out of whichever model you land on, our top ChatGPT hacks covers the features most people miss.
Looking Ahead: What Is ChatGPT 6?
As of April 2026, ChatGPT 6 has not been released. What we know is based on confirmed statements from Sam Altman and corroborating signals from OpenAI’s research and infrastructure investments. Altman has confirmed that GPT-6 will arrive sooner than the 2.5-year gap between GPT-4 and GPT-5. A mid-to-late 2026 launch is the most widely cited estimate.
What GPT-6 Is Expected to Bring
Persistent memory. Rather than starting fresh with every conversation, GPT-6 is expected to maintain context across weeks or months. It will reference past discussions, preferences, and your working style, making it behave less like a search engine and more like a collaborator that actually knows you.
True autonomy. GPT-5 improved reasoning and reduced errors. GPT-6’s defining shift is expected to be agency, executing multi-step real-world tasks across browsers, apps, and enterprise systems without constant human check-ins. GPT-5.4 introduced computer use as a foundation, GPT-5.5 made it production-grade, and GPT-6 is expected to make that the core mode of operation.
Customizable behavior. Altman has indicated that users will get more control over how GPT-6 responds, including on opinion-sensitive topics. Disagreeing with the model’s default stance should become something you can actually adjust.
Potential self-learning. There is unconfirmed speculation around persistent weight updates, where the model learns across sessions rather than only within them. OpenAI has not confirmed this is part of GPT-6’s design.
We cover all confirmed details, expected features, and the latest leaks in our full ChatGPT 6 guide.
Benchmark Progression: GPT-5.0 to GPT-5.5
| Benchmark | GPT-5.0 | GPT-5.1 | GPT-5.2 | GPT-5.3 | GPT-5.4 | GPT-5.5 |
|---|---|---|---|---|---|---|
| AIME 2025 (math) | 94.6% | 94% | 100% | – | 100% | – |
| GPQA Diamond | 89.4% | ~87% | 93.2% | – | 92.8% | 93.6% |
| SWE-Bench Verified | 74.9% | 76.3% | 80.0% | – | ~52.8%* | – |
| SWE-Bench Pro | – | – | 55.6% | – | 57.7% | 58.6% |
| ARC-AGI-1 | – | – | 86.2% | – | 93.7% | – |
| ARC-AGI-2 | – | 17.6% | 54.2% | – | 73.3% | – |
| FrontierMath (T1–3) | 26.6% | – | 40.3% | – | 47.6% | – |
| FrontierMath (T4) | – | – | – | – | 27.1% | 35.4% (Pro 39.6%) |
| MMMU | 84.2% | 85.4% | – | – | 84.2% | – |
| MMMU-Pro | – | – | 86.5% | – | 81.2% | – |
| HealthBench Hard | 46.2% | – | 55.4% | 54.1% | 62.6% | – |
| GDPval (knowledge work) | – | – | 70.9% | – | 83.0% | 84.9% |
| OSWorld (computer use) | – | – | 47.3% | – | 75.0% | 78.7% |
| BrowseComp | – | – | 65.8% | – | 82.7% | 84.4% (Pro 90.1%) |
| Terminal-Bench 2.0 | – | 58.1% | 64.9% | – | 75.1% | 82.7% |
| Spreadsheet modeling | – | – | 68.4% | – | 87.3% | – |
| HLE (Humanity’s Last Exam) | 42% | – | – | – | 52.1% | 41.4% (no tools) |
| LiveCodeBench | – | – | – | – | 72.5% | – |
| Aider Polyglot | 88% | – | – | – | – | – |
| CyberGym | – | – | – | – | – | 81.8% |
| Hallucination reduction | -45% vs GPT-4o | – | -38% vs 5.1 | -26.8% vs 5.2 | -33% vs 5.2 | lower than 5.4 (no single figure yet) |
GPT-5.4 SWE-Bench Verified score is from a non-thinking variant; OpenAI has not published an official figure for this benchmark. Humanity’s Last Exam numbers are not directly comparable, GPT-5.5’s 41.4% is the no-tools figure; GPT-5.4’s 52.1% is with tools.
Want Every GPT-5 Variant Plus Claude, Gemini, and Grok in One Mac App?
If you are paying for ChatGPT, Claude Pro, and Gemini Advanced just to compare model outputs, there is a simpler setup. Fello AI is a native Mac app that routes your prompts to ChatGPT, Claude, Gemini, Grok, and DeepSeek through one interface, for $9.99/month. One price, every top model, no tab switching.
That matters in a world where OpenAI ships a new GPT-5 variant every six weeks, and where each one competes head-to-head with releases from Anthropic, Google, and xAI. If GPT-5.5 wins on agentic coding today but Claude or Gemini edges ahead on a specific task next month, you can switch models inside the same app without changing your workflow.
Conclusion
The GPT-5 family has moved fast. In nine months, OpenAI has gone from a capable general-purpose model to one that operates software like a human, holds an entire codebase in context, outperforms industry professionals on knowledge work benchmarks across 44 occupations, and now, with GPT-5.5, handles multi-step agentic coding workflows that previously needed human supervision.
For most users, GPT-5.3 Instant is still the clear everyday choice. It is the cheapest model in the family, hallucinates less than any prior version, and no longer feels like it was written by a compliance department. For professional workflows that need computer use, 1M-token context, or maximum reasoning depth, GPT-5.5 is now the model to reach for. And if you are building anything that will still be running in 12 months, keep an eye on ChatGPT 6, the shift from tool to autonomous collaborator is coming.
FAQ
What is the latest ChatGPT model in 2026?
GPT-5.5, released April 23, 2026. It is available on ChatGPT Plus, Pro, Business, and Enterprise tiers, plus Codex. GPT-5.5 Pro is restricted to Pro, Business, and Enterprise. It scores 93.6% on GPQA Diamond, 78.7% on OSWorld-Verified, and 58.6% on SWE-Bench Pro.
What is the difference between GPT-5.4 and GPT-5.5?
GPT-5.5 outperforms GPT-5.4 on every frontier benchmark, most notably Terminal-Bench 2.0 (82.7% vs 75.1%), OSWorld-Verified (78.7% vs 75.0%), and SWE-Bench Pro (58.6% vs 57.7%). The trade-off is price, GPT-5.5 standard is $5 / $30 per 1M tokens, double GPT-5.4’s $2.50 / $15. GPT-5.5 Pro pricing matches GPT-5.4 Pro at $30 / $180.
What is the difference between GPT-5.3 and GPT-5.5?
GPT-5.5 adds native computer use, a 1M-token context window, and significantly stronger reasoning and agentic coding performance. GPT-5.3 Instant is dramatically cheaper (~$0.30 / $1.20 versus $5 / $30 per 1M tokens), making it better for everyday tasks where you do not need those advanced features.
Is GPT-5.5 worth the price?
For agentic workflows, computer use, and professional coding, yes. For standard writing, chat, and reasoning, GPT-5.3 Instant still delivers strong results at roughly 6% of the input-token cost. The best approach is to use GPT-5.3 Instant as your default and switch to GPT-5.5 when the task genuinely needs it.
When is ChatGPT 6 coming out?
No official date. Mid-to-late 2026 is the most common estimate. Sam Altman has confirmed it will come sooner than the gap between GPT-4 and GPT-5. See our ChatGPT 6 guide for the latest confirmed details.
Which ChatGPT model is best for coding?
GPT-5.5 for agentic, multi-step engineering tasks, and GPT-5.5 Pro when you need the highest possible accuracy on hard problems. GPT-5.3-Codex remains a solid lower-cost option for large-codebase work with its 1M context, and GPT-5.2-Codex is still available through mid-2026 if you need the previous generation’s pricing.




