Claude Sonnet 5 Just Launched

On June 30, 2026, Anthropic released Claude Sonnet 5, the newest version of its balanced mid tier model and the most agentic Sonnet the company has ever shipped. It is the direct successor to Sonnet 4.6, and it lands with one clear pitch: near Opus 4.8 quality at a fraction of the cost.

The headline is not a benchmark. It is the price. Sonnet 5 launches with introductory pricing of $2 per million input tokens and $10 per million output tokens, cheaper than Opus 4.8, cheaper than OpenAI’s GPT-5.5, and level with Google’s Gemini 3.1 Pro on input while cheaper on output. For teams running agents at scale, where the same job can burn through millions of tokens across hundreds of steps, that pricing changes the math.

Anthropic frames the release around a shift in the market. Agentic work, meaning a model that plans, calls tools, and runs autonomously without a human in the loop, used to require the largest and most expensive frontier models. With Sonnet 5, the company argues that capability is now “table stakes” across price tiers, and the competition has moved from raw capability to cost and reliability.

In this article we cover what Sonnet 5 is, the full benchmark picture against Sonnet 4.6 and Opus 4.8, how it stacks up against GPT-5.5 and Gemini 3.1 Pro, the introductory pricing and the discount window, the new tokenizer, the thinking and effort behavior, the safety changes, and where it fits in Anthropic’s lineup.

What Claude Sonnet 5 Actually Is

Claude Sonnet 5 is a general purpose reasoning model that sits in the middle of Anthropic’s ladder, above Haiku and below Opus. It is built to be the everyday workhorse: fast enough for interactive use, cheap enough for high volume production, and now capable enough to run real agentic workloads. Here are the basics at a glance.

  • API identifier: claude-sonnet-5
  • Context window: 1 million tokens
  • Adaptive thinking: always on, with effort defaulting to high on the API and in Claude Code
  • New default model for Free and Pro users on claude.ai
  • Built for agents, with strong tool use across browsers and terminals

The defining trait is autonomy. Anthropic says the model “can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.” Early access partners described it as much more agentic than its predecessors, meaning it is better at carrying a multi step job from start to finish rather than needing a human to nudge it along at each turn.

The other big change is under the hood. Sonnet 5 ships with an updated tokenizer, which affects how you count and budget tokens. More on that below.

The Headline Is Cost

For most previous Claude launches, the story was a new capability ceiling. For Sonnet 5, the story is the cost floor. Anthropic positioned the model as a cheaper way to run agents, and the pricing is the centerpiece.

At the introductory rate of $2 per million input tokens and $10 per million output tokens, Sonnet 5 undercuts or matches every flagship it competes with. Opus 4.8 costs $5 and $25, so Sonnet 5 is well under half the price of Anthropic’s top model while landing close to it on quality. GPT-5.5 sits at $5 and $30. Gemini 3.1 Pro is $2 and $12, level with Sonnet 5 on input and $2 higher on output. Only Google’s smaller and faster Gemini 3.5 Flash comes in cheaper.
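To make the per token gap concrete, here is a minimal sketch that prices one agent job at the rates quoted above. The job size (4 million input tokens, 500,000 output tokens) is a made-up example, and the function name is ours, not part of any SDK.

```python
# Per-million-token prices in USD as quoted in this article: (input, output).
PRICES = {
    "Claude Sonnet 5 (intro)": (2.00, 10.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def job_cost(input_tokens, output_tokens, price):
    """Return the USD cost of one job at the given (input, output) per-million prices."""
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical agent job: 4M input tokens and 500k output tokens across all steps.
for model, price in PRICES.items():
    print(f"{model}: ${job_cost(4_000_000, 500_000, price):.2f}")
```

For this example job the totals come to $13.00 for Sonnet 5 at the intro rate, $14.00 for Gemini 3.1 Pro, $32.50 for Opus 4.8, and $35.00 for GPT-5.5. The sketch assumes every model spends the same number of tokens on the job, which is exactly the assumption the per task data later in this article calls into question.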

The timing is not an accident. Anthropic is racing toward a widely reported IPO, and agent workloads are where enterprise spending is concentrating. A model that delivers Opus class agentic performance at Sonnet prices is aimed squarely at teams deciding how much of their automation budget to commit, and to which vendor.

There is a catch that only shows up once you look past the per token price, and it is important. Sonnet 5 works harder than its predecessor, using far more tokens per task, which means the per task cost can actually land above Opus 4.8 at standard pricing. The independent data below breaks this down.

Benchmark Results

Anthropic published a benchmark table with the launch, comparing Sonnet 5 against Sonnet 4.6 and Opus 4.8. The topline is that Sonnet 5 beats Sonnet 4.6 on every published benchmark, and closes most of the gap to Opus 4.8.

Benchmark | Sonnet 5 | Sonnet 4.6 | Opus 4.8
Agentic coding (SWE-bench Pro) | 63.2% | 58.1% | 69.2%
Agentic coding (Terminal-Bench 2.1) | 80.4% | 67.0% | not reported
Computer use (OSWorld-Verified) | 81.2% | 78.5% | not reported
Multidisciplinary reasoning (Humanity’s Last Exam, with tools) | 57.4% | 46.8% | 57.9%
Knowledge work (GDPval-AA v2) | 1,618 | not reported | 1,615

A few of these numbers stand out. The jump on Terminal-Bench 2.1, from 67.0% to 80.4%, is more than 13 points and points directly at better real world command line and agent behavior. On Humanity’s Last Exam with tools, Sonnet 5 leaps from 46.8% to 57.4%, landing within half a point of Opus 4.8. And on the GDPval-AA v2 knowledge work benchmark, Sonnet 5 actually edges Opus 4.8, 1,618 to 1,615, which is the clearest sign of how narrow the gap between the two tiers has become.

The one place Opus 4.8 still holds a clear lead is the hardest agentic coding split. On SWE-bench Pro, Opus 4.8’s 69.2% sits six points above Sonnet 5’s 63.2%. For the most complex, long horizon coding jobs, Opus remains the stronger choice. For nearly everything else, the difference is small enough that price becomes the deciding factor.
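The point gaps quoted above are easy to recompute from the published table. The sketch below does that for the four percentage based benchmarks; GDPval-AA v2 is left out because it is reported as a score rather than a percentage. The dictionary keys and the helper name are our own labels.

```python
# Scores from Anthropic's published table, in percent; None means "not reported".
SCORES = {
    "SWE-bench Pro": {"sonnet_5": 63.2, "sonnet_4_6": 58.1, "opus_4_8": 69.2},
    "Terminal-Bench 2.1": {"sonnet_5": 80.4, "sonnet_4_6": 67.0, "opus_4_8": None},
    "OSWorld-Verified": {"sonnet_5": 81.2, "sonnet_4_6": 78.5, "opus_4_8": None},
    "Humanity's Last Exam (tools)": {"sonnet_5": 57.4, "sonnet_4_6": 46.8, "opus_4_8": 57.9},
}


def delta(row, a, b):
    """Return score a minus score b in percentage points, or None if either is missing."""
    if row[a] is None or row[b] is None:
        return None
    return round(row[a] - row[b], 1)


for name, row in SCORES.items():
    gain = delta(row, "sonnet_5", "sonnet_4_6")   # improvement over Sonnet 4.6
    gap = delta(row, "opus_4_8", "sonnet_5")      # remaining Opus 4.8 lead, if reported
    print(f"{name}: +{gain} vs Sonnet 4.6, Opus 4.8 lead: {gap}")
```

Run as written, this gives gains of 5.1, 13.4, 2.7, and 10.6 points over Sonnet 4.6, and an Opus 4.8 lead of 6.0 points on SWE-bench Pro against 0.5 points on Humanity’s Last Exam.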

How Sonnet 5 Compares to Sonnet 4.6 and Opus 4.8

Stripping the benchmarks down to positioning, the three Anthropic models line up like this.

Attribute | Claude Sonnet 5 | Claude Sonnet 4.6 | Claude Opus 4.8
Tier | mid tier (balanced) | mid tier (balanced) | frontier
Input price per million tokens | $2 intro, $3 standard | $3 | $5
Output price per million tokens | $10 intro, $15 standard | $15 | $25
Context window | 1M tokens | 1M tokens | 1M tokens
Agentic coding standing | strong | good | best
Knowledge work standing | matches Opus | trails | top

The pattern is straightforward. Sonnet 5 replaces Sonnet 4.6 outright, beating it on every metric while launching cheaper at the introductory rate. Against Opus 4.8, it trades a small deficit on the hardest coding tasks for a large advantage on cost, and it matches or beats Opus on knowledge work. For a large share of production workloads, the question is no longer whether you need Opus, but whether the six point SWE-bench Pro gap is worth paying 2.5 times Sonnet 5’s introductory per token price, or about 1.7 times its standard one.

How Sonnet 5 Stacks Up Against GPT-5.5 and Gemini 3.1 Pro

Anthropic’s own table focuses on its internal lineup. Placed against the other flagships, the picture is competitive rather than dominant, and it depends heavily on which axis you weight.

On SWE-bench Pro, Sonnet 5 leads at 63.2% against GPT-5.5’s 58.6% and Gemini 3.5 Flash’s 55.1%. On Terminal-Bench 2.1, though, GPT-5.5 edges ahead at 83.4% (run through its Codex CLI) versus Sonnet 5’s 80.4%. So “best agentic coder” is not a clean win. Sonnet 5 takes one coding benchmark, GPT-5.5 takes the other.

Gemini 3.1 Pro plays a different game. On Google’s own model card it posts strong scores like 94.3% on GPQA Diamond and 80.6% on SWE-bench Verified, and it matches the 1 million token context window. Its pricing, at $2 and $12, is close to Sonnet 5’s introductory rate.

The honest summary is that all three flagships are now clustered tightly on agentic coding, and none of them wins every category. What separates Sonnet 5 is the combination of a low price and Opus adjacent quality inside the Claude ecosystem, which matters most to teams already building on Anthropic’s tools.

Pricing and the Introductory Discount

The pricing has two phases, and the timing matters.

  • Introductory (through August 31, 2026): $2 per million input tokens, $10 per million output tokens
  • Standard (from September 1, 2026): $3 per million input tokens, $15 per million output tokens

The standard rate of $3 and $15 is identical to what Sonnet 4.6 cost, so once the discount ends, Sonnet 5 is simply a much better model at the same price as the one it replaces. The introductory window is a two month promotion designed to pull teams onto the model quickly, and it makes Sonnet 5 temporarily one of the cheapest capable agent models available.

Prompt caching pricing is unchanged from Sonnet 4.6. Cache writes carry a 25% premium over the $3 standard input rate, at $3.75 per million tokens with a 5 minute time to live, and cache hits get a 90% discount, at $0.30 per million tokens. For agent workloads that reuse a large fixed context across many calls, caching is where a lot of the real savings live.
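A rough sketch of that caching arithmetic, using the three rates quoted above. The workload (a 200,000 token shared context reused across 50 calls) is hypothetical, and the sketch assumes every call after the first is a cache hit, meaning the calls arrive close enough together that the cached context never expires.

```python
BASE_INPUT = 3.00    # USD per million input tokens (standard rate)
CACHE_WRITE = 3.75   # USD per million tokens, 25% premium over the base rate
CACHE_HIT = 0.30     # USD per million tokens, 90% discount from the base rate


def context_cost(context_tokens, calls, cached):
    """USD spent on the shared context alone across `calls` requests (calls >= 1)."""
    millions = context_tokens / 1_000_000
    if not cached:
        # Every call pays the full input rate for the whole context.
        return calls * millions * BASE_INPUT
    # Assumption: one cache write on the first call, then a hit on every later call.
    return millions * CACHE_WRITE + (calls - 1) * millions * CACHE_HIT


uncached = context_cost(200_000, 50, cached=False)
cached = context_cost(200_000, 50, cached=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Under those assumptions the shared context costs $30.00 without caching and $3.69 with it. Real savings will be lower if calls are spaced far enough apart that the cache expires and has to be rewritten.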

One caveat worth flagging for anyone doing careful cost projections is the new tokenizer, which can change how many tokens your text turns into.

The New Tokenizer

Sonnet 5 uses an updated tokenizer that converts the same text into roughly 1.0 to 1.35 times more tokens than previous Claude models. In plain terms, an identical prompt can cost more tokens on Sonnet 5 than it did on Sonnet 4.6, depending on the content.

This has a practical effect on cost comparisons. The headline per token price is lower, but if your workload lands toward the high end of that range, some of the savings get eaten by higher token counts. Anyone migrating from an older model should re-baseline actual token usage on representative prompts rather than assuming the per token discount translates directly into a proportional bill reduction. Code and non English text tend to feel tokenizer changes more than plain English prose.
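One way to reason about this is an "effective price": the per token rate multiplied by the token inflation factor, expressed per million tokens as the old tokenizer would have counted them. The sketch below is a back of the envelope illustration for input tokens, using the 1.0 to 1.35 range above; the function name is ours.

```python
OLD_INPUT_PRICE = 3.00  # Sonnet 4.6, USD per million input tokens


def effective_price(new_price, token_multiplier):
    """Price per million old-tokenizer tokens once token inflation is included."""
    return new_price * token_multiplier


for label, price in [("intro", 2.00), ("standard", 3.00)]:
    for mult in (1.0, 1.35):
        eff = effective_price(price, mult)
        change = (eff / OLD_INPUT_PRICE - 1) * 100
        print(f"{label} rate at {mult:.2f}x tokens: ${eff:.2f} effective "
              f"({change:+.0f}% vs Sonnet 4.6)")
```

At the worst case 1.35 multiplier, the introductory rate works out to $2.70 effective, still about 10% below Sonnet 4.6, while the standard rate works out to $4.05, about 35% above it. The same ratios apply to output tokens, since both input and output rates sit at two thirds of standard during the promotion.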

Independent Benchmarks from Artificial Analysis

Anthropic’s numbers are its own, so the most useful outside read comes from Artificial Analysis, which supported Anthropic in evaluating the model ahead of release. Its verdict is more measured than the launch framing, and it exposes the cost caveat directly.

On the Artificial Analysis Intelligence Index, Sonnet 5 scores 53 with max effort, making it the number five model overall. That is a 6 point gain over Sonnet 4.6, and it puts Sonnet 5 level with GPT-5.5 at high reasoning while sitting only 2 to 3 points behind GPT-5.5 at xhigh and Opus 4.8 at max. It still trails both Opus 4.7 and 4.8 on raw intelligence, which is the expected order for a mid tier model.

The cost finding is the one that reframes the whole launch. Artificial Analysis measured Sonnet 5 at $2.29 per task on the Intelligence Index using standard $3 and $15 pricing, not the promotional rate. That is roughly double Sonnet 4.6 and about 15% more than Opus 4.8. The increase is driven entirely by token usage, not price. With max effort, Sonnet 5 used around 40% more output tokens per task than Sonnet 4.6, and roughly three times the agentic turns on the AA-Briefcase and GDPval-AA knowledge work benchmarks. That behavior scales sharply with the effort setting, with max effort using about six times more turns than low effort on GDPval-AA.

So the “cheaper than Opus” story always holds per token, but per task it depends on the effort setting and on the promotional window. Once you account for how many tokens Sonnet 5 spends to do the same job at high effort, and once the discount ends, it can cost more per task than Opus 4.8. The lever that controls this is effort: dial it down and both cost and latency drop, at the price of some intelligence.
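The Artificial Analysis figures can be turned into rough per task comparisons. The numbers below are our own back of the envelope estimates, not measured values: they assume token usage is identical at both price points, ignore any caching effects, and take the "about 15%" and "roughly double" ratios at face value.

```python
STANDARD_COST_PER_TASK = 2.29    # USD, measured by Artificial Analysis at $3 / $15
INTRO_TO_STANDARD = 2.00 / 3.00  # same ratio as 10 / 15 on the output side

# Estimates derived from the ratios quoted in the text, not measurements.
intro_cost = STANDARD_COST_PER_TASK * INTRO_TO_STANDARD  # same tokens, intro rates
opus_cost = STANDARD_COST_PER_TASK / 1.15                # "about 15% more than Opus 4.8"
sonnet_46_cost = STANDARD_COST_PER_TASK / 2              # "roughly double Sonnet 4.6"

print(f"Sonnet 5, intro pricing (estimate): ${intro_cost:.2f}")
print(f"Opus 4.8 (estimate):                ${opus_cost:.2f}")
print(f"Sonnet 4.6 (estimate):              ${sonnet_46_cost:.2f}")
```

On these assumptions, Sonnet 5 at max effort comes to roughly $1.53 per task at the introductory rate, against roughly $1.99 for Opus 4.8 and $2.29 for Sonnet 5 at standard pricing. That is consistent with the claim that the per task advantage over Opus at high effort lasts only as long as the discount does.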

Where Sonnet 5 clearly earns its keep is agentic knowledge work. On both AA-Briefcase and GDPval-AA, run through Artificial Analysis’s open source Stirrup agent harness, Sonnet 5 sits just ahead of Opus 4.8 and trails only Claude Fable 5, which is not generally available. These benchmarks test whether a model can produce accurate, well presented professional outputs, and Sonnet 5 punches above its tier on them.

On heavy reasoning and knowledge, though, the larger siblings still lead. On CritPt, a frontier physics reasoning benchmark from researchers at Argonne and UIUC, Sonnet 5 scores 17%. That is 14 points above Sonnet 4.6, a real jump, but it lands behind GLM-5.2, Claude Opus and Fable, and GPT-5.5 at xhigh and Pro. Artificial Analysis also measured strong gains over Sonnet 4.6 on Terminal-Bench v2.1 (up 9 points), Humanity’s Last Exam (up 10 points), and SciCode (up 7 points), with relatively flat scores on other evaluations.

Thinking and Effort

Sonnet 5 uses adaptive thinking, which is always on. Rather than a fixed thinking budget, the model decides for itself how much to reason before answering, scaling its effort to the difficulty of the task. On the API and in Claude Code, the effort level defaults to high, which favors thoroughness over raw speed.

Effort is the recommended way to configure the tradeoff between performance, cost, and latency. Sonnet 5 adds an xhigh setting that Sonnet 4.6 did not have, giving it the same five effort levels as Opus 4.8: max, xhigh, high, medium, and low. As the Artificial Analysis data shows, that dial has a large effect, with max effort spending several times more agentic turns than low on the same task.

This is the same reasoning approach Anthropic uses on its Opus tier, and it is a big part of why Sonnet 5 closes so much of the agentic gap. The model can pause to plan a multi step job, call a tool, read the result, and correct course, all without a human prompting each step. For latency sensitive or cost sensitive workloads, the effort level can be dialed down, trading some intelligence for faster and cheaper responses.

Built for Agents

The clearest theme across the launch is that Sonnet 5 is designed to run agents, not just answer questions. The gains on Terminal-Bench 2.1 and OSWorld-Verified are agent benchmarks, measuring how well the model operates a terminal and a computer, and both moved up meaningfully over Sonnet 4.6.

Launch partners echoed the point. Daniel Shepard of Zapier said tasks that “used to stall halfway” now complete, calling the model “a no-brainer” for day to day automation. The value of a reliable mid tier agent model is that it can be deployed broadly, across many concurrent workflows, without the cost of a frontier model on every call.

This lines up with where Anthropic has been steering its product. The company has been building toward agent and coworker style functionality, where a model is trusted to run a job end to end. Sonnet 5 is the version of that vision priced for volume.

Real World Performance from Launch Partners

Beyond the benchmarks, the partner reports focused on reliability and judgment rather than raw speed.

Zapier highlighted completion rates on multi step automations, the practical measure of whether an agent can be trusted with a whole workflow. Lovable, an AI app building platform, emphasized the safety side. Fabian Hedin of Lovable said “a model that knows when to say no is just as important as one that knows how to build,” pointing at Sonnet 5’s improved ability to refuse bad or malicious requests.

That judgment matters more for agents than for chat. When a model is running autonomously and calling tools, a wrong action can have real consequences, so a model that reliably declines to do something harmful is safer to deploy without a human watching every step.

Safety and Alignment

Anthropic reports that Sonnet 5 is safer than Sonnet 4.6 on several axes. It shows lower rates of misaligned behavior, less cooperation with attempts at misuse, less deception, and fewer hallucinations. It is also better at refusing genuinely malicious requests.

On the security front, cyber safeguards are enabled by default. Notably, Anthropic says Sonnet 5 has substantially poorer exploit development abilities than Opus 4.8, meaning it is less capable of writing offensive security code, which the company frames as a deliberate safety property for a model meant to be deployed at massive scale. A widely deployed mid tier model is a bigger attack surface than a premium frontier model, so keeping its dangerous capabilities lower is part of the design.

Availability and Access

Anthropic shipped Sonnet 5 broadly on launch day.

  • claude.ai, where it is the new default for Free and Pro users, and available on Max, Team, and Enterprise
  • Claude Code, live at launch
  • Claude API and Claude Platform, under the identifier claude-sonnet-5
  • Third party developer tools, including Cursor, VS Code, and GitHub Copilot

Making it the default for Free and Pro users is a significant move. It puts an agent capable model in front of the entire consumer base immediately, rather than reserving the newest capability for paying or enterprise tiers.

Why This Release Matters

Three things make Sonnet 5 worth paying attention to beyond the benchmark numbers.

Agent capability moved down a tier. The performance that required Opus a few months ago now runs at a mid tier per token price. The savings are real during the promotional window and at lower effort, though the independent data is a reminder that per task cost depends on how many tokens the model spends, not just the sticker rate.

The tier gap is narrowing. On knowledge work, Sonnet 5 already matches Opus 4.8, and on reasoning with tools it is within half a point. The gap that justifies paying for the top tier is shrinking to a specific set of hardest coding tasks, which changes how teams choose models.

Price is now the battleground. With capability clustering across the flagships, Anthropic is competing on cost and reliability rather than a single headline benchmark. The introductory discount, the volume friendly standard price, and the safety improvements are all aimed at winning the agent deployment decision rather than the leaderboard.

For anyone who wants to use Claude Sonnet together with ChatGPT, Gemini, DeepSeek, and Perplexity, the fastest way is to use Fello AI, an app that puts every frontier model into a single native app for Mac, iPhone, and iPad, so you can run the same task across models and pick the right one for the job without managing a stack of subscriptions.
