[Hero image: "Claude Sonnet 4.5 — Here Is All You Need To Know," with the Anthropic logo]

Anthropic Launches Claude Sonnet 4.5: Frontier AI Built for Code & Complex Tasks

On September 29, 2025, Anthropic released Claude Sonnet 4.5, its latest frontier AI model, positioning it as the most capable and best-aligned model the company has developed to date. The launch builds on the foundation of previous Claude Sonnet and Opus releases, with a focus on sustained autonomous performance, coding ability, tool use, and measurable improvements in safety behavior.

After less than two months of internal testing, Anthropic pushed Claude Sonnet 4.5 into production across its API, Amazon Bedrock, and Google Vertex AI. The model is now available to all users via Claude.ai and API access, and will soon be integrated into Fello AI.

Let’s take a deep look at what’s new, how it compares to other frontier models, and why it might be the best model on the market right now for developers, enterprises, and power users.

Focus on Coding & Computer Use

Claude Sonnet 4.5 delivers state-of-the-art performance on SWE-bench Verified, a benchmark that evaluates an AI model’s real-world software engineering capabilities. The model achieved 77.2% accuracy, rising to 82.0% when using parallel test-time compute. These results position Claude ahead of competitors, including OpenAI’s GPT-5 and Google’s Gemini 2.5 Pro.
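Anthropic doesn't detail the mechanism behind that 82.0% figure, but "parallel test-time compute" is commonly implemented as best-of-n sampling: generate several candidate patches in parallel and keep one that passes a verification step. A minimal sketch of that idea, where `generate_patch` and `passes_tests` are hypothetical stand-ins for a model call and a test-suite run:

```python
import concurrent.futures
from typing import Optional

def generate_patch(seed: int) -> str:
    # Stand-in for a model call that proposes a candidate fix;
    # different seeds/temperatures yield different candidates.
    return f"candidate-patch-{seed}"

def passes_tests(patch: str) -> bool:
    # Stand-in for running the repository's test suite on the patch.
    return patch.endswith("3")

def best_of_n(n: int = 8) -> Optional[str]:
    # Sample n candidates in parallel, return the first that verifies.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(generate_patch, range(n)))
    for patch in candidates:
        if passes_tests(patch):
            return patch
    return None  # every candidate failed verification

print(best_of_n())  # → candidate-patch-3 with these stubs
```

The extra accuracy comes purely from spending more compute at inference time: each candidate is an independent attempt, and the verifier picks a winner.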

Chart: SWE-bench Verified results [source]

This increase in performance is not limited to short tasks. Anthropic says Claude 4.5 is capable of sustaining uninterrupted coding for up to 30 hours, a significant jump from the 7-hour benchmark achieved by Claude Opus 4 earlier this year. In internal tests, the model was able to autonomously build and deploy full software stacks—including backend services, domain configuration, and even security audits.

Tool use has also improved. On the OSWorld benchmark, which evaluates the model’s ability to perform real-world tasks in an operating system environment, Claude Sonnet 4.5 achieved 61.4%, up from 42.2% in its previous version. This suggests better performance in tasks like spreadsheet editing, document generation, and navigation of web or desktop tools.

Benchmark Wins at a Glance

Sonnet 4.5 posts material gains on almost every widely tracked evaluation suite. A few headline numbers pulled from Anthropic's internal charts:

| Public benchmark | Claude 4.5 | Opus 4.1 | Sonnet 4 | GPT-5 | Gemini 2.5 Pro |
| --- | --- | --- | --- | --- | --- |
| SWE-bench Verified (software bugs fixed) | 77.2% (82.0%*) | 74.5% | 72.7% | 72.8% | 67.2% |
| OSWorld (real PC tasks) | 61.4% | 44.4% | 42.2% | — | — |
| AIME 2025 (HS maths, Python) | 100% | 78% | 70.5% | 99.6% | 88% |
| Finance Agent (win rate vs. baseline) | 55.3% | 50.9% | 44.5% | 46.9% | 29.4% |
| Misaligned-behaviour score ↓ | ≈13% | ≈29% | ≈24% | ≈17% | ≈41% |

*With parallel test-time compute.

Solid Performance Across Other Fields

Beyond engineering, Claude Sonnet 4.5 shows strength in various academic and domain-specific evaluations.

  • It scored 83.4% on GPQA (graduate-level reasoning) and 89.1% on MMMLU (multilingual question answering).
  • On the AIME 2025 high school math competition benchmark, it scored 87.0% without tools and 100% using Python.
  • Visual reasoning also improved, with a score of 77.8% on the MMMU benchmark.

In financial analysis, the model reached 55.3% on the Finance Agent benchmark, outperforming all other tested models, including GPT-5 and Gemini 2.5 Pro.

Complementing benchmark data, Anthropic also published win-rate-based evaluations in finance-specific tasks, showing Claude 4.5 winning 68%–72% of the time against baselines, depending on context window length.
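A win rate like this is simply the fraction of head-to-head comparisons a model wins against a baseline, with ties commonly scored as half a win. A quick illustration with made-up judgment counts (the 25-task tally below is hypothetical, not Anthropic's data):

```python
def win_rate(outcomes):
    """outcomes: list of 'win' / 'loss' / 'tie' results from
    pairwise comparisons against a baseline model.
    Ties are scored as half a win."""
    score = sum(1.0 if o == "win" else 0.5 if o == "tie" else 0.0
                for o in outcomes)
    return score / len(outcomes)

# 18 wins and 7 losses out of 25 hypothetical finance tasks
outcomes = ["win"] * 18 + ["loss"] * 7
print(f"{win_rate(outcomes):.0%}")  # → 72%
```

This is why win rate and benchmark accuracy can diverge: accuracy scores each task in isolation, while win rate only measures relative preference over the baseline.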

Best-in-Class Financial Analysis

Claude Sonnet 4.5 shows a 72% win rate vs. baseline in finance tasks with its 16K context setting. That's the highest of any Claude version, and even better than Claude Opus 4.1 (60%). On the Finance Agent benchmark:

| Model | Finance Agent accuracy |
| --- | --- |
| Claude Sonnet 4.5 | 55.3% |
| Claude Opus 4.1 | 50.9% |
| GPT-5 | 46.9% |
| Gemini 2.5 Pro | 29.4% |

It excels at tasks like modeling, forecasting, and spreadsheet creation, making it especially useful for financial services teams.

Agents and Autonomy

One of the most significant aspects of Claude Sonnet 4.5 isn’t just its raw performance—it’s the infrastructure now available to developers. With the release of the Claude Agent SDK, Anthropic is opening up the same core components it used internally to build Claude Code. This SDK allows developers to create autonomous AI agents capable of running complex, multi-step workflows with minimal oversight.

The SDK supports long-running memory across tasks, handles permissions for agent autonomy, and allows for coordination between sub-agents. It also brings native support for executing code and generating files—including documents, spreadsheets, and presentations—directly within conversations.

Developers can now use these tools to go beyond chat-based assistants and build fully autonomous AI workers that interact with software systems, execute tasks, and respond to dynamic user needs in real time.
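The Agent SDK's actual call signatures aren't reproduced here; as an illustration of the underlying pattern such agents share, here is a minimal tool-use loop with a stubbed model. The `model_step` function, `TOOLS` registry, and message format are illustrative stand-ins, not the SDK's API:

```python
# A minimal agent loop: the model repeatedly proposes tool calls,
# the runtime executes them and feeds results back, until the model
# returns a final answer or exhausts its step budget.

def read_file(path: str) -> str:
    # Stand-in tool: a tiny fake filesystem.
    return {"notes.txt": "deploy target: prod"}.get(path, "")

TOOLS = {"read_file": read_file}

def model_step(messages):
    # Stub for an LLM call: first gather context, then answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "read_file",
                "args": {"path": "notes.txt"}}
    return {"type": "final", "text": "Deploy target is prod."}

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = model_step(messages)
        if step["type"] == "final":
            return step["text"]
        # Execute the requested tool and append its result.
        result = TOOLS[step["tool"]](**step["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within step budget")

print(run_agent("Where do we deploy?"))  # → Deploy target is prod.
```

The SDK's added value over a loop like this is the surrounding machinery the article describes: persistent memory across tasks, permission gating on each tool call, and coordination between sub-agents.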

One reported use case highlights the potential: a customer ran Claude Sonnet 4.5 continuously for 30 hours, during which the model built an entire full-stack web application. It configured backend services, registered a domain, and even passed a SOC 2 compliance audit—all without manual intervention. This level of sustained autonomy is unmatched by current large language models.

Imagine with Claude

Alongside the launch, Anthropic introduced a limited-time research experiment: "Imagine with Claude."

In this preview, Claude Sonnet 4.5 builds fully interactive software from scratch, in real time, with no prewritten code. It’s available to Max subscribers for 5 days and showcases the power of the new model when paired with responsive infrastructure.

This feature isn’t production-grade yet, but it’s a bold vision of where autonomous software generation is heading.

Final Thoughts

Claude Sonnet 4.5 is clearly designed for professionals who need AI to handle actual tasks—not just generate text. Whether you’re a developer building agents, a financial analyst modeling forecasts, or a technical lead automating workflows, this model delivers high performance with a strong focus on reliability.

The coding benchmarks speak for themselves. It leads SWE-bench Verified, outpaces GPT-5 in computer use, and shows strong gains in reasoning, math, and multilingual tasks. It can run autonomously for extended periods and maintain focus across long, complex workflows. That makes it a strong fit for product builders, backend teams, and enterprise automation.

Compared to the broader landscape, Claude 4.5 stands out in use-case coverage. GPT-5 still does well in general reasoning and content generation. Gemini holds ground in visual tasks. But when it comes to agentic behavior, secure tool use, and building production-grade systems, Claude 4.5 is ahead.

Safety also matters—especially in high-trust domains. With the lowest misalignment score among major models and strong protection against prompt injection, Claude 4.5 can be deployed in regulated industries or integrated into critical workflows without constant oversight.

If your goal is to get serious work done with AI, Claude Sonnet 4.5 is likely the most capable and dependable model available right now.
