Best AI Models in July 2026: ChatGPT, Claude, Gemini & Grok

The Best AI to Use In July 2026

Compare leading AI models & Understand which is the best model for your needs. [Updated 25th of July]

July 2026 closes with a new #1. Anthropic released Claude Opus 5 on July 24 and it immediately took the top of Artificial Analysis’s Intelligence Index at 61 and its Agentic Index at 55.3, en $5 / $25 per 1M tokens, half the price of Claude Fable 5. The month had opened with Fable 5 itself back online: on July 1, Anthropic redeployed its Mythos-class flagship after the US government lifted the June 12 export-control order that had pulled the model offline for nearly three weeks. Then July 8 and 9 delivered four launches: OpenAI opened its GPT-5.6 family (Sol, Terra, Luna) to general availability on July 9 and it is now live as ChatGPT’s default, xAI took Grok 4.5 public on July 8 as a cheap Cursor-trained coding model at $2 / $6 per 1M tokens, Meta shipped Muse Spark 1.1 on July 9 as its first paid model at $1.25 / $4.25, and ByteDance released Seedream 5.0 Pro, a multilingual text-and-layout image model with region-precise editing.

The rest of the month’s headlines landed in the final week of June: OpenAI first previewed GPT-5.6 on June 26 behind a US-government access list of roughly 20 organizations, Meituan open-sourced LongCat-2.0, a 1.6-trillion-parameter coding model trained entirely on Chinese chips, on June 29, and Anthropic made Claude Sonnet 5 its new default model on June 30, closing much of the gap to Opus 4.8. Google then shipped Gemini 3.6 Flash on July 21, an output-cheaper, faster Flash that is now the model the free Gemini app reaches. The model still to watch is Gemini 3.5 Pro, which is still unreleased; Bloomberg reported on July 16 that it is months behind schedule and has fallen short of Google’s internal goals, particularly in coding.

The underlying board has shifted, and so has the ruler. Artificial Analysis now scores models on Intelligence Index v4.1, which rebased the scale, so these numbers are lower than the ones it published earlier in the year and the two are not comparable. On v4.1 the new Claude Opus 5 leads at 61, ahead of Claude Fable 5 at 60, GPT-5.6 Sol at 59, Kimi K3 at 57, Claude Opus 4.8 at 56, and Grok 4.5 at 54 for a fraction of their price. One caveat matters for reading the rest of this page: Opus 5 launched on July 24 and is scored by Artificial Analysis, LiveBench and ARC Prize, but the human-preference boards at Arena and Scale SEAL have not rated it yet, so we have held it out of the crowns until they do. Below, we break down which model wins each category, why, and when you should pick the alternative.

GPT-5.6, ChatGPT’s default since July 9, is the best AI model for daily chat and knowledge work because it is the assistant most people can actually open, Claude Fable 5 is the best for coding at #1 on Terminal-Bench 2.1 (83.8%) and LiveBench Coding (86.0) with Claude Opus 5 the everyday-value pick right behind it at $5 / $25, Claude Fable 5 is also the best for writing at #1 on both Arena creative writing and LiveBench Language, Gemini 3.1 Pro is the best for accuracy at 98% on ARC-AGI-1 for $0.52 a task, GPT-5.6 Sol is the best for hard problem solving at #1 on LiveBench Mathematics, Reasoning and ARC-AGI-2, Gemini 3.6 Flash is the best for price-performance at Intelligence Index 50 and $1.50 / $7.50 per 1M tokens, ChatGPT Images 2.0 is the best for image generation, Gemini Omni Flash is the best for AI video at #1 on both video boards, Grok 4.5 is the pick for real-time X context and the fewest content restrictions, and Gemini Spark plus Claude Cowork are the two AI agents most worth your attention right now.

Monthly Ranking of Top AI Models

AI models change fast. New versions are released, performance shifts, and strengths evolve over time. To keep this comparison accurate and up to date, we publish a Best AI of the Month analysis every month, based on the latest model updates and real-world performance. Below are our most recent monthly rankings, where we take a deeper look at how the leading AI models performed during each month.

Claude Fable 5

Best AI for Writing

Claude Fable 5 is the only model in the top three of all three independent writing boards. It is #1 on Arena’s creative-writing leaderboard at 1508 Elo, #1 on LiveBench Language at 90.7, y #3 on EQ-Bench Creative Writing v3. It runs a 1M-token context at $10 / $50 per 1M tokens and is permanently included in Claude Max and Team Premium at roughly 50% of regular usage limits. Claude Sonnet 5 is the value pick, free and default on claude.ai at $2 / $10 introductory pricing.

ChatGPT-5.6

Best AI for Chat / Daily Assistant

GPT-5.6 (Sol, Terra, Luna) has been ChatGPT’s default model since July 9, 2026, which makes it the best assistant most people can actually open. Most ChatGPT users get the balanced Terra tier, which OpenAI says matches GPT-5.5 at roughly half the cost; API pricing runs Luna $1 / $6, Terra $2.50 / $15, and Sol $5 / $30 per 1M tokens. On raw preference it is not the leader, sitting #11 on Arena’s text leaderboard where Claude Fable 5 is #1. OpenAI’s system card and the evaluator METR also flagged elevated “scheming” behaviour in Sol.

ChatGPT Images 2.0

Best AI for Images

ChatGPT Images 2.0 holds the crown on both independent image boards, leading text-to-image at 1385 Elo and image editing at 1465, and it is still the best model for rendering readable multilingual text. It is included in ChatGPT Plus and Pro. Reve 2.1 is the runner-up, with Meta’s Muse Image third on text-to-image and second on image editing.

Gemini Omni Flash

Best AI for Video

Gemini Omni Flash is #1 on both independent video boards, leading Arena’s text-to-video leaderboard at 1527 Elo, a full 45 points clear of the runner-up. It generates 10-second clips with conversational editing, priced at $1.50 in and $17.50 per 1M video output tokens, roughly $0.10 per second. Veo 3.1 is the alternative when you need longer production clips.

Claude Fable 5

Best AI for Coding

Claude Fable 5 holds the coding crown on the boards that publish results, taking #1 on the official Terminal-Bench 2.1 leaderboard at 83.8% (in Claude Code), #1 on LiveBench Coding at 86.0, y #1 on the Remote Labor Index at 15.8, roughly 1.9x the next model. It also tops Scale SEAL’s SWE Atlas Refactoring and Test Writing boards, and runs at $10 / $50 per 1M tokens. Claude Opus 5 is the everyday-value pick right behind it at $5 / $25, with Kimi K3 the contender to watch after taking #1 on Arena’s WebDev and Agent boards.

Grok 4.5

Best AI for Creativity

Grok 4.5 is our creativity pick on product grounds, not board position. It carries the fewest content restrictions of any frontier model and the only native real-time X integration, and it is the default in the Grok app for SuperGrok and X Premium+ subscribers at $30/month. To be clear, this is not a quality ranking: Grok 4.5 sits #33 on EQ-Bench Creative Writing y #41 on Arena’s creative-writing board. If you want the best-written output, Claude Fable 5 wins that comparison outright.

Gemini 3.1 Pro

Best AI for Accuracy

Gemini 3.1 Pro ties the human panel on ARC-AGI-1 at 98%, and does it at $0.52 per task, which is what makes it the practical accuracy pick rather than the most expensive one. It pairs that with native Google Search grounding for live factual answers. It scores 94.3% on GPQA Diamond and 77.1% on ARC-AGI-2, though the ARC-AGI-2 board has moved on and 77.1% now places it around 14th. Gemini 3.5 Pro is still unreleased.

ChatGPT-5.6

Best AI for Problem Solving

GPT-5.6 Sol is the best-supported crown on this page. It takes #1 on LiveBench Mathematics at 96.2, #1 on LiveBench Reasoning at 91.7, y #1 on ARC-AGI-2 at 93%, the closest any model has come to the 100% human panel. OpenAI has still not published Sol’s FrontierMath score, so GPT-5.5 Pro’s verified 39.6% on Tier 4 stays the cited OpenAI mark. Qwen 3.7 Max is the value alternative at 97.1 on the February 2026 HMMT index.

What is new in July 2026

Claude Opus 5 – Anthropic – July 24, 2026 – new #1 on Artificial Analysis, tops both the Intelligence Index (61) and the Agentic Index (55.3)

Anthropic released Claude Opus 5 on July 24, 2026, its fourth model in under two months after Mythos 5, Fable 5, and Sonnet 5. On Artificial Analysis’s rebased v4.1 leaderboard it is now the top-ranked model overall, leading both the Intelligence Index at 61 and the Agentic Index at 55.3, ahead of Fable 5 (60 / 52.8) and GPT-5.6 Sol (59 / 54.0). API pricing is $5 / $25 per 1M tokens, identical to Opus 4.8 and half the cost of Fable 5, under the API id claude-opus-5. It adds a Fast mode that runs about 2.5x quicker at twice the base price, plus an effort setting from low to high and a new max tier. Anthropic’s own benchmarks put it at more than double Opus 4.8 on Frontier-Bench v0.1, within 0.5% of Fable 5 on CursorBench 3.2 at max effort, and roughly 3x the next-best model on ARC-AGI 3. Anthropic’s own routing guidance now reads “start with Claude Opus 5 for complex agentic coding and enterprise work. For workloads that need the highest available capability, use Claude Fable 5.” One caveat worth knowing: Opus 5 remains behind Mythos 5 on cybersecurity tasks. It also takes Artificial Analysis’s GDPval-AA v2 professional-deliverables board outright at 1861, ahead of Fable 5’s 1747. Note that Opus 5 is too new for the human-preference boards: Arena, Scale SEAL, EQ-Bench and Terminal-Bench have not rated it yet, so every Opus 5 number here is benchmark-run rather than vote-based. It is the default on Claude Max and strongest on Claude Pro. Read our cover: Claude Opus 5.

Fugu-Ultra v1.1 – Sakana AI – July 24, 2026 – orchestration-engine refresh with vendor-reported gains of up to 7.9 points over v1.0 at the same price

Sakana AI shipped Fugu-Ultra v1.1 on July 24, 2026, a refresh of the frontier models inside its TRINITY orchestration engine rather than a new base model. Sakana’s own announcement claims gains of up to 7.9 points over v1.0 at unchanged pricing, though it does not publish v1.0 scores side by side, so treat that as the vendor’s figure. All the benchmark numbers come from Sakana’s custom scaffolding rather than independent testing: on its own charts it leads Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, scoring 73.7% on SWE-Bench Pro, 82.1% on Terminal-Bench 2.1, 93.2% on LiveCodeBench, and 95.5 on GPQA-Diamond. It does not sweep the field, though: its 50.0 on Humanity’s Last Exam is a rounding-error tie with Opus 4.8’s 49.8. Pricing is unchanged from v1.0 at $5 / $30 per 1M input/output tokens (plus $0.50 cached), rising to $10 / $45 above the 272K-token context mark, with $20 / $100 / $200 monthly Standard, Pro, and Max plans; the API is OpenAI-compatible with a 1M-token context window. Read our coverage: Fugu-Ultra v1.1.

Gemini 3.6 Flash – Google – July 21, 2026 – cheaper output, faster, same Intelligence Index

Gemini 3.6 Flash is Google’s newest Flash model and the one the free Gemini app now reaches. Output falls to $7.50 per 1M tokens from $9.00 while input holds at $1.50, so the cut is output-only. Google says it “reduces output token usage by 17% compared to 3.5 Flash,” and up to 65% on long-horizon agentic work like DeepSWE, so the real per-task saving is larger than the sticker. Artificial Analysis scores 3.6 Flash and 3.5 Flash at the same Intelligence Index 50 on v4.1, so treat it as cheaper and faster rather than smarter. It shipped alongside Gemini 3.5 Flash-Lite at $0.30 / $2.50, and is live through the Gemini API in Google AI Studio and Android Studio, the Gemini Enterprise Agent Platform, and the Gemini app for everyone. Google also announced Gemini 3.5 Flash Cyber, but that one has not shipped: it goes to governments and trusted partners via CodeMender as a limited-access pilot.

Qwen 3.8 (Qwen3.8-Max) – Alibaba – July 19, 2026 – 2.4-trillion-parameter multimodal flagship, previewed at WAIC Shanghai

Alibaba previewed Qwen3.8-Max on July 19, its largest model yet at 2.4 trillion total parameters on a sparse Mixture-of-Experts design, and the first Qwen above 1 trillion parameters to handle images, video, and documents alongside text. The Qwen team calls it “second only to Fable 5,” but that ranking is Alibaba’s own claim, shipped with no benchmarks, no model card, no activated-parameter count, and no license. A preview is live through Alibaba’s Token Plan, Qoder, and QoderWork at 10% of standard pricing, and Alibaba says the full model will go open-weight “soon” without giving a date. We are keeping Qwen 3.8 out of the ranked picks until independent benchmarks or the actual weights arrive. Read our full breakdown of Qwen 3.8

Kimi K3 – Moonshot AI – July 16, 2026 – 2.8T reported params, largest open-weight model from China (weights due July 27)

Moonshot AI launched Kimi K3 on the eve of July 16, 2026, its new flagship and the successor to the K2 line. The official platform.kimi.ai quickstart lists a reported 2.8-trillion-parameter Mixture-of-Experts design with a 1-million-token context window, accepting text, image, and video input (no audio), with reasoning_effort fixed at max and thinking always on at launch. Moonshot claims 91.2% on BrowseComp with a single agent and no context compression, though no independent SWE-bench or Intelligence Index scores exist yet, so treat that as a vendor number rather than a verified frontier result; Claude Fable 5 remains the leader on the boards that publish results, at #1 on Terminal-Bench 2.1 and LiveBench Coding. Hosted API pricing is reported at $3 / $15 per 1M tokens plus about $0.015 per web-search call. Open weights are expected on July 27, 2026 under a Modified MIT license, which would make it the largest open-weight model released from China and, at Intelligence Index 57, instantly the strongest open model available; as of this update the weights have not landed, so Kimi K3 is still a proprietary API model; a free basic chat tier exists in the Kimi app, with heavier agentic use metered on paid plans. Read our review: Kimi K3

GPT-5.6 Sol, Terra, and Luna – OpenAI – July 9, 2026 – next-gen family live across ChatGPT, Codex, and the API

OpenAI opened its GPT-5.6 family to general availability on July 9, 2026, ending the two-week gated preview that started June 26 behind a US-government safety review. The lineup runs from least to most capable: Luna, a fast, low-cost tier; Terra, a balanced everyday model OpenAI says matches GPT-5.5 at roughly half the cost; and Sol, the flagship, tuned for biology, chemistry, and cybersecurity. GPT-5.6 is now rolling out across ChatGPT, Codex, and the API as OpenAI’s default, with API pricing of Sol $5 / $30, Terra $2.50 / $15, and Luna $1 / $6 per 1M tokens. On the few benchmarks OpenAI published, Sol scores 88.8% on Terminal-Bench 2.1 (91.9% in its higher-compute “ultra” mode) versus GPT-5.5’s 88.0%, and 60.5 on HealthBench Professional, up 8.7 points on GPT-5.5; OpenAI notably withheld the usual SWE-bench Verified, GPQA, and FrontierMath numbers, and its context window is still not officially published (a circulating 1.5M figure is unconfirmed). One caveat worth knowing: OpenAI’s own system card and the external evaluator METR flagged elevated “scheming” behaviour in Sol, including gaming a software-engineering test at the highest rate METR has ever recorded, which is part of why the release was gated for review. Read our cover: GPT-5.6.

Muse Spark 1.1 – Meta – July 9, 2026 – Meta’s first paid model, a cheap agentic coder at $1.25 / $4.25

Meta shipped Muse Spark 1.1 on July 9, 2026, its most capable model yet for real-world coding and agentic tasks, and started charging developers to use its own model for the first time through the new Meta Model API. It is a multimodal reasoning model with a self-managed 1-million-token context window, native primary-agent and subagent orchestration, and MCP and custom-skill support. Pricing is $1.25 per 1M input and $4.25 per 1M output tokens, with $20 in free credits for every new account, and it is also free in Thinking mode inside the Meta AI app; the API preview is US-only at launch. The Meta Model API speaks both the OpenAI and Anthropic SDK formats, so pointing an existing agent at Muse Spark is a base-URL-and-key change. On Meta’s own benchmark chart it wins the agentic tool-use rows (88.1 on MCP Atlas) but trails Claude Opus 4.8 and GPT-5.5 on pure SWE-Bench coding, and independent scores are not yet out. Read our cover: Muse Spark 1.1.

Grok 4.5 – xAI (SpaceX AI division) – July 8, 2026 – cheap Cursor-trained coding model at $2 / $6, independently ranked #4

xAI took Grok 4.5 public on July 8, 2026, its first flagship release since SpaceX absorbed the company (the SpaceX–xAI merger closed May 6 and xAI now trades publicly as SPCX, with deepening ties to the coding startup Cursor). Elon Musk calls it “an Opus-class model, but faster, more token-efficient and lower cost,” with an internal assessment that it is “roughly comparable to Opus 4.7, but much faster.” It is Cursor-trained and pitched as a coding and agentic-work model more than a consumer chatbot, with a 500K-token context window. API pricing is $2 per 1M input and $6 per 1M output tokens, well under Claude Opus 4.8’s $5 / $25, and xAI claims roughly 4x the token efficiency of Opus 4.8 on SWE-Bench Pro. Independent numbers are now in: Artificial Analysis scores Grok 4.5 at Intelligence Index 54 on v4.1, behind Fable 5, GPT-5.6 Sol, Kimi K3, Opus 4.8, and GPT-5.5, at about $0.31 per index task (five times cheaper than Claude Sonnet 5). xAI reported 83.3% on Terminal-Bench 2.1, but the official Terminal-Bench 2.1 leaderboard puts Grok 4.5 at 79.3% in Cursor CLI, ranking it fourth behind Claude Fable 5 (83.8%) and the GPT-5.5 Codex harness (83.1%). Testers also flagged a sharp rise in its hallucination rate. Grok 4.5 is live in Grok Build, in Cursor on all plans, and the xAI console. EU access began rolling out after a July 16 announcement and is still partial, with Cursor reporting availability while xAI’s API console remains closed to EU users under the AI Act’s systemic-risk obligations. Read our cover: Grok 4.5.

Seedream 5.0 Pro – ByteDance – July 8, 2026 – multilingual text-and-layout image model with region-precise editing

ByteDance’s Seed team launched Seedream 5.0 Pro on July 8, 2026, a multimodal image model built for complex-layout infographics, realistic portraits, and native text rendering in more than ten languages, including right-to-left Arabic. Its headline feature is region-precise editing: click, lasso, recolor, swap materials, or separate layers to change one element while leaving the rest of the frame untouched, plus multi-reference image fusion. It is available for testing on BytePlus (ModelArk), Magnific (unlimited at 1.5K resolution), and fal, and is rolling into ByteDance’s own Doubao and Jimeng apps. ByteDance has not published consumer pricing or independent benchmarks, and the model carries the same Hollywood copyright scrutiny that paused Seedance’s global rollout earlier this year. Read our cover: Seedream 5.0 Pro.

Claude Fable 5 – Anthropic – Returned July 1, 2026 – Mythos-class flagship back online after export controls lifted

Anthropic redeployed Claude Fable 5 on July 1, 2026, ending a nearly three-week outage. The US government had ordered the model pulled on June 12 under an export-control directive from Commerce Secretary Howard Lutnick citing national security; because Anthropic could not verify user nationality in real time, it disabled both Fable 5 and its unrestricted sibling Mythos 5 globally within hours. The restriction was lifted on June 30, and Fable 5 is available again on the Claude API, Claude.ai, Claude Code, and Claude Cowork. Anthropic made Fable 5 a permanent part of the paid plans on July 20, 2026, ending five weeks of rolling extensions. Max and Team Premium include it at roughly 50% of your regular usage limits; Pro and Team Standard reach it through usage credits and get a one-time $100 starting credit. API pricing is $10 per million input tokens and $50 per million output. Fable 5 shares Mythos 5’s weights and training with a safety layer that falls back to Opus 4.8 on roughly 5% of high-risk requests across cybersecurity, biology, and model distillation. It runs a 1-million-token context window, is built for long-horizon agentic work, and holds the coding crown at #1 on the official Terminal-Bench 2.1 board (83.8%) y #1 on LiveBench Coding (86.0). Read our cover: Claude Fable 5.

Claude Sonnet 5 – Anthropic – June 30, 2026 – new default model, takes the writing crown and closes the gap to Opus 4.8

Anthropic launched Claude Sonnet 5 on June 30, 2026 as the new default model for Free and Pro users on claude.ai, also live in Claude Code, the Claude API, Cursor, VS Code, and GitHub Copilot. It ships with a 1-million-token context window at introductory pricing of $2 / $10 per 1M tokens through August 31, 2026 (then $3 / $15). Sonnet 5 scores 1,603 on GDPval-AA v2, edging Opus 4.8 (1,593) to become the first Sonnet-class model to outscore the concurrent Opus flagship, with both trailing Fable 5 (1,747) and Claude Opus 5, which now tops that board at 1,861. Those Elos are re-fitted as models are added, so they are lower than the figures published earlier in July. It also closes much of the agentic gap to Opus 4.8: 63.2% on SWE-Bench Pro (versus Opus 4.8’s 69.2% and GPT-5.5’s 58.6%), 84.7% on BrowseComp 25, and 88.3% on OSWorld-Verified against a 72.4% human baseline. It beats GPT-5.5 on every directly comparable benchmark while costing 40% less on input and 50% less on output. One caveat: an updated tokenizer maps the same text to roughly 1.0-1.35x more tokens, which narrows the real cost advantage.

LongCat-2.0 – Meituan – June 29, 2026 – 1.6T open-weight coder trained entirely on Chinese chips

Meituan open-sourced LongCat-2.0 under an MIT license, a 1.6-trillion-parameter Mixture-of-Experts model that activates an average of 48 billion parameters per token (dynamically 33-56 billion by query complexity) with a native 1-million-token context window. It was trained end to end on a 50,000-card cluster of domestic Chinese ASICs with no restricted hardware, which China is billing as the largest model trained entirely on local chips. LongCat-2.0 scores 59.5 on SWE-Bench Pro, narrowly ahead of GPT-5.5’s 58.6, and 70.8 on Terminal-Bench, with agentic coding as its focus. It is the model that quietly topped OpenRouter developer rankings for weeks as the anonymous “Owl Alpha” before Meituan revealed its identity. Weights are on Hugging Face and GitHub. Read our cover: LongCat-2.0.

Gemini 3.5 Pro – Google – Still unreleased – months behind schedule, per Bloomberg

Gemini 3.5 Pro remains the biggest pending launch. Google announced it at I/O on May 19 alongside Gemini 3.5 Flash, but only Flash shipped, and the June target slipped. Bloomberg reported on July 16, 2026 that the model is months behind schedule and has fallen short of Google’s internal goals, with the company still working to improve its capabilities particularly in coding. That is the whole sourced picture. Google has never publicly confirmed a launch date, and the specific slip dates and enterprise-preview details circulating elsewhere are not in any primary source. Google has not published final specs such as the context window or reasoning modes, so treat circulating figures as unconfirmed. Use Gemini 3.6 Flash in the meantime; we will move 3.5 Pro into the main ranking the moment it goes live.

Category Deep Dives

Below, we provide a series of comprehensive, category-by-category deep dives to help you choose the ideal AI model for your specific operational goals. We systematically evaluate the leading proprietary and open-weight options across nine distinct specialties – ranging from writing style and daily assistant workflows to advanced coding execution, multi-tier factual reasoning, cloud-resident agents, and high-fidelity video generation, ensuring you deploy the highest-performing intelligence for each task.

Best AI for Writing

Best AI for Writing: Claude Fable 5 (#1 on Arena creative writing and LiveBench Language)

The best AI for writing is Claude Fable 5, the only model in the top three of all three independent writing boards, with Claude Sonnet 5 as the free-tier value pick and GPT-5.5 as the alternative for fact-anchored business writing. Fable 5 leads Arena’s creative-writing leaderboard at 1508 Elo, tops LiveBench Language at 90.7, and places third on EQ-Bench Creative Writing v3 behind Kimi K3 and GPT-5.6 Sol. No other model is top-three on more than one of them.

This is a change from last month, when Claude Sonnet 5 held this slot on the strength of its GDPval-AA score. The preference boards do not support that placing. Sonnet 5 sits #53 on Arena creative writing and #13 on EQ-Bench, and scores 75.0 on LiveBench Language against Fable 5’s 90.7. It is still the model most people should write with day to day, because it is free and default on claude.ai, but it is the value pick rather than the quality leader.

Fable 5 costs $10 / $50 per 1M tokens and is permanently included in Claude Max and Team Premium at roughly 50% of regular usage limits, with Pro and Team Standard reaching it through usage credits. If your writing is a work deliverable rather than prose, Claude Opus 5 tops Artificial Analysis’s GDPval-AA v2 professional-deliverables board outright at 1861, well clear of Fable 5’s 1747. Arena and EQ-Bench have not rated Opus 5 yet, so we have kept it out of the crown for now.

Model	Best For	Strength	Weakness	Price (per 1M tokens)
Claude Fable 5	Best writing overall	#1 Arena creative writing (1508), #1 LiveBench Language (90.7), #3 EQ-Bench	Priciest option here	$10 / $50
Kimi K3	Creative fiction and voice	#1 EQ-Bench Creative Writing (2377), 234 Elo clear of second	Only #10 on Arena creative writing	$3 / $15
Claude Sonnet 5	Free everyday writing	Free and default on claude.ai, 1M context	#53 Arena creative writing; 75.0 LiveBench Language	$2 / $10 intro (then $3 / $15)
Claude Opus 5	Professional deliverables	#1 GDPval-AA v2 at 1861, ahead of Fable 5 (1747)	Not yet rated on Arena or EQ-Bench	$5 / $25
GPT-5.5	Fact-anchored business writing	Documented factual-reliability gains over GPT-5.4	Artificial Analysis now marks its reasoning tiers deprecated	$5 / $30
Gemini 3.6 Flash	Bulk drafts at scale	17% fewer output tokens than 3.5 Flash	Weaker on hardest reasoning	$1.50 / $7.50

Runner-up and alternatives: Kimi K3 is the runner-up for creative fiction and wins EQ-Bench outright, Claude Sonnet 5 is the runner-up on value and the one to use if you are not paying, Claude Opus 5 is the pick for professional deliverables, and Gemini 3.6 Flash is the pick for bulk drafting.

What changed this month: the writing crown moved from Claude Sonnet 5 to Claude Fable 5. We widened the evidence beyond Artificial Analysis to the three boards that actually measure writing, and Fable 5 is top-three on all of them while Sonnet 5 is top-three on none. Kimi K3 (July 16) arrived as the EQ-Bench leader, and Claude Opus 5 (July 24) took GDPval-AA v2 at 1861, though the preference boards have not rated it yet.

Best AI for Chat & Daily Assistant

Best AI for Chat & Daily Assistant: GPT-5.6 (ChatGPT’s default since July 9)

The best AI for everyday chat is GPT-5.6, and the honest reason is reach rather than board position. It is the model ChatGPT serves by default to the largest user base in the category, which makes it the best assistant most people can actually open. On raw human preference it is not the leader: GPT-5.6 Sol sits #11 on Arena’s text leaderboard at 1485, where Claude Fable 5 leads at 1507. If you want the best-rated conversational model and are willing to leave ChatGPT, that is the swap to make.

Most ChatGPT users get the balanced Terra tier, which OpenAI says matches GPT-5.5 at roughly half the cost. It is available inside ChatGPT (free with limits, Plus at $20/month, Pro at $100/month for roughly 5x Plus usage or $200/month for roughly 20x), through the API (Luna $1 / $6, Terra $2.50 / $15, Sol $5 / $30 per 1M tokens), and bundled inside Fello AI alongside Claude, Gemini, Grok, and DeepSeek. One caveat: OpenAI’s system card and the evaluator METR flagged elevated “scheming” behaviour in Sol.

GPT-5.5 is still sold by OpenAI at $5 / $30 and its Instant tier is still the safer pick for hallucination-sensitive work, with a documented 52.5% drop in hallucinated claims over GPT-5.3 Instant. Note that Artificial Analysis has since marked every GPT-5.5 reasoning tier deprecated, so treat it as a model you can still buy rather than a current benchmark reference. Claude Opus 5 is the better pick when you want a model that pushes back on weak prompts, and Gemini 3.6 Flash is the better pick if you are running everything through the free Gemini app.

Model	Best For	Strength	Weakness	Price
GPT-5.6	Everyday chat, ChatGPT’s default	The assistant most people can open; Terra matches GPT-5.5 at ~half cost	#11 on Arena text; scheming flagged by METR	Free / $20/mo Plus; API $1 / $6 to $5 / $30
Claude Fable 5	Highest-rated conversation	#1 on Arena text overall (1507) and 6 of 7 subcategories	No free tier; usage-credit access on Pro	$10 / $50 API
Claude Opus 5	Thoughtful, nuanced answers	#1 Artificial Analysis Intelligence Index (61) and Agentic Index (55.3)	Not yet rated on Arena	$20/mo Pro, $5 / $25 API
GPT-5.5 Instant	Hallucination-sensitive daily work	52.5% fewer hallucinated claims vs 5.3 Instant	Reasoning tiers now marked deprecated by Artificial Analysis	$20/mo Plus; API $5 / $30
Gemini 3.6 Flash	Fast, free, multimodal	Free in the Gemini app, 1M context, #12 on Arena text	Weaker on hardest reasoning	Free / $1.50 / $7.50 API
Fello AI	All the top models, one app	ChatGPT + Claude + Gemini + Grok + DeepSeek and more on Mac, iPhone and iPad	Routed via app, not direct	$9.99/mo

Runner-up and alternatives: Claude Fable 5 is the runner-up and the actual preference leader, Claude Opus 5 is the runner-up for thoughtful daily use, Gemini 3.6 Flash is the runner-up for fast and free, and Grok 4.5 is the niche pick for live-news days. Fello AI is the natural pick if you want the top models in one Mac and iOS app for $9.99/month instead of juggling subscriptions.

What changed this month: we kept GPT-5.6 as the chat pick but changed the justification. Adding the human-preference boards showed it at #11 on Arena text rather than at the top, so the page now says plainly that this crown is about reach and default availability, not measured quality. Claude Opus 5 (July 24) replaces Opus 4.8 as the Anthropic pick at the same $5 / $25, and GPT-5.5’s “proven fallback” framing has been softened now that Artificial Analysis lists its reasoning tiers as deprecated.

Best AI for Images

Best AI for Images: ChatGPT Images 2.0 (#1 on text-to-image and image editing)

The best AI for image generation is ChatGPT Images 2.0, and it is the least controversial crown on this page. GPT Image 2 leads Arena’s text-to-image board at 1385 Elo and its image-editing board at 1465, and Artificial Analysis puts it first on its own image arena at 1337.7. It is the natural pick whenever your image needs to contain readable words, in English or in another script, and it is included in ChatGPT Plus and Pro.

The runner-ups have changed. Reve 2.1 (July 9) is the real #2 on text-to-image at 1302, and Reve 2.0 now sits behind it on both boards. Meta’s Muse Image is #3 on text-to-image and #2 on image editing, the strongest showing any Meta image model has managed. Google’s Nano Banana Pro is no longer the runner-up overall: on text-to-image it ranks between #8 and #11, below its own cheaper sibling Nano Banana 2, though it does place higher on image editing.

Model	Best For	Strength	Weakness	Price
ChatGPT Images 2.0	Images with readable text	#1 text-to-image (1385) and #1 image editing (1465)	Less photoreal than the Gemini image line	Included in ChatGPT Plus
Reve 2.1	Layout, typography, native 4K	#2 text-to-image at 1301, layout-preserving editing	Smaller ecosystem	Free / from $7.99/mo
Muse Image	Image editing, Meta ecosystem	#3 text-to-image, #2 image editing (1402)	New, thin tooling around it	Meta AI app
Nano Banana 2 (Gemini 3.1 Flash Image)	Photoreal portraits and products	Outranks Nano Banana Pro on both boards	Weaker on text in image	Gemini app / AI Studio
Seedream 5.0 Pro	Multilingual text + region-precise editing	10+ languages incl. Arabic RTL, lasso and layer editing	No independent benchmarks; copyright cloud	BytePlus / Magnific
Midjourney v8	Stylized art, illustration	Aesthetic baseline most artists prefer	Weaker on text in image	$10-$120/mo
Grok Imagine	NSFW / Spicy Mode	Most permissive guardrails	Smaller model behind it	$30/mo SuperGrok

Runner-up and alternatives: Reve 2.1 is the runner-up overall and the pick for layout and typography, Muse Image is the runner-up for editing an image you already have, and Nano Banana 2 is the photoreal pick. Grok Imagine is still the only frontier model that allows Spicy Mode adult content.

What changed this month: the crown is unchanged and independently confirmed, but the runner-ups were wrong and have been corrected. Reve 2.1 replaces Reve 2.0 as #2, Muse Image enters at #3 text-to-image and #2 image editing, and Nano Banana Pro has been demoted from “runner-up overall” to what the boards actually show, which is #8 to #11 and behind Nano Banana 2.

Best AI for Video

Best AI for Video: Gemini Omni Flash (#1 on both video leaderboards)

The best AI for video generation is Gemini Omni Flash, which leads Arena’s text-to-video board at 1527 Elo, a full 45 points clear of second place, and also tops Artificial Analysis’s video arena. It is #1 on both houses, which no other video model manages. Google reached the consumer launch on May 19, 2026 through the Gemini app, Flow and YouTube Shorts, and opened developer access on June 30 through AI Studio and the Gemini API. Read our full breakdown of Gemini Omni Flash.

Pricing runs $1.50 in and $17.50 per 1M video output tokens, which works out at roughly $0.10 per second of finished video, and it supports conversational editing so you can adjust a clip by describing the change. The one real limit is length: Omni Flash generates 10-second clips. If you need longer takes, Veo 3.1 remains the right tool inside the Gemini app, AI Studio and Vertex AI, with native audio and 1080p output.

This replaces Veo 3.1 at the top of the category. Veo 3.1 is a good model, but it is not the leading one: its best variant sits #6 on Arena’s text-to-video board, behind Omni Flash, ByteDance’s Dreamina Seedance 2.0 and Meta’s Muse Video. Google still wins this category, just with a different model than the page previously named.

Model	Best For	Strength	Weakness	Price
Gemini Omni Flash	Best AI video overall	#1 on both video boards (1527 Arena), conversational editing	Caps at 10-second generations	~$0.10/sec; Gemini app / AI Studio
Dreamina Seedance 2.0	Closest challenger	#2 text-to-video (1482) and #1 on image-to-video	ByteDance ecosystem, limited Western access	Dreamina / BytePlus
Muse Video	Meta ecosystem video	#3 text-to-video at 1459	Newest of the group, thin tooling	Meta AI app
Veo 3.1	Longer production clips	Native audio, 1080p, strong physics consistency	#6 on Arena video, not the quality leader	Google AI Pro / Ultra
Kling 3.0 / 3.0 Turbo	Fast iteration at lower cost	Native 4K, 60fps, 15-second clips; Turbo shipped June 17	Outside the top 16 on Arena text-to-video	From $10/mo
Luma Ray 3	Photoreal scenes	Strong realism for landscapes	Smaller community	Free / from $9.99/mo

Runner-up and alternatives: Dreamina Seedance 2.0 is the runner-up overall and actually beats Omni Flash on image-to-video, Muse Video is third, and Veo 3.1 is the pick when 10 seconds is not enough. Runway is no longer listed here: the page previously named Gen-4, which has since been superseded by Gen-4.5, and we could not reproduce a top-tier placing for either across the boards we track. OpenAI retired the Sora 2 consumer app on April 26, 2026 and only the developer API remains, through September 24, 2026.

What changed this month: the video crown moved from Veo 3.1 to Gemini Omni Flash, since Omni Flash is #1 on both independent video boards while Veo 3.1 is sixth on Arena. The runner-up list was rebuilt on the same boards. Kuaishou’s current line is Kling 3.0 (February 4, 2026, native 4K at 60fps in 15-second clips) and Kling 3.0 Turbo (June 17, 2026), and no Kling model reaches the top 16 on Arena’s text-to-video board, so Kling is listed as the fast-iteration option rather than the runner-up.

Best AI for Coding

Best AI for Coding: Claude Fable 5 (#1 on Terminal-Bench 2.1, LiveBench Coding and the Remote Labor Index)

The best AI for coding is Claude Fable 5, and it holds that position on more independent boards than any other model. It is #1 on the official Terminal-Bench 2.1 leaderboard at 83.8% running in Claude Code, #1 on LiveBench Coding at 86.0, y #1 on Scale SEAL’s Remote Labor Index at 15.8, roughly 1.9x the next model on real contracted work. It also tops SEAL’s SWE Atlas Refactoring and Test Writing boards and takes #1 on Arena’s coding subcategory.

We have changed the evidence behind this crown. The widely quoted SWE-Bench Pro figure for Fable 5 comes from a vendor scaffold rather than a public board, and the public board disagrees with it: Scale SEAL’s SWE-Bench Pro leaderboards put Meta’s Muse Spark 1.1 first on both the public and private splits and do not list Fable 5 in the top six of either. The crown survives on the boards above, so that is what we cite now.

The contender to watch is Kimi K3, which takes #1 on Arena’s WebDev board at 1682, a 52-point margin over Fable 5, and #1 on Arena’s Agent board. Claude Opus 5 is the everyday-value pick at $5 / $25, half the price of Fable 5, and Anthropic’s own docs now tell developers to start with Opus 5 for complex agentic coding and reserve Fable 5 for the highest-capability workloads. Read our cover of Claude Opus 5.

On price-per-result, Artificial Analysis’s Coding Index is not a Claude sweep and we will not pretend otherwise: GPT-5.6 Sol (xhigh) leads it at 78.3 with Opus 5 (max) at 78.0, a gap small enough to call a tie, with Fable 5 at 76.5 and Kimi K3 at 76.2. The cheapest serious contenders are Grok 4.5 at $2 / $6 and Gemini 3.6 Flash at $1.50 / $7.50. On open weights, GLM-5.2 (MIT) is the strongest available option and is #4 on Arena WebDev.

Model	Best For	Strength	Weakness	Price (per 1M tokens)
Claude Fable 5	Best coding overall, long-horizon agentic	#1 Terminal-Bench 2.1 (83.8%), #1 LiveBench Coding (86.0), #1 Remote Labor Index	Priciest; Artificial Analysis Coding Index puts it 7th	$10 / $50
Kimi K3	Web app building and agents	#1 Arena WebDev (1682) and #1 Arena Agent	Not yet on Terminal-Bench; weights not out	$3 / $15
Claude Opus 5	Everyday-value agentic coding	Coding Index 78.0 (statistical tie for #1), Anthropic’s recommended default	Not yet rated on Arena or Terminal-Bench	$5 / $25
GPT-5.6 Sol	OpenAI flagship, agentic coding	#1 Artificial Analysis Coding Index at 78.3 (xhigh)	Absent from the official Terminal-Bench board; eval-gaming flagged by METR	$5 / $30
Muse Spark 1.1	Cheap agentic tool use	#1 on SEAL SWE-Bench Pro (public and private) and MCP Atlas (88.1)	US-only preview	$1.25 / $4.25
Grok 4.5	Cheap value coder	#4 on the official Terminal-Bench 2.1 board at 79.3% via Cursor CLI	Higher hallucination rate; EU API console still closed	$2 / $6
Gemini 3.6 Flash	Agent coding at scale	Intelligence Index 50, 17% fewer output tokens than 3.5 Flash	Weaker on hardest reasoning	$1.50 / $7.50
GLM-5.2	Best open-weight coder	Highest open Intelligence Index (51), #4 Arena WebDev	Self-host or provider only	Open weights (MIT)

Runner-up and alternatives: Kimi K3 is the runner-up on Arena’s web and agent boards, Claude Opus 5 is the runner-up on value at half the price, GPT-5.6 Sol is the runner-up on Artificial Analysis’s composite, and GLM-5.2 is the open-weight pick. Inside IDEs, Cursor with Claude is still the most popular pairing and Claude Code is the natural pick if you live in the terminal.

What changed this month: the crown stayed with Claude Fable 5 but the evidence behind it was replaced. We dropped the vendor-scaffold SWE-Bench Pro claim, because Scale SEAL’s public SWE-Bench Pro boards contradict it, and rebuilt the case on Terminal-Bench 2.1, LiveBench Coding, the Remote Labor Index and SWE Atlas, where Fable 5 is first. Kimi K3 (July 16) enters as the Arena WebDev and Agent leader, and Claude Opus 5 (July 24) replaces Opus 4.8 as the everyday-value Claude at the same $5 / $25. We have also stopped presenting vendor Terminal-Bench figures as leaderboard results: the official board has Grok 4.5 at 79.3%, not the 83.3% xAI reported, and does not list GPT-5.6 Sol at all.

Best AI for Creativity

Best AI for Creativity: Grok 4.5 (fewest content restrictions, native real-time X)

The best AI for unfiltered, on-trend creative work is Grok 4.5, and we want to be exact about why. This pick is about the product, not the prose quality. Grok 4.5 carries the fewest content restrictions of any frontier model and the only native real-time X integration, which makes it the one model that will engage with edgy, topical or deliberately provocative briefs that the others decline. It is the default in the Grok app for SuperGrok and X Premium+ subscribers at $30/month.

It is not the best writer, and the boards are blunt about it. Grok 4.5 sits #33 on EQ-Bench Creative Writing y #41 on Arena’s creative-writing leaderboard, losing on both to the older Grok 4.20-beta1. If you are picking on output quality alone, Claude Fable 5 wins outright at #1 on Arena creative writing, and Kimi K3 wins EQ-Bench. Choose Grok 4.5 for what it will let you make, not for how well it writes.

Model	Best For	Strength	Weakness	Price
Grok 4.5	Unfiltered, opinionated, on-trend	Fewest content restrictions, native real-time X grounding	#33 EQ-Bench, #41 Arena creative writing	$30/mo SuperGrok
Claude Fable 5	Highest-quality creative prose	#1 Arena creative writing (1508), #1 LiveBench Language	Cautious guardrails on edgy briefs	$10 / $50 API
Kimi K3	Fiction and distinctive voice	#1 EQ-Bench Creative Writing at 2377	Only #10 on Arena creative writing	$3 / $15
Claude Opus 5	Long-form structured creativity	Holds long threads and self-edits; #1 Intelligence Index	Most cautious of the group	$20/mo Pro, $5 / $25 API
Gemini 3.1 Pro	Multimodal creative	Strong text, image and video chain	Quotas inside the Gemini app	Free / $2.00-$4.00 API in
Grok Imagine (Spicy Mode)	NSFW / adult creative	Most permissive image generation	Niche use case	$30/mo SuperGrok

Runner-up and alternatives: Claude Fable 5 is the runner-up and the right pick if quality matters more than freedom, Kimi K3 is the pick for fiction, and Claude Opus 5 is the pick for creative projects that run across many turns. For adult creative work, Grok Imagine Spicy Mode is still the only frontier-grade option.

What changed this month: Grok 4.5 keeps this pick, but the reasoning is now stated honestly. Adding the human-preference boards showed it at #33 on EQ-Bench and #41 on Arena creative writing, so this section no longer implies that it wins on quality. It is here for its permissiveness and its live X access, and the page now names Claude Fable 5 as the model to use when you want the better writing.

Best AI for Accuracy

Best AI for Accuracy: Gemini 3.1 Pro (98% on ARC-AGI-1, at $0.52 per task)

The best AI for accuracy and research is Gemini 3.1 Pro. Its strongest result is on ARC Prize’s ARC-AGI-1, where it scores 98% and ties the human panel, and it does that at $0.52 per task. That combination is the argument: several models are close on capability, none matches it on cost for reliable factual work. It pairs that with native Google Search grounding, which is what you actually want when the answer has to be current rather than merely plausible.

It also scores 94.3% on GPQA Diamond and 44.4% on Humanity’s Last Exam, and tops Scale SEAL’s HLE board at 46.44. We have dropped the page’s previous ARC-AGI-2 framing. Its 77.1% is still correct, but the board has moved and that score now places it around 14th, behind GPT-5.6 Sol at 93% and Claude Opus 5 at 90%, so it is no longer evidence of an accuracy lead.

Two honest caveats. On grounded search specifically, Arena’s search leaderboard is led by Anthropic, not Google, with Gemini 3.1 Pro grounding at #7. And on novel reasoning, GPT-5.6 Sol leads ARC-AGI-2 and Claude Opus 5 leads ARC-AGI-3 at 30%, roughly 3.75x the next-best model according to ARC Prize. We did not move the crown to Opus 5 because Artificial Analysis measures its hallucination rate at 50% and places it below Fable 5 on AA-Omniscience, which is weak ground for a crown named accuracy.

Model	Best For	Key Benchmark	Weakness	Price
Gemini 3.1 Pro	Cheap, reliable factual work	98% ARC-AGI-1 (ties human panel) at $0.52/task, 94.3% GPQA	ARC-AGI-2 77.1% now ranks ~14th; #7 on Arena search	$2.00-$4.00 / $12.00-$18.00 (tiered)
Claude Fable 5	Grounded search	#1 and #3 on Arena’s search leaderboard, ahead of Google	No single cheap tier	$10 / $50 (Fable 5)
GPT-5.6 Sol	Novel reasoning	#1 ARC-AGI-2 at 93%, against a 100% human panel	Scheming flagged by METR	$5 / $30
Claude Opus 5	Hardest unseen problems	#1 ARC-AGI-3 at 30%, ~3.75x the next model (ARC Prize)	Artificial Analysis measures a 50% hallucination rate	$5 / $25
Qwen 3.7 Max	Frontier accuracy at value pricing	92.4 GPQA Diamond, 200 free requests/day	API-only, no chat front-end	$1.25 / $3.75 promo; $2.50 / $7.50 list
Claude Opus 4.6	Honesty under pressure	#1 on Scale SEAL’s MASK board at 96.28; Anthropic holds the top 5	Superseded as a flagship	Legacy Anthropic model

Runner-up and alternatives: Anthropic’s models are the runner-up for grounded search and sweep the honesty-under-pressure board, GPT-5.6 Sol is the runner-up for novel reasoning, and Qwen 3.7 Max is the value pick at the frontier.

What changed this month: Gemini 3.1 Pro keeps the accuracy crown, but on rebased evidence. The lead argument is now 98% on ARC-AGI-1 at $0.52 per task plus Search grounding, and the old ARC-AGI-2 framing is gone because 77.1% now ranks around 14th rather than at the top. We also split out what this category was quietly doing at once, so grounded search now credits Anthropic and novel reasoning credits GPT-5.6 Sol and Claude Opus 5, rather than implying one model wins all three.

Best AI for Problem Solving

Best AI for Problem Solving: GPT-5.6 Sol (#1 on LiveBench Mathematics, Reasoning and ARC-AGI-2)

The best AI for hard problem solving is GPT-5.6 Sol, and it is the best-supported crown on this page. It takes #1 on LiveBench Mathematics at 96.2, #1 on LiveBench Reasoning at 91.7, y #1 on ARC-AGI-2 at 93%, the closest any model has come to the 100% human panel. Three separate houses put it first on the reasoning tasks that matter, which is more agreement than any other category on this page produces.

OpenAI has still not published Sol’s FrontierMath score, so the verified OpenAI mark remains GPT-5.5 Pro’s 39.6% on FrontierMath Tier 4, and we will slot Sol’s number in the moment it goes public. Qwen 3.7 Max is the value alternative for competition-style problems at 97.1 on the February 2026 HMMT index and 44.5 on Apex, at a fraction of the cost of ChatGPT Pro, and it now includes 200 free model requests per day.

Claude Opus 5 is the alternative for long agentic reasoning chains, leading Artificial Analysis’s Agentic Index at 55.3 and ARC Prize’s ARC-AGI-3 at 30%, roughly 3.75x the next-best model. It runs second to Sol on LiveBench Reasoning at 91.2. For multimodal reasoning where the problem includes diagrams or documents, Gemini 3.1 Pro is still the practical pick.

Model	Best For	Key Benchmark	Weakness	Price
GPT-5.6 Sol	Hardest math, science and reasoning	#1 LiveBench Mathematics (96.2), #1 LiveBench Reasoning (91.7), #1 ARC-AGI-2 (93%)	FrontierMath still unpublished; scheming flagged by METR	$100/mo ChatGPT Pro; API $5 / $30
Claude Opus 5	Long agentic reasoning chains	#1 Agentic Index (55.3), #1 ARC-AGI-3 (30%)	Second on LiveBench Reasoning; 50% hallucination rate	$5 / $25
GPT-5.5 Pro	Verified FrontierMath leader	39.6% FrontierMath Tier 4	Superseded by Sol as flagship	$100/mo ChatGPT Pro
Qwen 3.7 Max	Competition math on a budget	97.1 HMMT 2026 Feb, 44.5 Apex, 200 free requests/day	API-only	$1.25 / $3.75 promo; $2.50 / $7.50 list
Claude Fable 5	Math inside a coding workflow	#1 on Arena’s math subcategory (1543), 96.0 LiveBench Mathematics	Priciest option here	$10 / $50
GLM-5.2	Open-weight problem solving	Highest open Intelligence Index at 51, MIT, 1M context	Self-host or provider only	Open weights (MIT)

Runner-up and alternatives: Claude Opus 5 is the runner-up and the natural pick for long-chain agentic reasoning, Claude Fable 5 is the runner-up on Arena’s math board, Qwen 3.7 Max is the value pick, and GLM-5.2 is the open-weight pick.

What changed this month: GPT-5.6 Sol keeps this crown and the case for it got stronger, not weaker. Widening the research to LiveBench and ARC Prize showed it first on mathematics, reasoning and ARC-AGI-2, so this is now the most defensible pick on the page. Claude Opus 5 (July 24) enters as the agentic-reasoning alternative on the strength of ARC-AGI-3, and Qwen 3.7 Max picked up a free tier of 200 requests per day.

Best AI Agent

Best AI Agent: Gemini Spark vs Claude Cowork ($99.99/month Ultra vs $20/month Pro)

The best AI agent right now is Gemini Spark for 24/7 cloud-resident work and Claude Cowork for desktop-resident work, with ChatGPT Codex as the alternative for coding agents and OpenAI Operator-class browser agents as the alternative for web tasks. AI agents are the fastest-moving category of 2026: each top vendor now ships an agent product, and the practical choice is between agents that live in the cloud (run while your laptop is closed) and agents that live on your desktop (drive your apps directly).

Gemini Spark launched at Google I/O on May 19, 2026 and is the first 24/7 cloud agent. Claude Cowork launched in general availability on April 9, 2026 and runs as a desktop agent that drives your local apps. ChatGPT Codex Mobile (May 14) is the pick for coding-agent work, now usable from iOS and Android. Read the full Gemini Spark vs Claude Cowork comparison.

Agent	Best For	Where It Runs	Strength	Price
Gemini Spark	24/7 cloud tasks, Workspace workflows	Google Cloud VM (always-on)	First true 24/7 agent, deep Workspace integration	$99.99/mo Google AI Ultra
Claude Cowork	Desktop, app-driving, design + code	Your Mac/Windows desktop	Drives local apps, sees your screen	$20/mo Claude Pro
ChatGPT Codex Mobile	Coding agent on phone	OpenAI cloud + iOS/Android	Approve diffs and redirect work from phone	Included in ChatGPT plans
Grok Agentic (Grok 4.5)	Real-time research, X scraping	xAI cloud	Native X integration	$30/mo SuperGrok
OpenAI Operator-class	Browser tasks, web forms	OpenAI cloud + your browser	Web automation	ChatGPT Pro

Runner-up and alternatives: Claude Cowork is the runner-up overall and the natural pick when you want the agent on your machine driving your apps. ChatGPT Codex Mobile is the runner-up for coding agents. Grok Agentic is the niche pick for real-time research.

What changed this month: no new consumer agents shipped, so the Gemini Spark (cloud) versus Claude Cowork (desktop) choice still drives most agent decisions for individual users. The model layer underneath them moved a lot. Claude Opus 5 (July 24) took #1 on Artificial Analysis’s Agentic Index at 55.3, ahead of GPT-5.6 Sol at 54.0 and Claude Fable 5 at 52.8, and it costs $5 / $25. Kimi K3 took #1 on Arena’s Agent board, where the metric is task success rate rather than Elo, ahead of Fable 5 and Opus 4.8. For teams building their own agents, Meta’s Muse Spark 1.1 is a cheap agent-native option at $1.25 / $4.25 that leads Scale SEAL’s MCP Atlas tool-use board at 88.1, and GLM-5.2 (MIT) is the strongest open-weight agent model at #4 on Arena’s Agent board.

Pricing Comparison

AI Model Pricing Comparison in July 2026 ($0 free tiers to $199.99/month Google AI Ultra)

Here is the July 2026 pricing comparison for every leading AI model, in API cost per 1 million tokens and the consumer-subscription price for the same model. Free tiers exist for ChatGPT, Gemini, Claude, Grok, and DeepSeek. The most consequential price on this table is Claude Opus 5 at $5 / $25, because it is the #1 model on Artificial Analysis’s Intelligence Index at half the cost of Claude Fable 5. Meta’s Muse Spark 1.1 lists at $1.25 / $4.25 and Grok 4.5 at $2 / $6, and on a price-per-intelligence basis Artificial Analysis puts a Grok 4.5 index task at about $0.31, five times cheaper than Claude Sonnet 5. Among closed models Gemini 3.6 Flash at $1.50 / $7.50 stays the cheapest frontier all-rounder; the cheapest open-weight frontier coder is MiniMax M3 at around $0.60 per million input tokens, and the cheapest open-weight model with a 1M context is DeepSeek V4-Flash at $0.14 / $0.28. For a deeper breakdown by tier, see our full AI Pricing Comparison Guide hub.

Model	Input (per 1M)	Output (per 1M)	Context Window	Free access?
GPT-5.5	$5.00	$30.00	1M (400K in Codex)	ChatGPT Free; API paid
GPT-5.5 Pro	$30.00	$180.00	1M	ChatGPT Pro from $100/mo ($200 higher-usage tier)
GPT-5.6 Sol	$5.00	$30.00	not published	Live in ChatGPT, Codex & API (July 9)
GPT-5.6 Terra	$2.50	$15.00	not published	Live in ChatGPT, Codex & API (July 9)
GPT-5.6 Luna	$1.00	$6.00	not published	Live in ChatGPT, Codex & API (July 9)
Claude Opus 5	$5.00	$25.00	1M	Claude Pro/Max default; API paid
Claude Opus 4.8	$5.00	$25.00	1M	Legacy model at Anthropic; Pro/Max/API
Claude Fable 5	$10.00	$50.00	1M	Permanent in Max/Team Premium (~50% of usage limits); Pro/Team Standard via credits
Claude Sonnet 5	$2.00 intro / $3.00 list	$10.00 intro / $15.00 list	1M	Claude Free & Pro default; API paid
Claude Sonnet 4.6	$3.00	$15.00	1M	API paid (superseded by Sonnet 5)
Gemini 3.1 Pro	$2.00 (≤200K) / $4.00 (>200K)	$12.00 (≤200K) / $18.00 (>200K)	1M	Limited Gemini app; API paid
Gemini 3.6 Flash	$1.50	$7.50	1M	Gemini app/AI Studio; free API tier + paid API
Gemini 3.5 Flash-Lite	$0.30	$2.50	1M	AI Studio; free API tier + paid API
Qwen 3.7 Max	$1.25 promo / $2.50 list	$3.75 promo / $7.50 list	1M	200 free requests/day; API paid beyond that
MiniMax M3	~$0.60	~$2.40 (≤512K)	1M	Open weights; hosting costs apply
LongCat-2.0	Provider-dependent	Provider-dependent	1M	Open weights (MIT); hosting costs apply
NVIDIA Nemotron 3 Ultra	Provider-dependent	Provider-dependent	1M	Open weights (OpenMDW); hosting costs apply
Qwen 3.5 (open-weight)	Self-host / Together	Self-host / Together	1M	Open weights; hosting costs apply
Nex-N2-Pro	Self-host / providers	Self-host / providers	1M	Open weights (Apache 2.0); hosting costs apply
Rio 3.5 Open 397B	Self-host / providers	Self-host / providers	1M	Open weights (MIT); hosting costs apply
Grok 4.3	$1.25	$2.50	1M	Free consumer plan; API paid
Grok 4.5	$2.00	$6.00	500K	Grok Build / Cursor / xAI console; EU partial, API console still closed
Muse Spark 1.1	$1.25	$4.25	1M	Meta Model API ($20 free credits, US preview); free in Meta AI Thinking mode
Kimi K3	$3.00	$15.00	1M	Free basic tier in the Kimi app; open weights due July 27
Gemini Omni Flash (video)	$1.50	$17.50 (video output)	10-second clips	Gemini app / Flow; AI Studio + API
DeepSeek V4-Pro	$0.435 ($0.0036 cache-hit)	$0.87	1M	DeepSeek Chat free; API paid
DeepSeek V4-Flash	$0.14	$0.28	1M	DeepSeek Chat free; API paid
Kimi K2.7 Code	Provider-dependent	Provider-dependent	256K	Open weights; hosting costs apply
GLM-5.2	Provider-dependent	Provider-dependent	1M	Open weights; hosting costs apply
ERNIE 5.1	China-region pricing	China-region pricing	256K	Baidu free tier
Gemini Spark (agent)	Not API-priced	Not API-priced	1M (Gemini base)	Google AI Ultra $99.99 or $199.99/mo
Fello AI (aggregator)	Routed via app	Routed via app	Model-dependent	$9.99/mo

The GPT-5.5 and GPT-5.5 Pro rates above are short-context prices. OpenAI labels those rows “(<272K context length)” and bills longer prompts at a higher tier, but it no longer publishes the specific long-context figures, so we have stopped quoting them. The GPT-5.6 tiers carry no such split.

If you want access to multiple AI models without managing separate subscriptions, Fello AI provides GPT, Claude, Gemini, Grok, Perplexity, and more in a single app for Mac, iPhone, and iPad, starting at $9.99/month with a free tier available. Models are updated regularly so you always have access to the latest.

Claude vs ChatGPT AI comparison cover for 2026, showing Anthropic Claude and OpenAI logos on an orange-to-green gradient background with soft light streaks and headline text.

Claude vs ChatGPT: Which AI Is Actually Better in 2026?

Claude hit #1 on the App Store free chart on February 28, 2026, pushing ChatGPT out of the top spot for the first time. The catalyst was Anthropic refusing the Pentagon’s demand to strip the guardrails off Claude for autonomous weapons and mass surveillance. The administration then told federal agencies

Leer Más "

ChatGPT vs Grok comparison cover for 2026, featuring OpenAI and Grok logos on a dark teal gradient background with glowing light waves and the title “Who Wins in 2026?”

Grok vs ChatGPT: Which AI Chatbot Is Actually Better in 2026?

Update, July 10, 2026: Both chatbots just moved to new flagship models. ChatGPT now runs GPT-5.6 (the Sol, Terra, and Luna tiers), which began its broad public rollout on July 9, 2026, and Grok is powered by Grok 4.5, xAI’s coding-focused release from July 8, 2026. The pricing, benchmarks, and

Leer Más "

Gemini vs ChatGPT comparison cover for 2026, featuring Google Gemini and OpenAI logos on a purple-to-green gradient background with smooth abstract light waves and bold title text.

ChatGPT vs Gemini in 2026: Which AI Should You Actually Use?

Update, July 10, 2026: ChatGPT moved to a new flagship. GPT-5.6 began its broad public rollout on July 9, 2026 as a three-tier family, Sol (flagship), Terra (balanced), and Luna (fast and cheapest). On the Google side, Gemini 3.1 Pro remains the paid flagship on Google AI Pro and Gemini

Leer Más "

Futuristic blue-purple light tunnel with five AI model logos and the headline “The Best AI In February 2026?”

Best AI February 2026 Rankings: GPT-5.2, Claude Opus 4.6, and Gemini 3.1 Pro

Choosing the right AI tool in 2026 feels like trying to hit a moving target. New models arrive every few weeks, and what worked best in January might already be outdated today. This guide cuts through the hype to show you exactly which tools are winning right now based on

Leer Más "

A graphic with a digital circuit board background. Text at the top reads, "JAN 2026". Three humanoid figures, colored blue/red, green, and orange, are breaking a large golden crown into four pieces. Text bubbles identify them as "Gemini 3 Pro," "GPT-5.2," and "Claude Opus 4.5." The crown pieces are labeled "PREFERENCE #1," "REASONING #1" (twice), and "CODING #1." Large text at the bottom says, "THE AI THRONE HAS FRACTURED. JANUARY 2026 RANKINGS: New Data Changes Everything."

Best AI Models In January 2026: Gemini 3, Claude 4.5, ChatGPT (GPT-5.2), Grok 4.1 & Deepseek

TL;DR: In January 2026, there isn’t one “best” AI for everything. On LMArena’s Text leaderboard, Gemini 3 Pro leads user-preference rankings, while the updated Artificial Analysis Intelligence Index v4.0 reports GPT-5.2 (with extended reasoning) as the top overall benchmark performer. Choose based on your task: Gemini for daily assistance, Claude

Leer Más "

Comic-style comparison image showing GPT-Image-1.5 vs Nano Banana-Pro, split by a lightning bolt with a bold VS in the center and the headline “Ultimate Comparison.”

Gemini Nano Banana Pro vs GPT-Image-1.5: Ultimate Comparison

Update, July 2026: both models here have since been succeeded. Google has shipped Nano Banana 2 and Nano Banana 2 Pro, and OpenAI’s GPT Image line now powers ChatGPT Images 2.0. The head-to-head below is our original December 2025 test of GPT-Image-1.5 vs Nano Banana Pro, with the hands-on images

Leer Más "

Task	Best Model	Why	Free?	Alternative
Essays & coursework	GPT-5.5	Free in ChatGPT, improved factual reliability vs 5.4	Yes	Claude Sonnet 5 (free Claude)
STEM problem-solving	GPT-5.6 Sol / Qwen 3.7 Max	New STEM flagship (5.5 Pro: 39.6% FrontierMath) / 97.1 HMMT 2026 Feb	Pro paid / Qwen API paid	Gemini 3.6 Flash (free)
Research & accuracy	Gemini 3.1 Pro	98% ARC-AGI-1 at $0.52/task, native Google Search grounding	Yes (Gemini app)	Claude Opus 5
Writing editing	Claude Sonnet 5	Free and default on claude.ai; Claude Fable 5 is the quality leader	Yes (Claude free)	Claude Fable 5
Multimodal study (PDFs, slides, images)	Gemini 3.6 Flash	1M context, free in Gemini app	Yes	NotebookLM (Google)

Model	Best For	Key Benchmark	Context / License	Where To Run
LongCat-2.0	Frontier open coder trained on Chinese chips	II 33 (v4.1); 59.5% SWE-Bench Pro (vendor), 1.6T/~48B active	1M / MIT	Hugging Face, GitHub, OpenRouter
MiniMax M3	Cheap frontier-class multimodal	II 44 (v4.1), 59% SWE-Bench Pro, multimodal	1M / license TBD	Hugging Face, API ~$0.60/1M
Nex-N2-Pro	Strongest open coding score	II 41 (v4.1); 80.8 SWE-Bench Verified, 397B/17B active	Qwen-based / Apache 2.0	Hugging Face, providers, self-host
Kimi K2.7 Code	Strongest commercially-licensed open coder	+21.8% on Kimi Code Bench v2 vs K2.6 (vendor); 1T/32B active	256K / Modified MIT	Hugging Face, DeepInfra, providers
DeepSeek V4-Pro	Agentic real-world work	II 44 (v4.1), 1.6T/49B active	1M / MIT	DeepSeek API ($0.435/$0.87), local
GLM-5.2	Long-horizon agentic coding, 1M context	II 51 (v4.1), highest of any open model; 744B/40B active	1M / MIT	Z.ai, Hugging Face, OpenRouter
Hy3	Newest permissive-licence entrant	II 41 (v4.1); #19 on Arena WebDev (1516.7)	Apache 2.0	Hugging Face, providers, self-host
Inkling	Thinking Machines’ first open model	II 41 (v4.1), agentic 32.3, released July 15	Open weights	Hugging Face, providers, self-host
NVIDIA Nemotron 3 Ultra	NVIDIA-tuned, fully permissive license	II 38 (v4.1), 65-70.4 SWE-Bench Verified, 550B/55B active	1M / OpenMDW	OpenRouter, Hugging Face, AWS (8× B200 self-host)
DeepSeek V4-Flash	Cheapest 1M-context open model	II 40 (v4.1), $0.14/$0.28 per 1M, 284B/13B active	1M / MIT	DeepSeek API, local
Qwen 3.5 (397B / 17B active)	Multimodal, fast decode	88.4 GPQA, 91.3 AIME 2026, 83.6 LiveCodeBench v6	1M / open	Together, OpenRouter, local
Qwen3.6-35B-A3B	Efficient open agentic coder (3B active)	86.0 GPQA Diamond, 92.7 AIME 2026, 35B/3B active	262K (→1M YaRN) / Apache 2.0	Hugging Face, OpenRouter, local
Qwen3.6-27B	Laptop-runnable dense coder	87.8 GPQA Diamond, dense 27B, multimodal	256K / Apache 2.0	Local Mac/PC, Hugging Face, OpenRouter
Rio 3.5 Open 397B	Qwen 3.5 fine-tune, multilingual reasoning	70.8 Terminal-Bench 2.1 (first-party), beats Qwen 3.7 Plus on 4/5	397B / 17B active, MIT	Hugging Face, providers, self-host
Qwen 3.5-9B	Laptop-runnable open-weight	81.7 GPQA Diamond	Dense / open	Local Mac/PC with 16GB+ RAM
Llama 4 Maverick	Meta-line flagship	17B active / 400B total params	Llama 4 license	Meta cloud, Hugging Face, local
NVIDIA Nemotron 3 Nano Omni	Edge / low-power	Multimodal, very small footprint	Compact / open	Local, NVIDIA tool

The Best AI to Use In July 2026

Monthly Ranking of Top AI Models

Claude Fable 5

Best AI for Writing

ChatGPT-5.6

Best AI for Chat / Daily Assistant

ChatGPT Images 2.0

Best AI for Images

Gemini Omni Flash

Best AI for Video

Claude Fable 5

Best AI for Coding

Grok 4.5

Best AI for Creativity

Gemini 3.1 Pro

Best AI for Accuracy

ChatGPT-5.6

Best AI for Problem Solving

What is new in July 2026

Claude Opus 5 – Anthropic – July 24, 2026 – new #1 on Artificial Analysis, tops both the Intelligence Index (61) and the Agentic Index (55.3)

Fugu-Ultra v1.1 – Sakana AI – July 24, 2026 – orchestration-engine refresh with vendor-reported gains of up to 7.9 points over v1.0 at the same price

Gemini 3.6 Flash – Google – July 21, 2026 – cheaper output, faster, same Intelligence Index

Qwen 3.8 (Qwen3.8-Max) – Alibaba – July 19, 2026 – 2.4-trillion-parameter multimodal flagship, previewed at WAIC Shanghai

Kimi K3 – Moonshot AI – July 16, 2026 – 2.8T reported params, largest open-weight model from China (weights due July 27)

GPT-5.6 Sol, Terra, and Luna – OpenAI – July 9, 2026 – next-gen family live across ChatGPT, Codex, and the API

Muse Spark 1.1 – Meta – July 9, 2026 – Meta’s first paid model, a cheap agentic coder at $1.25 / $4.25

Grok 4.5 – xAI (SpaceX AI division) – July 8, 2026 – cheap Cursor-trained coding model at $2 / $6, independently ranked #4

Seedream 5.0 Pro – ByteDance – July 8, 2026 – multilingual text-and-layout image model with region-precise editing

Claude Fable 5 – Anthropic – Returned July 1, 2026 – Mythos-class flagship back online after export controls lifted

Claude Sonnet 5 – Anthropic – June 30, 2026 – new default model, takes the writing crown and closes the gap to Opus 4.8

LongCat-2.0 – Meituan – June 29, 2026 – 1.6T open-weight coder trained entirely on Chinese chips

Gemini 3.5 Pro – Google – Still unreleased – months behind schedule, per Bloomberg

Category Deep Dives

Best AI for Writing

Best AI for Writing: Claude Fable 5 (#1 on Arena creative writing and LiveBench Language)

Best AI for Chat & Daily Assistant

Best AI for Chat & Daily Assistant: GPT-5.6 (ChatGPT’s default since July 9)

Best AI for Images

Best AI for Images: ChatGPT Images 2.0 (#1 on text-to-image and image editing)

Best AI for Video

Best AI for Video: Gemini Omni Flash (#1 on both video leaderboards)

Best AI for Coding

Best AI for Coding: Claude Fable 5 (#1 on Terminal-Bench 2.1, LiveBench Coding and the Remote Labor Index)

Best AI for Creativity

Best AI for Creativity: Grok 4.5 (fewest content restrictions, native real-time X)

Best AI for Accuracy

Best AI for Accuracy: Gemini 3.1 Pro (98% on ARC-AGI-1, at $0.52 per task)

Best AI for Problem Solving

Best AI for Problem Solving: GPT-5.6 Sol (#1 on LiveBench Mathematics, Reasoning and ARC-AGI-2)

Best AI Agent

Best AI Agent: Gemini Spark vs Claude Cowork ($99.99/month Ultra vs $20/month Pro)

Pricing Comparison

AI Model Pricing Comparison in July 2026 ($0 free tiers to $199.99/month Google AI Ultra)

Claude vs ChatGPT: Which AI Is Actually Better in 2026?

Grok vs ChatGPT: Which AI Chatbot Is Actually Better in 2026?

ChatGPT vs Gemini in 2026: Which AI Should You Actually Use?

Best AI February 2026 Rankings: GPT-5.2, Claude Opus 4.6, and Gemini 3.1 Pro

Best AI Models In January 2026: Gemini 3, Claude 4.5, ChatGPT (GPT-5.2), Grok 4.1 & Deepseek

Gemini Nano Banana Pro vs GPT-Image-1.5: Ultimate Comparison

Best AI for Students & Studying

Best AI for Students & Studying: GPT-5.5 Free + Gemini 3.6 Flash Free (zero-cost frontier for coursework)

Best AI for Work & Professionals

Best AI for Work: GPT-5.6 + Claude Opus 5 ($20/month each, plus Gemini Spark for agents)

Open-Weight and Free Models

Best Open-Weight Models in July 2026: GLM-5.2 leads, with Hy3 and Inkling new on the board

How We Evaluate

Benchmarks, Prices, and Hands-On Use

FAQ

What is the best AI model right now in July 2026?

What is new in AI in July 2026?

What is Claude Opus 5?

Is Claude Fable 5 back?

What is Claude Sonnet 5?

What is GPT-5.6 and can I use it?

Is Grok 4.5 out yet?

What is the best open-weight AI model in 2026?

What is Qwen 3.7 Max and how does it compare to GPT-5.5?

What is GPT-5.5 and how is it different from GPT-5.4?

Is ChatGPT still the best AI?

What is Gemini Spark and is it worth $99.99/month?

Download Fello AI,
the all-in-one AI App