Best AI for Writing
Best AI for Writing: Claude Sonnet 4.6 ($3 / $15 per 1M tokens, 1,643 Elo GDPval-AA)
The best AI for writing is Claude Sonnet 4.6, with GPT-5.5 as the alternative for structured business writing and Claude Opus 4.8 as the alternative for long-form work where every sentence matters. Sonnet 4.6 leads on writing style, voice fidelity, and instruction-following, scoring 1,643 Elo on GDPval-AA and sitting at the top of Anthropic’s lineup for natural prose. GPT-5.5 (April 23, 2026) cut hallucinations by 60% versus GPT-5.4 and now leads GDPval-AA overall, which makes it the safer default for fact-anchored writing like reports and briefs. Gemini 3.5 Flash (May 19) hit 1,656 GDPval-AA Elo, edging Sonnet 4.6 at lower cost, and is the price-performance pick for bulk content work. Claude Opus 4.8 is the call when you want chain-of-thought editing, long-form revision, or a model that pushes back on weak arguments.
| Model | Best For | Strength | Weakness | Price (per 1M tokens) |
|---|---|---|---|---|
| Claude Sonnet 4.6 | Style, voice, instruction-following | Top natural-prose Elo within Anthropic’s lineup | More cautious than GPT on opinions | $3 / $15 |
| GPT-5.5 | Business writing, factual reports | 60% fewer hallucinations vs GPT-5.4 | Style less expressive than Sonnet | $5 / $30 |
| Gemini 3.5 Flash | Bulk content, drafts at scale | 1,656 GDPval-AA Elo, 40% cheaper than Pro | Weaker on hardest reasoning | $1.50 / $9.00 |
| Claude Opus 4.8 | Long-form, high-stakes copy | Best editor for argument structure | Most expensive option here | $15 / $75 |
| Grok 4.3 | Casual, opinionated, X-style | Native X grounding, fewer guardrails | Not the natural pick for formal copy | $3 / $15 |
Runner-up and alternatives: Gemini 3.5 Flash is the runner-up for sheer volume at near-Sonnet quality, and GPT-5.5 is the runner-up for factual accuracy. Claude Opus 4.8 is the splurge pick for long-form. Grok 4.3 is the niche pick when you want X-style voice or live web context inside the draft.
What changed this month: Gemini 3.5 Flash (May 19) hit GDPval-AA 1,656 Elo, just above Claude Sonnet 4.6 at 1,643 and just below GPT-5.4 at 1,671, at $1.50 / $9.00 per 1M tokens ($21 cheaper per 1M output tokens than GPT-5.5’s $30). GPT-5.5 (April 23) still leads GDPval-AA overall and stays the default for structured knowledge work after cutting hallucinations by 60% versus GPT-5.4. Sonnet 4.6 still leads on style.
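Per-1M-token prices are easier to compare on a concrete job. The sketch below totals a hypothetical bulk run of 1,000 drafts, assuming the list prices from the writing table and no caching or batch discounts; the prompt and draft sizes are illustrative assumptions, not measured figures.

```python
"""Rough API cost comparison for a bulk writing job (list prices, no discounts)."""

# (input, output) USD per 1M tokens, taken from the writing table.
PRICES_PER_M = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Grok 4.3": (3.00, 15.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list prices (ignores caching and batch discounts)."""
    price_in, price_out = PRICES_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def batch_cost(model: str, jobs: int, input_tokens: int, output_tokens: int) -> float:
    """USD cost of `jobs` identical requests."""
    return jobs * job_cost(model, input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 drafts, 2,000-token prompt and 1,500-token draft each.
    for name in PRICES_PER_M:
        print(f"{name:<18} ${batch_cost(name, 1_000, 2_000, 1_500):,.2f}")
```

With these assumed sizes, Gemini 3.5 Flash comes to $16.50, Claude Sonnet 4.6 and Grok 4.3 to $28.50 each, and GPT-5.5 to $55.00; output tokens dominate the bill at this prompt-to-draft ratio.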
Best AI for Chat / Daily Assistant
Best AI for Chat & Daily Assistant: GPT-5.5 ($20/month ChatGPT Plus, 60% fewer hallucinations)
The best AI for everyday chat and daily assistant work is GPT-5.5, with Claude Opus 4.8 as the alternative when you want a more thoughtful tone and Gemini 3.5 Flash as the budget alternative inside the free Gemini app. GPT-5.5 launched April 23, 2026 with a 60% drop in hallucinations versus GPT-5.4, faster response times across all tiers, and a refreshed memory system that makes it the most reliable default for general-purpose tasks. It is available inside ChatGPT (free, Plus at $20/month, Pro at $100/month or $200/month for the larger context tier), through the API at $5 / $30 per 1M tokens, and bundled inside Fello AI alongside Claude, Gemini, Grok, and DeepSeek. Claude Opus 4.8 is the better pick when you want a model that pushes back on weak prompts and reasons more carefully through ambiguous questions; Gemini 3.5 Flash is the better pick when you are running everything through the free Gemini app or care about speed.
| Model | Best For | Strength | Weakness | Price |
|---|---|---|---|---|
| GPT-5.5 | Everyday chat, default assistant | 60% fewer hallucinations vs GPT-5.4 | Less expressive than Sonnet 4.6 | $20/mo Plus, $5 / $30 API |
| Claude Opus 4.8 | Thoughtful, nuanced answers | Strong reasoning, pushes back well | $75 per 1M output tokens via API is expensive | $20/mo Pro, $15 / $75 API |
| Gemini 3.5 Flash | Fast, free, multimodal | Free in Gemini app, 1M context | Weaker on hardest reasoning | Free app; $1.50 / $9.00 per 1M API |
| Grok 4.3 | Live news, X integration | Real-time X and web grounding | Smaller ecosystem | $30/mo SuperGrok |
| Fello AI | All five models, one app | ChatGPT + Claude + Gemini + Grok + DeepSeek for $9.99/mo | Routed via app, not direct | $9.99/mo |
Runner-up and alternatives: Claude Opus 4.8 is the runner-up for thoughtful daily use, Gemini 3.5 Flash is the runner-up for fast/free, and Grok 4.3 is the niche pick for live-news heavy days. Fello AI is the natural pick if you want all five top models in one Mac/iOS app for $9.99/month instead of juggling subscriptions.
What changed this month: GPT-5.5 stayed the default for chat after April’s launch, with no May regressions. Gemini 3.5 Flash (May 19) made the free Gemini app meaningfully faster and now slightly edges Sonnet 4.6 on GDPval-AA (1,656 vs 1,643 Elo) at zero cost in the consumer app. Claude Opus 4.8 continues to hold the top LM Arena text slot at around 1,502 Elo as votes accumulate.
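A recurring question in this category is whether a $20/month ChatGPT Plus seat or pay-as-you-go GPT-5.5 API access costs less. The sketch below uses only the $20/month and $5 / $30 per 1M token figures from this section; the share of output tokens is an assumption you supply, and the comparison ignores usage limits and app features that differ between the subscription and the API.

```python
"""Break-even between a ChatGPT Plus seat and GPT-5.5 API usage (rough sketch)."""

PLUS_MONTHLY_USD = 20.00
GPT55_INPUT_PER_M = 5.00    # USD per 1M input tokens
GPT55_OUTPUT_PER_M = 30.00  # USD per 1M output tokens


def api_monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """USD spent on the API for one month of usage at list prices."""
    return (input_tokens * GPT55_INPUT_PER_M + output_tokens * GPT55_OUTPUT_PER_M) / 1_000_000


def breakeven_tokens(output_share: float) -> float:
    """Total monthly tokens at which API spend equals the Plus subscription.

    output_share is the assumed fraction of all tokens that are output (0.0 to 1.0).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    # Blended USD per 1M tokens, weighting input and output prices by their share.
    blended_per_m = (1 - output_share) * GPT55_INPUT_PER_M + output_share * GPT55_OUTPUT_PER_M
    return PLUS_MONTHLY_USD / blended_per_m * 1_000_000


if __name__ == "__main__":
    for share in (0.1, 0.25, 0.5):
        print(f"output share {share:.0%}: ~{breakeven_tokens(share):,.0f} tokens/month")
```

At a 25% output share the blended rate is $11.25 per 1M tokens, so API spend only passes the Plus price at roughly 1.78M tokens a month.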
Best AI for Images
Best AI for Images: ChatGPT Images 2.0 (included in ChatGPT Plus, leader on readable text)
The best AI for image generation is ChatGPT Images 2.0, with Google Nano Banana Pro (Gemini 3.5 image stack) as the alternative for photorealism and Midjourney v8 as the alternative for stylized art. ChatGPT Images 2.0 (April 21, 2026) leads on text rendering, multilingual scripts, and infographic-style output, which makes it the natural pick when your image needs to contain words. Google’s Nano Banana Pro stack (refreshed at I/O 2026 alongside Gemini 3.5 Flash) is the natural pick for photoreal portraits and product shots at Flash-tier API cost ($1.50 / $9.00 per 1M tokens for the model layer). Midjourney v8 stays the niche choice for distinctive style. Microsoft’s MAI-Image-2 (April 2) remains too new to rank.
| Model | Best For | Strength | Weakness | Price |
|---|---|---|---|---|
| ChatGPT Images 2.0 | Images with readable text | Best multilingual text rendering | Less photoreal than Nano Banana | Included in ChatGPT Plus |
| Nano Banana Pro (Gemini 3.5) | Photoreal portraits, products | Photorealism, $0.04 per image cap | Style less distinctive | Gemini Pro or AI Studio |
| Midjourney v8 | Stylized art, illustration | Aesthetic baseline most artists like | Weaker on text in images | $10-$60/mo |
| Grok Imagine | NSFW / Spicy Mode | Most permissive guardrails | Smallest underlying model of the group | $30/mo SuperGrok |
| MAI-Image-2 | Microsoft ecosystem | Native in Copilot | Too new to rank | Included in Copilot |
Runner-up and alternatives: Nano Banana Pro is the runner-up overall and the leader for photoreal work; Midjourney v8 is the niche pick for art-direction-heavy use. Grok Imagine is the only major model that offers adult content, via Spicy Mode.
What changed this month: Gemini 3.5 Flash (May 19) refreshed the Nano Banana Pro image stack with the same Nano-Banana-class quality at Flash speeds and 40% lower API cost. ChatGPT Images 2.0 still leads on text-in-image. MAI-Image-2 remains too new to rank.
Best AI for Video
Best AI for Video: Google Veo 3.1 (Gemini App / AI Studio, Sora 2 discontinued April 26, 2026)
The best AI for video generation is Google Veo 3.1, with Kling 3.5 as the alternative for fast iteration and Runway Gen-4 as the alternative for cinematic motion control. Sora 2 was officially discontinued by OpenAI on April 26, 2026, so OpenAI no longer ranks in this category. Veo 3.1 is available inside the Gemini app, Google AI Studio, and via Vertex AI, with native audio generation, 1080p output, and the strongest physics consistency in the current lineup. Kling 3.5 stays the speed pick at lower cost; Runway Gen-4 is the choice when you need precise camera control. Pika 2.0 and Luma Ray 3 remain credible alternatives for shorter clips.
| Model | Best For | Strength | Weakness | Price |
|---|---|---|---|---|
| Google Veo 3.1 | Highest-fidelity AI video + audio | 1080p, native audio, physics consistency | Compute-heavy, slower | Google AI Pro / Ultra |
| Kling 3.5 | Fast iteration | Quick turnaround, strong motion | Less stable on long shots | From $10/mo |
| Runway Gen-4 | Cinematic control | Best-in-class camera/motion control | Premium pricing | From $15/mo |
| Pika 2.0 | Short clips, social | Cheap, fast, easy UX | Lower max resolution | From $10/mo |
| Luma Ray 3 | Photoreal scenes | Strong realism for landscapes | Smaller community | From $15/mo |
Runner-up and alternatives: Kling 3.5 is the runner-up overall and the cost-conscious pick; Runway Gen-4 is the runner-up for filmmakers and ad teams. Sora 2 is officially gone.
What changed this month: Sora 2 officially ended on April 26, 2026 after OpenAI deprioritised video to focus on Codex and Personal Finance. Veo 3.1 is now uncontested at the top of the still-supported video models. Google is widely expected to refresh Veo at the next Google AI event; we will update this section when that happens.
Best AI for Coding
Best AI for Coding: Claude Opus 4.8 vs GPT-5.5 ($5 / $25 vs $5 / $30 per 1M tokens)
The best AI for coding is Claude Opus 4.8, with GPT-5.5 as the proprietary alternative, Gemini 3.5 Flash as the price-performance pick for agent-style coding, Qwen 3.7 Max as the new mid-tier value pick, and DeepSeek V4 as the open-weight pick. Claude Opus 4.8 holds Anthropic’s top SWE-bench Verified score and remains the favourite inside Claude Code and Cursor. GPT-5.5 (April 23) is right behind at 88.7% on SWE-bench Verified and ahead on FrontierMath. Gemini 3.5 Flash (May 19) hit 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas at $1.50 / $9.00 per 1M tokens, making it the strongest price-performance option for agent workflows. Qwen 3.7 Max (May 20) hit 80.4% on SWE-bench Verified at $2.50 / $7.50, with cached input at $0.25, undercutting both Opus 4.8 and GPT-5.5 on cost. DeepSeek V4 Preview (April 24) remains the strongest open-weight model at 80%+ on SWE-bench and about 90% on HumanEval, and it runs locally on a Mac with enough RAM.
| Model | Best For | Strength | Weakness | Price (per 1M tokens) |
|---|---|---|---|---|
| Claude Opus 4.8 | Long-running agentic coding | Anthropic-leading SWE-bench, task budgets | Premium-priced | $5 / $25 |
| GPT-5.5 | Frontier proprietary alternative | 88.7% SWE-bench Verified | Less agent-tuned than Opus | $5 / $30 |
| Gemini 3.5 Flash | Agent coding at scale | 76.2% Terminal-Bench 2.1, 83.6% MCP Atlas | Weaker on hardest reasoning | $1.50 / $9.00 |
| Qwen 3.7 Max | Cost-effective mid-tier | 80.4% SWE-bench Verified, $0.25 cached input | Closed weights, API-only | $2.50 / $7.50 |
| DeepSeek V4 Preview | Open-weight, local runs | 80%+ SWE-bench, ~90% HumanEval | Hardware-heavy for local runs | $0.27 / $1.10 |
Runner-up and alternatives: GPT-5.5 is the proprietary runner-up; Gemini 3.5 Flash is the runner-up for price-performance; Qwen 3.7 Max is the runner-up for mid-tier value; DeepSeek V4 is the runner-up for open-weight self-hosters. Inside IDEs, Cursor + Claude Opus 4.8 is the most popular pairing and Claude Code is the natural pick if you live in the terminal.
What changed this month: Gemini 3.5 Flash (May 19) made frontier-level agent coding meaningfully cheaper. Qwen 3.7 Max (May 20) joined the top tier with 80.4% on SWE-bench Verified, undercutting Claude Opus 4.8 and GPT-5.5 on price-per-quality. DeepSeek V4 Preview (April 24) stays the strongest open-weight option. The April-launched proprietary leaders (GPT-5.5 at 88.7% on SWE-bench Verified and Claude Opus 4.8 with its Anthropic-leading SWE-bench score) remain the picks when budget is not the constraint.
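Qwen 3.7 Max’s $0.25 cached-input rate matters most in agent loops, which tend to resend much of the same context every turn. The sketch below uses the $2.50 / $7.50 list prices and $0.25 cached input from this section to show how the cache-hit rate moves the bill; the session size and hit rates are hypothetical, and real cache accounting may differ by provider.

```python
"""Effective Qwen 3.7 Max session cost as a function of cache-hit rate."""

QWEN_INPUT_PER_M = 2.50          # USD per 1M uncached input tokens
QWEN_CACHED_INPUT_PER_M = 0.25   # USD per 1M cached input tokens
QWEN_OUTPUT_PER_M = 7.50         # USD per 1M output tokens


def session_cost(input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """USD cost of one session.

    cache_hit_rate is the assumed fraction of input tokens billed at the cached rate.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (
        uncached * QWEN_INPUT_PER_M
        + cached * QWEN_CACHED_INPUT_PER_M
        + output_tokens * QWEN_OUTPUT_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent session: 5M input tokens, 300k output tokens.
    for rate in (0.0, 0.5, 0.8):
        print(f"cache hits {rate:.0%}: ${session_cost(5_000_000, 300_000, rate):.2f}")
```

For that hypothetical session, the cost falls from $14.75 with no cache hits to $5.75 at an 80% hit rate, which is why cached pricing can matter more than the headline input price for long agent runs.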
Best AI for Creativity
Best AI for Creativity: Grok 4.3 (xAI, $30/month SuperGrok, fewer guardrails)
The best AI for creative writing, brainstorming, and unfiltered ideation is Grok 4.3, with Claude Opus 4.8 as the alternative for structured creative work and Gemini 3.1 Pro as the alternative for multimodal creative tasks. Grok 4.3 (April 30, 2026) has the most permissive guardrails of any frontier model and the strongest native X integration, which makes it the natural pick for opinionated, on-trend, real-time creative work. Claude Opus 4.8 is the better pick when you want a model that holds a long creative thread, edits its own drafts, and engages with the substance of your work. Gemini 3.1 Pro is the better pick when your creative project mixes text with images, video, and live web context.
| Model | Best For | Strength | Weakness | Price |
|---|---|---|---|---|
| Grok 4.3 | Unfiltered, opinionated, on-trend | Fewest guardrails, X integration | Less polished for structured work | $30/mo SuperGrok |
| Claude Opus 4.8 | Long-form structured creativity | Holds long threads, self-edits | Most cautious of the four text models | $20/mo Pro, $5 / $25 API |
| Gemini 3.1 Pro | Multimodal creative | Strong text + image + video chain | Quotas inside Gemini app | Free app; $2.00-$4.00 / $12.00-$18.00 API (tiered) |
| GPT-5.5 (ChatGPT) | Mainstream creative writing | Best at hitting briefs | Heavier guardrails | $20/mo Plus, $5 / $30 API |
| Grok Imagine (Spicy Mode) | NSFW / adult creative | Most permissive image generation | Niche use case | $30/mo SuperGrok |
Runner-up and alternatives: Claude Opus 4.8 is the runner-up overall and the right pick for projects that need to hold together across many turns. Gemini 3.1 Pro is the multimodal runner-up. For adult creative work, Grok Imagine Spicy Mode is the only frontier-grade option.
What changed this month: No major creativity-specific launches in May 2026. Grok 4.3 stayed the category leader after April. Gemini 3.5 Flash (May 19) is too speed-tuned to be the natural creativity pick yet, but the cheaper image stack inside Gemini 3.5 helps multimodal creative workflows.
Best AI for Accuracy
Best AI for Accuracy: Gemini 3.1 Pro (94.3% GPQA Diamond, 44.4% Humanity’s Last Exam, 77.1% ARC-AGI-2)
The best AI for accuracy and research is Gemini 3.1 Pro, with Qwen 3.7 Max as the new value alternative and GPT-5.5 Pro as the alternative for hallucination-sensitive work. Gemini 3.1 Pro leads the hardest pure-reasoning tests at 94.3% on GPQA Diamond, 44.4% on Humanity’s Last Exam, and 77.1% on ARC-AGI-2, with native Google Search grounding for live factual answers. Qwen 3.7 Max (May 20) entered the top tier at 92.4 on GPQA Diamond, tied with Claude Opus 4.8, at half the API cost. GPT-5.5 Pro (April 23) keeps the 60% hallucination drop over GPT-5.4, which makes it the right pick when factual reliability matters more than raw benchmark depth. Gemini 3.5 Flash (May 19) outscores Gemini 3.1 Pro on coding and agent benchmarks but trails Pro on these accuracy tests (HLE 40.2% vs 44.4%, ARC-AGI-2 72.1% vs 77.1%), so Pro stays the accuracy pick.
| Model | Best For | Key Benchmark | Weakness | Price |
|---|---|---|---|---|
| Gemini 3.1 Pro | Hardest reasoning + research | 94.3% GPQA Diamond, 44.4% HLE, 77.1% ARC-AGI-2 | Usage quotas in app | $2.00-$4.00 / $12.00-$18.00 (tiered) |
| Qwen 3.7 Max | Frontier accuracy at value pricing | 92.4% GPQA Diamond | API-only, no chat front-end | $2.50 / $7.50 |
| GPT-5.5 Pro | Hallucination-sensitive work | 60% fewer hallucinations vs GPT-5.4 | Pricier API tier | $100/mo ChatGPT Pro |
| Claude Opus 4.8 | Long-form factual writing | Top LM Arena text slot, ~1,502 Elo | Slower on hardest math | $5 / $25 |
| Grok 4.3 | Live web facts | Native real-time grounding | Smaller benchmark coverage | $30/mo SuperGrok |
Runner-up and alternatives: Qwen 3.7 Max is the new runner-up after May 20 and the value pick at the frontier. GPT-5.5 Pro is the runner-up for hallucination-sensitive work. Claude Opus 4.8 is the runner-up for long-form factual writing.
What changed this month: Qwen 3.7 Max (May 20) joined the accuracy top tier at 92.4% on GPQA Diamond. Gemini 3.5 Flash (May 19) did not overtake Pro on accuracy tests, so the top recommendation does not change.
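The “(tiered)” Gemini 3.1 Pro price means the per-token rate depends on request size. This section does not state the tier boundary, so the sketch below assumes a 200k-token prompt threshold, above which the higher rate applies to the whole request; treat both the threshold and that rule as assumptions to check against Google’s current pricing. Qwen 3.7 Max is priced flat at $2.50 / $7.50 for comparison.

```python
"""Per-request cost: tiered Gemini 3.1 Pro vs flat-priced Qwen 3.7 Max.

ASSUMPTION: the higher Gemini tier applies above a 200k-token prompt and then
covers the whole request. Verify the real rule on the provider's pricing page.
"""
from typing import Tuple

TIER_THRESHOLD_TOKENS = 200_000  # assumed; not stated in this article

# (input, output) USD per 1M tokens
GEMINI_31_PRO_LOW = (2.00, 12.00)
GEMINI_31_PRO_HIGH = (4.00, 18.00)
QWEN_37_MAX = (2.50, 7.50)


def cost(prices: Tuple[float, float], prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at a flat (input, output) price pair."""
    price_in, price_out = prices
    return (prompt_tokens * price_in + output_tokens * price_out) / 1_000_000


def gemini_31_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, choosing the tier from the prompt size."""
    prices = GEMINI_31_PRO_LOW if prompt_tokens <= TIER_THRESHOLD_TOKENS else GEMINI_31_PRO_HIGH
    return cost(prices, prompt_tokens, output_tokens)


if __name__ == "__main__":
    for prompt, out in ((50_000, 5_000), (300_000, 10_000)):
        g = gemini_31_pro_cost(prompt, out)
        q = cost(QWEN_37_MAX, prompt, out)
        print(f"{prompt:>7,} in / {out:>6,} out: Gemini 3.1 Pro ${g:.4f}, Qwen 3.7 Max ${q:.4f}")
```

Under these assumptions a 50k-token prompt with 5k output tokens costs $0.16 on Gemini 3.1 Pro versus $0.1625 on Qwen 3.7 Max, while a 300k-token prompt with 10k output tokens costs $1.38 versus $0.825, so Qwen’s value edge grows on long-context research.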
Best AI for Problem Solving
Best AI for Problem Solving: GPT-5.5 Pro & Qwen 3.7 Max (39.6% FrontierMath Tier 4, 97.1 HMMT 2026 Feb)
The best AI for hard problem-solving is GPT-5.5 Pro for FrontierMath-style abstract math and Qwen 3.7 Max for competition math, with Claude Opus 4.8 Thinking as the alternative for long agentic reasoning chains. GPT-5.5 Pro still leads at 39.6% on FrontierMath Tier 4 (about 1.7 times Claude Opus 4.8’s 22.9%), which makes it the right pick when you need step-by-step working on the hardest math and physics problems. Qwen 3.7 Max (May 20) hit 97.1 on HMMT 2026 February, the highest score in its comparison group, and 44.5 on Apex, which makes it the right pick for competition-style problem-solving at half the cost of GPT-5.5 Pro. Claude Opus 4.8 Thinking (April 16) introduced task budgets, a new primitive for controlling agentic token spend on long chains. Gemini 3.5 Flash trades raw reasoning depth for speed and price; for the hardest problems, Gemini 3.1 Pro and the Thinking variants still lead.
| Model | Best For | Key Benchmark | Weakness | Price |
|---|---|---|---|---|
| GPT-5.5 Pro | Abstract math, physics | 39.6% FrontierMath Tier 4 | Highest cost tier | $100/mo ChatGPT Pro |
| Qwen 3.7 Max | Competition math | 97.1 HMMT 2026 Feb, 44.5 Apex | API-only | $2.50 / $7.50 |
| Claude Opus 4.8 Thinking | Long agentic reasoning | Task budgets, top LM Arena text slot | Slower on math | $5 / $25 |
| Gemini 3.1 Pro | Multimodal reasoning + research | 94.3% GPQA Diamond, 77.1% ARC-AGI-2 | API quotas | $2.00-$4.00 / $12.00-$18.00 (tiered) |
| DeepSeek V4 Preview | Open-weight problem solving | Strong on AIME/HumanEval | Hardware-heavy for local runs | $0.27 / $1.10 |
Runner-up and alternatives: Claude Opus 4.8 Thinking is the runner-up overall and the natural pick for agentic, long-chain problem-solving. Gemini 3.1 Pro is the multimodal runner-up. DeepSeek V4 Preview is the open-weight runner-up.
What changed this month: Qwen 3.7 Max (May 20) joined the front of the pack at 97.1 HMMT 2026 February and 44.5 Apex. GPT-5.5 Pro still leads FrontierMath Tier 4 at 39.6%. Reasoning depth remains Pro/Thinking territory; Flash-class models have not displaced it.
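Task budgets are about capping how many tokens a long agentic chain may spend. The sketch below illustrates that general idea with a hypothetical client-side guard; `TaskBudget`, `run_steps`, and the step interface are invented names for illustration, not Anthropic’s API, and the per-step token estimate is an assumption you would tune.

```python
"""Hypothetical client-side token budget for a multi-step agent loop.

Illustrates the idea behind task budgets; this is NOT Anthropic's API.
"""
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# Hypothetical step interface: each step returns (tokens_used, done).
Step = Callable[[], Tuple[int, bool]]


@dataclass
class TaskBudget:
    max_tokens: int
    spent: int = 0

    def can_afford(self, estimate: int) -> bool:
        """True if a step estimated at `estimate` tokens fits in what is left."""
        return self.spent + estimate <= self.max_tokens

    def charge(self, used: int) -> None:
        """Record the tokens a finished step actually used."""
        self.spent += used


def run_steps(steps: Iterable[Step], budget: TaskBudget, estimate_per_step: int) -> int:
    """Run steps until one reports done or the next would exceed the budget.

    Returns the number of steps actually executed.
    """
    executed = 0
    for step in steps:
        if not budget.can_afford(estimate_per_step):
            break  # stop before starting a step the budget probably cannot cover
        used, done = step()
        budget.charge(used)
        executed += 1
        if done:
            break
    return executed


if __name__ == "__main__":
    budget = TaskBudget(max_tokens=10_000)
    # Five hypothetical steps that each use 4,000 tokens and never finish on their own.
    steps = [lambda: (4_000, False) for _ in range(5)]
    ran = run_steps(steps, budget, estimate_per_step=4_000)
    print(ran, budget.spent)  # prints: 2 8000
```

Because the guard checks an estimate before each step, actual spend can still overshoot if a step uses more than estimated; a stricter variant would also cap each step’s own output.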
Best AI Agents
Best AI Agent: Gemini Spark vs Claude Cowork ($100/month Ultra vs $20/month Pro)
The best AI agent right now is Gemini Spark for 24/7 cloud-resident work and Claude Cowork for desktop-resident work, with ChatGPT Codex as the alternative for coding agents and OpenAI Operator-class browser agents as the alternative for web tasks. AI agents are the fastest-moving category of 2026: each top vendor now ships an agent product, and the practical choice is between agents that live in the cloud (run while your laptop is closed) and agents that live on your desktop (drive your apps directly). Gemini Spark launched at Google I/O on May 19, 2026 and is the first 24/7 cloud agent. Claude Cowork reached general availability on April 9, 2026 and runs as a desktop agent that drives your local apps. ChatGPT Codex Mobile (May 14) is the pick for coding-agent work, now usable from iOS and Android.
| Agent | Best For | Where It Runs | Strength | Price |
|---|---|---|---|---|
| Gemini Spark | 24/7 cloud tasks, Workspace workflows | Google Cloud VM (always-on) | First true 24/7 agent, deep Workspace integration | $100/mo Google AI Ultra |
| Claude Cowork | Desktop, app-driving, design + code | Your Mac/Windows desktop | Drives local apps, sees your screen | $20/mo Claude Pro |
| ChatGPT Codex Mobile | Coding agent on phone | OpenAI cloud + iOS/Android | Approve diffs and redirect work from phone | Included in ChatGPT plans |
| Grok Agentic (Grok 4.3) | Real-time research, X scraping | xAI cloud | Native X integration | $30/mo SuperGrok |
| OpenAI Operator-class | Browser tasks, web forms | OpenAI cloud + your browser | Web automation | ChatGPT Pro |
Runner-up and alternatives: Claude Cowork is the runner-up overall and the natural pick when you want the agent on your machine driving your apps. ChatGPT Codex Mobile is the runner-up for coding agents. Grok Agentic is the niche pick for real-time research.
What changed this month: Gemini Spark (May 19) is the launch that defines this category. Codex Mobile (May 14) made OpenAI’s coding agent phone-friendly. Claude Cowork stayed the desktop-agent default after its April GA, and the practical Spark-vs-Cowork choice now drives most agent decisions for individual users.