Best AI for Writing
Claude Sonnet 4.6 does not just score well on writing benchmarks; it leads them by a clear margin. On the GDPval-AA Elo benchmark, which measures real expert-level office work including drafting, editing, and document creation, Sonnet 4.6 scores an Elo of 1,633, higher than every other model including its own sibling Opus 4.6 (1,606). For professional writing tasks, it consistently outperforms models that cost twice as much per token.
The practical advantage comes from Anthropic’s focus on instruction-following. Sonnet 4.6 reliably maintains tone, follows complex style guides, and produces clean structured output without extensive prompt engineering. It handles long-form documents with strong coherence, maintaining argument structure and factual consistency across thousands of words. This precision is notable because Gemini 3.1 Pro, despite its benchmark dominance across 12 of 18 categories, scores only 1,317 on GDPval-AA, far below both Claude models.
Sonnet 4.6 achieves its GDPval-AA results through adaptive thinking, which means it self-allocates more processing effort to complex writing tasks. The trade-off is token consumption: Sonnet 4.6 uses roughly four times as many total tokens as Sonnet 4.5 on the same GDPval-AA tasks. For individual writers, that cost difference is invisible. For teams running high-volume automated content pipelines, it is worth modelling the per-task spend before committing.
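As a rough illustration of that modelling, the sketch below prices a single task at Sonnet 4.6's published $3/$15 rates and applies the ~4x token multiplier versus Sonnet 4.5. The per-task token counts and task volume are hypothetical placeholders, not measured figures.

```python
# Back-of-envelope cost model for a writing pipeline (illustrative assumptions only).
# Prices are the published $3 input / $15 output per 1M tokens for Sonnet 4.6;
# per-task token counts and the ~4x multiplier vs. Sonnet 4.5 are rough estimates.

def task_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float = 3.00,
              output_price_per_m: float = 15.00) -> float:
    """Return the USD cost of one task at the given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m + \
           (output_tokens / 1_000_000) * output_price_per_m

# Assumed token usage for one long-form draft (hypothetical numbers).
sonnet_45_style = task_cost(input_tokens=8_000, output_tokens=3_000)
sonnet_46_style = task_cost(input_tokens=8_000 * 4, output_tokens=3_000 * 4)  # ~4x total tokens

print(f"Sonnet 4.5-style usage:      ${sonnet_45_style:.3f} per task")
print(f"Sonnet 4.6 adaptive thinking: ${sonnet_46_style:.3f} per task")
print(f"Monthly at 10,000 tasks:      ${sonnet_46_style * 10_000:,.0f}")
```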
Pricing reinforces the case for most users. At $3/$15 per million tokens, Sonnet 4.6 sits between the budget and premium tiers, delivering writing quality that competes with Opus 4.6 at a lower cost. For content teams needing to balance output volume and quality, Sonnet 4.6 is the clear default. If budget is not a constraint, Opus 4.6 handles longer multi-section documents with marginally stronger structural reasoning due to its 1M token context window (beta).
| Model | Writing Benchmark | Instruction Following | Price (I/O per 1M) | Best For |
|---|---|---|---|---|
| Claude Sonnet 4.6 | GDPval-AA: 1,633 Elo (1st) | Excellent | $3 / $15 | Long-form, professional writing |
| GPT-5.4 | GDPval: 83% (1st overall) | Very Good | $2.50 / $15 | Documents, reports, knowledge work |
| Gemini 3.1 Pro | GPQA Diamond: 94.3% | Good | $2 / $12 | Research-heavy, accuracy-critical |
| Claude Opus 4.6 | GDPval-AA strong | Excellent | $5 / $25 | Complex writing with reasoning |
Runner-up and Alternatives
GPT-5.4 is a strong second. Its 83% GDPval score reflects solid document and knowledge-work capability, and it offers a wider tool ecosystem for writers who need integrated search and web access. Gemini 3.1 Pro is worth considering for accuracy-critical writing, such as scientific summaries or financial content.
What Changed This Month: GPT-5.4’s launch strengthens the competition in knowledge-work writing. Sonnet 4.6 still leads on style and instruction-following, but the gap for structured documents has narrowed.
Best AI for Chat / Daily Assistant
GPT-5.4 replaces GPT-5.2 as the default for everyday AI use, and the upgrade is substantial. It is the first general-purpose model to surpass human performance on OSWorld (75.0% vs. human baseline of 72.4%), meaning it can reliably operate software, fill out forms, manage files, and execute multi-step desktop workflows without step-by-step guidance. That capability alone reframes what a daily AI assistant can mean: instead of just advising on a task, GPT-5.4 can complete it.
The model also reduces individual hallucinated statements by 33% compared to GPT-5.2, with 18% fewer errors in complete answers. Its context window is 1 million tokens via API, with premium pricing applied to prompts exceeding 272K input tokens. ChatGPT’s breadth of integrations also contributes: GPT-5.4 ships with native Tool Search for real-time web access, integrates with more third-party workflows than any other model, and is available in GitHub Copilot from day one.
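If you plan to push prompts past that threshold, it is worth sketching the tiered input cost. The snippet below uses the $2.50 per 1M input rate quoted elsewhere in this guide; the premium multiplier above 272K input tokens is a placeholder assumption, since the exact surcharge is not listed here.

```python
# Minimal sketch of tiered input pricing for long prompts (assumptions flagged below).
BASE_INPUT_PER_M = 2.50        # GPT-5.4 input price per 1M tokens quoted in this guide
PREMIUM_THRESHOLD = 272_000    # input tokens above this are billed at a premium rate
PREMIUM_MULTIPLIER = 2.0       # placeholder assumption; check the provider's actual surcharge

def input_cost(input_tokens: int) -> float:
    """USD input cost for one prompt under the assumed two-tier scheme."""
    base_tokens = min(input_tokens, PREMIUM_THRESHOLD)
    premium_tokens = max(input_tokens - PREMIUM_THRESHOLD, 0)
    return (base_tokens * BASE_INPUT_PER_M
            + premium_tokens * BASE_INPUT_PER_M * PREMIUM_MULTIPLIER) / 1_000_000

print(f"200K-token prompt: ${input_cost(200_000):.2f}")
print(f"800K-token prompt: ${input_cost(800_000):.2f}")
```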
For users who do not need computer-use capabilities, Gemini 3.1 Pro (currently in preview) is the most competitive alternative. Its native Google Search integration provides grounded, citation-backed answers, and at $2/$12 per million tokens it costs less than GPT-5.4 at the API level. Grok 4.20 is the strongest option for real-time X and web data, and its per-token pricing is significantly lower, making it cost-effective for developers building chatbot applications.
GPT-5.4’s thinking mode shows a structured work plan before generating an answer. For complex multi-step requests, this transparency helps users catch misunderstood instructions before a full response is generated. Its new Tool Search feature also cuts costs for developers: in testing across 36 MCP servers, it reduced total token usage by 47% while maintaining accuracy, a significant saving for teams running large agentic tool ecosystems.
| Model | Chat Quality | Tool / Web Access | Computer Use | Best For |
|---|---|---|---|---|
| GPT-5.4 | Excellent | Native + Tool Search | Yes (OSWorld 75%) | Daily tasks, automation |
| Gemini 3.1 Pro | Excellent | Google Search native | Limited | Research-heavy conversations |
| Grok 4.20 | Very Good | Real-time X/web data | No | Current events, creative chat |
| Claude Opus 4.6 | Very Good | Limited | Agent teams | Deep analytical conversations |
Runner-up and Alternatives
Gemini 3.1 Pro is the strongest alternative for users who prioritize accuracy and research depth. Grok 4.20 is the best choice for real-time information and costs a fraction of GPT-5.4 at the API level.
What Changed This Month: GPT-5.4 launched March 5, directly replacing GPT-5.2 as the winner in this category.
Best AI for Images
The top spot in AI image generation is genuinely contested as of March 2026. The two major crowd-sourced image leaderboards disagree: on arena.ai (LM Arena), Gemini 3.1 Flash Image (also known as Nano Banana 2) leads at 1,268 Elo, with GPT-Image-1.5 at 1,248 – a 20-point gap in Google’s favour, though Gemini’s score is marked Preliminary with fewer votes. On Artificial Analysis, GPT-Image-1.5 leads at 1,268 Elo, with Gemini 3.1 Flash Image at 1,262 – a 6-point gap in OpenAI’s favour. Both leaderboards use blind human preference voting but draw from different user pools.
We give GPT-Image-1.5 a narrow edge for professional and commercial use based on its practical strengths: it is the first image generator to simultaneously handle accurate text rendering, photorealism, and artistic stylization without forcing a trade-off between them. Text in images – labels, signs, logos, and UI elements – renders accurately rather than distorting into illegible noise. For any project requiring readable on-image copy, GPT-Image-1.5 remains the most reliable choice.
Gemini 3.1 Flash Image is the stronger pick if speed, cost, or multilingual text rendering are priorities. It generates faster, costs roughly half the price per image, and is deeply integrated across Google products (Gemini app, Search AI Mode, Google Ads, Flow). For high-volume production workflows where cost-per-image matters, Gemini 3.1 Flash Image may be the better default despite its Preliminary leaderboard status.
Flux 2 [max] (arena.ai Elo 1,167; Artificial Analysis Elo 1,207) excels at photographic skin texture and fine-art aesthetics, and remains the strongest open-ecosystem option for artistic style diversity. For projects where artistic range matters more than photorealism or text accuracy, Flux 2 is competitive.
| Model | Elo (arena.ai) | Elo (Art. Anls.) | Best Strength | Known Weakness | Best For |
|---|---|---|---|---|---|
| GPT-Image-1.5 | 1,248 | 1,268 | Photorealism + text accuracy | Cost | Professional, branded content |
| Gemini 3.1 Flash Image | 1,268 (Prelim.) | 1,262 | Speed + multilingual + cost | Less artistic range | High-volume, multilingual |
| Gemini 3 Pro Image | 1,236 | 1,221 | Diverse style range | Slightly lower realism | Varied creative projects |
| Flux 2 [max] | 1,167 | 1,207 | Artistic, skin texture | Text rendering | Fine art, photography |
Note: Elo scores from arena.ai (LM Arena) and Artificial Analysis Image Arena as of March 8, 2026. Rankings differ between the two leaderboards.
What Changed This Month: Gemini 3.1 Flash Image (Nano Banana 2) launched February 26 and immediately claimed the top spot on both major image leaderboards. GPT-Image-1.5 has since regained #1 on Artificial Analysis but trails on arena.ai. The top two are closer than ever – the winner depends on which leaderboard you trust and which strengths matter for your use case.
Best AI for Video
Veo 3.1 produces the most cinematic output of any AI video model. It generates at professional 24fps with optional 4K upscaling, produces native synchronized audio – sound effects, ambient noise, and dialogue generated natively – and follows complex multi-element prompts better than any competitor. Released in October 2025, with major feature updates in January 2026, it includes two capabilities that separate it from the field: Scene Extension for continuous narratives exceeding 60 seconds, and Ingredients to Video, which lets you upload up to three reference images to lock character face, clothing, and environment consistently across all scenes. For anyone building branded video series or consistent character-driven content, that scene-level consistency is a practical production advantage no other model currently matches.
Native audio is now table stakes. All four major video models generate synchronized audio as of early 2026. The differentiator has shifted to visual quality, prompt accuracy, and scene-level consistency, and Veo 3.1 leads on all three. The image-to-image transition generation feature (First and Last Frame) also adds polish that previously required manual editing: Veo 3.1 auto-generates smooth transitions between scenes with matched audio, removing a step that typically required post-production.
Sora 2 is the strongest alternative for physically realistic motion. Its physics simulation training means falling objects, water, and crowds behave more convincingly than in Veo 3.1. For storytelling-driven content where physical realism matters more than visual fidelity, Sora 2 is worth testing. Kling 3.0 remains the best option for rapid prototyping and social content, generating at comparable 1080p/24fps quality at lower cost and faster turnaround.
Seedance 2.0 occupies a different niche: its multi-modal input with audio reference makes it the best tool for music video production and brand content that needs to match a specific audio track. Its audio reference input system allows the generated video to sync visually to an existing music bed, a capability the other three models do not offer natively.
| Model | Native Audio | Resolution | Best Strength | Best For |
|---|---|---|---|---|
| Veo 3.1 | Yes | Up to 4K / 24fps | Prompt accuracy, cinematic, scene consistency | Broadcast, commercial, film |
| Sora 2 | Yes | 1080p / 24fps | Physics simulation | Realistic motion, storytelling |
| Kling 3.0 | Yes | 1080p / 24fps | Low cost, fast | Rapid prototyping, social |
| Seedance 2.0 | Yes (+ audio ref) | 1080p / 24fps | Multi-modal input | Music video, brand content |
What Changed This Month: All four major video models now include native audio. Prompt adherence, visual quality, and scene consistency are now the differentiators.
Best AI for Coding
Claude Opus 4.6 scores 80.8% on SWE-bench Verified, leading every general-purpose model. The SWE-bench test evaluates real GitHub issues, not synthetic coding puzzles, requiring the model to understand an existing codebase, identify the relevant files, and write a correct patch. At 80.8%, Opus 4.6 resolves roughly four in five of these real-world engineering problems without human guidance. (Note: this is a marginal 0.1 percentage point regression from Opus 4.5’s 80.9%, suggesting SWE-bench performance has plateaued at the ~80% level across frontier models. The gains in Opus 4.6 are in reasoning and agentic capabilities, not raw SWE-bench scores.)
The architecture advantage is the multi-agent system. Through Claude Code, Opus 4.6 can spawn and coordinate parallel sub-agents, delegating different parts of a codebase to independent processes and recombining results. On large refactors or feature additions spanning multiple files and modules, this approach handles work that single-context models struggle with. Anthropic also specifically trained Opus 4.6 to reduce logic hallucinations – the class of error where code is syntactically valid but logically incorrect – which is the failure mode that wastes the most developer time in AI-assisted coding.
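The pattern itself is easy to picture. The sketch below illustrates the decompose/delegate/recombine idea using the Anthropic Python SDK with plain thread-based fan-out; it is not Claude Code's actual implementation, and the model ID and subtasks are assumptions made for the example.

```python
# Illustrative sketch of decompose/delegate/recombine, not Claude Code's internals.
# Model ID "claude-opus-4-6" is assumed from this guide's naming.
from concurrent.futures import ThreadPoolExecutor
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_subagent(subtask: str) -> str:
    """Send one independent subtask to the model and return its answer."""
    response = client.messages.create(
        model="claude-opus-4-6",   # assumed model ID
        max_tokens=2_000,
        messages=[{"role": "user", "content": subtask}],
    )
    return response.content[0].text

subtasks = [
    "Summarize the public interface of module A and list breaking-change risks.",
    "Summarize the public interface of module B and list breaking-change risks.",
    "Draft a migration checklist for renaming the shared config keys.",
]

# Fan out the subtasks in parallel, then hand the partial results to a final pass.
with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    partials = list(pool.map(run_subagent, subtasks))

synthesis = run_subagent(
    "Combine these findings into one refactor plan:\n\n" + "\n\n---\n\n".join(partials)
)
print(synthesis)
```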
Gemini 3.1 Pro (currently in preview) is a genuine challenger, scoring 80.6% on SWE-bench, just 0.2 percentage points behind Opus 4.6. Its 1M token context window (standard, not beta) makes it stronger on very large codebases where loading entire repositories matters. At $2/$12 per million tokens compared to Opus 4.6’s $5/$25, it is significantly cheaper for teams running continuous coding automation. For teams on tight API budgets, Gemini 3.1 Pro delivers near-equivalent coding performance at less than half the price.
Claude Sonnet 4.6 sits at 79.6% on SWE-bench and is worth considering for daily coding assistance. At $3/$15 it costs less than Opus 4.6 and handles most coding tasks with nearly identical quality. GPT-5.4 scored 54.6% on Toolathlon (a multi-tool benchmark relevant to agentic coding) and brings strong computer-use integration for developers who need to automate IDE interactions. For prototyping and greenfield development, GPT-5.4’s tool ecosystem and speed make it a practical choice alongside the Claude models.
| Model | SWE-bench | Agent / Multi-file | Context | Price (I/O per 1M) | Best For |
|---|---|---|---|---|---|
| Claude Opus 4.6 | 80.8% | Excellent (agent teams) | 200K (1M beta) | $5 / $25 | Complex, agentic coding |
| Gemini 3.1 Pro | 80.6% | Good | 1M | $2 / $12 | Long-context, cost-sensitive |
| Claude Sonnet 4.6 | 79.6% | Good | 200K (1M beta) | $3 / $15 | Daily coding, near-Opus |
| GPT-5.4 | Competitive | Good | 1M | $2.50 / $15 | Rapid prototyping, tool-use |
What Changed This Month: The Opus 4.6 vs. Gemini 3.1 Pro SWE-bench gap is now just 0.2 percentage points. GPT-5.4 launched with strong Toolathlon scores (54.6%).
Best AI for Creativity
Creativity is the hardest category to measure objectively. There is no authoritative benchmark equivalent to SWE-bench or GPQA Diamond. What we can say with evidence: Grok 4.20 holds a crowd-sourced Arena Elo of 1,493 (rank 4 overall), and human raters consistently prefer its outputs in open-ended conversation, the domain most relevant to creative collaboration. Note that Grok 4.20 is currently in beta and available only to SuperGrok (~$30/month) and X Premium+ subscribers.
Grok 4.20’s four-agent architecture is the key differentiator. Four specialized sub-agents – Grok, Harper, Benjamin, and Lucas – deliberate in parallel, fact-check each other, and reach consensus before responding. This process tends to push outputs away from the statistically safest, most expected answer. The results are less predictable than other frontier models, which is either an advantage or a drawback depending on your creative workflow. For brainstorming, concept generation, and ideation under uncertainty, that divergence from the expected is exactly what you want.
Real-time data access through X and the broader web gives Grok 4.20 a further creative edge. It can incorporate current cultural references, trending formats, and breaking news into its outputs in a way that models without live data access cannot. For content creators working on topical or trend-driven material, this gives Grok 4.20 relevance that Claude and Gemini cannot match without supplementary search tools.
This is the most subjective category we rank. If you need tight style constraints rather than open-ended divergence, Claude Sonnet 4.6 is the better fit. Its instruction-following precision means it will stay inside defined creative parameters far more reliably than Grok 4.20. GPT-5.4, with its Tool Search integration, is the best option for creative projects that blend research with ideation, such as long-form journalism or strategy documents.
| Model | Creative Approach | Real-time Data | Arena Rank | Best For |
|---|---|---|---|---|
| Grok 4.20* | Multi-agent deliberation | Yes (X + web) | 4 (1,493 Elo) | Topical, brainstorming |
| Claude Sonnet 4.6 | Deep instruction following | No | High | Structured creative writing |
| GPT-5.4 | Versatile, tool-enabled | Yes (Tool Search) | TBD (new) | Creative + research |
| Gemini 3.1 Pro | Technically rigorous | Yes (Google) | 2 (1,500 Elo) | Science writing, journalism |
Note: * Grok 4.20 is currently in beta.
What Changed This Month: Grok 4.20 Beta 2 (March 3) updated Beta 1 with improved instruction following and LaTeX output. Grok 4.20 replaced Grok 4.1 as the winner for this category when Beta 1 launched in February.
Best AI for Accuracy
Gemini 3.1 Pro (currently in preview) is the most factually reliable LLM released to date. Its headline numbers: 94.3% on GPQA Diamond (graduate-level science questions), 77.1% on ARC-AGI-2 (novel problem-solving requiring genuine reasoning), and 80.6% on SWE-bench Verified. It leads 12 of 18 standardized benchmarks tracked across the major model evaluation frameworks. The ARC-AGI-2 score represents a 2.5x improvement over its predecessor (31.1%), the largest single-generation reasoning jump recorded by any frontier model.
The native Google Search grounding is the operational advantage. For use cases where correctness matters most, such as medical queries, legal summaries, scientific research, and financial analysis, Gemini 3.1 Pro automatically grounds its answers against current search results when needed. This means factual errors from knowledge cutoffs are far less common than in models without live search integration. The combination of the highest benchmark scores and real-time grounding makes it uniquely reliable for professional research use.
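Turning that grounding on is a one-line config change in the API. The sketch below uses the google-genai Python SDK's Search grounding tool; the model ID follows this guide's naming and should be treated as an assumption, and the model itself decides whether a given query needs grounding.

```python
# Minimal sketch of Google Search grounding with the google-genai Python SDK.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # assumed model ID based on this guide's naming
    contents="What changed in the 2026 IFRS sustainability disclosure rules?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # enable Search grounding
    ),
)

print(response.text)
# When grounding fires, source citations are attached to the candidate:
print(response.candidates[0].grounding_metadata)
```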
Claude Opus 4.6 is the strongest challenger on reasoning accuracy specifically. It holds Arena rank 1 with 1,504 Elo and scores 68.8% on ARC-AGI-2, up sharply from Opus 4.5’s 37.6%. On pure logic and mathematical problem-solving, Opus 4.6’s extended thinking mode can match or exceed Gemini 3.1 Pro’s performance. For tasks where chain-of-thought reasoning matters more than factual grounding, Opus 4.6 is worth testing as an alternative.
GPT-5.4 adds competitive accuracy credentials through its knowledge-work benchmark results (83% GDPval) and Tool Search integration for real-time fact access. However, Gemini 3.1 Pro’s lead on scientific reasoning benchmarks has not been displaced by GPT-5.4’s March 5 launch. For research, analysis, and any task where a factual error has real consequences, Gemini 3.1 Pro remains the safest default.
| Model | GPQA Diamond | ARC-AGI-2 | SWE-bench | Arena Elo | Best For |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | 94.3% | 77.1% | 80.6% | 1,500 (rank 2) | Research, science, factual |
| GPT-5.4 | Strong | Competitive | Competitive | TBD (new) | Knowledge-work accuracy |
| Claude Opus 4.6 | 91.3% | 68.8% | 80.8% | 1,504 (rank 1) | Logic, coding accuracy |
| Grok 4.20 | Competitive | Strong | – | 1,493 (rank 4) | Forecasting, real-time |
Note: Claude Opus 4.6’s GPQA Diamond score (91.3%) added for completeness based on published benchmarks.
What Changed This Month: GPT-5.4 launched as a strong challenger but has not displaced Gemini 3.1 Pro’s lead on scientific reasoning benchmarks. Claude Opus 4.6’s ARC-AGI-2 score (68.8%) is a notable jump from Opus 4.5’s 37.6%.
Best AI for Problem Solving
Claude Opus 4.6 Thinking is Anthropic’s extended chain-of-thought mode, holding Arena rank 3 with an Elo of 1,500. The core capability is explicit step-by-step reasoning: the model surfaces its assumptions, considers alternative paths, and shows the working before committing to an answer. For problems where the reasoning process matters as much as the answer, such as strategic planning, mathematical proofs, and multi-constraint optimization, this transparency is operationally useful.
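In API terms, the mode is opt-in per request. The sketch below shows extended thinking enabled via the Anthropic Python SDK; the model ID and the thinking budget are assumptions made for illustration.

```python
# Minimal sketch of enabling extended thinking with the Anthropic Python SDK.
# Model ID is assumed from this guide's naming; the thinking budget is illustrative.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",                        # assumed model ID
    max_tokens=16_000,                              # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8_000},
    messages=[{
        "role": "user",
        "content": "Plan a rollout that satisfies these three conflicting constraints: ...",
    }],
)

# The response interleaves "thinking" blocks (the visible working) with the final answer.
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking[:200], "...")
    elif block.type == "text":
        print("[answer]", block.text)
```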
The agent team architecture is the decisive advantage for complex problem-solving. Opus 4.6 can decompose a hard problem, assign subtasks to parallel sub-agents via Claude Code, and synthesise results into a coherent solution. This is not a token-level reasoning improvement but a structural one: the model breaks a problem into independently solvable components and recombines them. For problems with no single correct answer, the thinking mode surfaces assumptions and explores alternatives before converging, reducing the risk of confidently wrong outputs.
Gemini 3.1 Pro’s Deep Think mode (currently in preview) is the strongest alternative, specifically for scientific and mathematical problems. It holds the same 1,500 Arena Elo and leads on GPQA Diamond (94.3%) and ARC-AGI-2 (77.1%). For hypothesis testing, research design, and problems with verifiable ground truth, Gemini 3.1 Pro Deep Think now rivals Claude Opus 4.6 Thinking. The choice between them often comes down to domain: Claude Opus 4.6 Thinking is stronger on multi-step logic and engineering problems, while Gemini 3.1 Pro Deep Think is stronger on scientific and empirical reasoning.
Grok 4.20 offers a structurally different approach: its four-agent deliberation is always active, not a separately enabled mode. The four sub-agents fact-check each other in parallel before responding, producing a consensus answer rather than a single chain of thought. For forecasting, multi-perspective analysis, and scenarios where contrarian views improve the output, Grok 4.20’s architecture provides a meaningful alternative to the Claude and Gemini extended-thinking approaches.
| Model | Extended Reasoning | Multi-agent | Arena Elo | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 Thinking | Yes (chain-of-thought) | Yes (Claude Code) | 1,500 (rank 3) | Complex reasoning, agentic |
| Gemini 3.1 Pro (Deep Think)* | Yes | Limited | 1,500 (rank 2) | Scientific problems, research |
| GPT-5.4 Thinking | Yes | Limited | TBD (new) | Structured logic, knowledge-work |
| Grok 4.20** | Yes (4-agent) | Built-in | 1,493 (rank 4) | Forecasting, multi-perspective |
* Gemini 3.1 Pro is currently in preview.
** Grok 4.20 is currently in beta.
What Changed This Month: Gemini 3.1 Pro’s improved Deep Think mode now rivals Claude Opus 4.6 Thinking on scientific problems specifically. GPT-5.4 added a Thinking mode at launch.