Apple has not officially announced the M5 Ultra Mac Studio, but credible leaks point to a reveal at WWDC 2026 (June 8-12), with shipping expected in summer 2026. The most recent analyst reports from Bloomberg’s Mark Gurman warn that RAM shortages could push the launch to October 2026, so both timelines are live. Either way, this would be the most powerful Mac Apple has ever built, and the leaked specs are brutal.
Most coverage stops at release dates and core counts. The more interesting question is what 80 GPU cores, 256GB of unified memory, and Apple’s new GPU Neural Accelerators actually let you do that you couldn’t do before. This article pulls together the confirmed details and every credible leak, then quantifies what the M5 Ultra Mac Studio would mean for running local AI models on your desk.
The Key Takeaways
- The M5 Ultra Mac Studio is expected to debut at WWDC 2026 (June 8-12), with a risk of slipping to October 2026 due to RAM shortages
- Leaked specs: 32-36 CPU cores, 80 GPU cores, 256GB unified memory, ~1100 GB/s memory bandwidth, and roughly 190W peak power draw
- The current Mac Studio tops out at the M3 Ultra (up to 32 CPU, 80 GPU, 256GB unified memory, 819 GB/s). There was never an M4 Ultra; Apple skipped that generation entirely
- Apple’s own MLX research shows the base M5 is up to 4x faster on time-to-first-token vs M4 for local LLMs, and runs a 14B dense model in under 10 seconds
- macOS 27 drops Intel Mac support entirely, so only Apple Silicon Macs will run this generation of AI features
When Is the M5 Ultra Mac Studio Coming Out?
Apple has not confirmed a date. Three timelines are credible right now. The best case is a reveal at WWDC 2026, which Apple has officially confirmed for June 8-12, 2026, with the keynote on June 8, per the Apple Newsroom announcement. A summer shipping date would follow.
The middle case is a late-summer ship tied to a hardware-only press release after WWDC. Apple has used this pattern before with the Mac Pro and Mac Studio, where the keynote stays focused on software and the Mac Studio refresh gets its own moment.
The bear case is October 2026. On April 19, 2026, MacRumors reported that both the next MacBook Pro and Mac Studio are likely postponed to around October 2026 because of a global RAM shortage and component price spikes. The same analyst reports note that base storage may jump from 512GB to 1TB to absorb some of the cost pressure. If you’re budgeting for a Mac Studio purchase this year, plan for either window.
Timeline of the leaks so far
The rumor picture has been tightening for months. In November 2025, MacRumors first surfaced references to Mac Studio models with M5 Max and M5 Ultra chips in leaked Apple files. In March 2026, Apple shipped the M5 Max in the new MacBook Pro, which locked in the per-core architecture that the Ultra will inherit. And in early April 2026, Bloomberg’s Mark Gurman flagged the RAM shortage and pushed his estimate to late summer or fall.
On April 17, 2026, MacRumors published a recap that landed on five “things to know” about the next Mac Studio; two days later came the reporting that suggests the ship date slips to October. None of this is officially confirmed, but the sourcing has converged on the same basic picture from independent leakers.
Leaked M5 Ultra Specs
The consensus across Macworld, MacRumors, and Bloomberg leaks is surprisingly tight. The M5 Ultra chip is expected to combine two M5 Max dies using Apple’s UltraFusion interconnect, the same trick Apple has used on every Ultra generation since the M1. That roughly doubles the CPU cores, GPU cores, and memory bandwidth of the M5 Max that already shipped in the new MacBook Pro in March 2026.
Table of Specs
| Chip | CPU cores | GPU cores | Unified memory (max) | Memory bandwidth | AI accelerators | Status |
|---|---|---|---|---|---|---|
| M3 Ultra (Mac Studio 2025) | up to 32 | up to 80 | 256GB | 819 GB/s | 32-core Neural Engine | Shipping |
| M5 Max (MacBook Pro 2026) | up to 16 | up to 40 | 128GB | 614 GB/s | GPU Neural Accelerators in every core | Shipping |
| M5 Ultra (Mac Studio) | 32-36 | 80 | 256GB | ~1100 GB/s | GPU Neural Accelerators in every core | Leaked, unreleased |
A few things stand out. The M3 Ultra originally offered 512GB of unified memory, but Apple quietly removed that option in March 2026 because of the same global DRAM shortage now threatening the M5 Ultra launch, per a MacRumors report. Both the current M3 Ultra and the leaked M5 Ultra top out at 256GB today. The M5 Ultra’s lead over the M3 Ultra comes from the new GPU Neural Accelerators, a dedicated matrix-multiply unit inside every GPU core, and significantly higher bandwidth. For most AI workloads, bandwidth and on-chip compute matter more than raw memory capacity anyway.
Power draw is expected to land around 190 watts at peak, which is still a fraction of what a comparable Nvidia workstation GPU pulls on its own, before you add the CPU and the rest of the system. The M5 Ultra remains a single quiet box on your desk.
How UltraFusion Works and Why It Matters for Local AI
The term “UltraFusion” gets thrown around without much context. Apple’s original M1 Ultra UltraFusion architecture used a silicon interposer to connect two M1 Max dies across more than 10,000 signals, delivering 2.5TB/s of inter-processor bandwidth. The M5 Ultra is expected to use the same approach with two M5 Max dies. That is the key engineering trick that makes an Ultra chip different from two discrete GPUs sitting on a motherboard.
The payoff for AI work is huge. Software sees the M5 Ultra as one chip with one unified memory pool, not two chips that have to shuffle weights across a slow PCIe bus. For a 70B-parameter model that spans tens of gigabytes, that difference is the difference between loading the model once and running it, versus streaming weights back and forth every inference step.
Apple’s unified memory architecture also removes the classic Nvidia-style split between CPU RAM and GPU VRAM. On a Windows workstation with an RTX 5090, you have 32GB of VRAM separate from your system RAM, and any model that doesn’t fit in 32GB has to be split, offloaded, or aggressively quantized. On an M5 Ultra with 256GB of unified memory, the CPU and GPU draw from the same pool at the same 1100 GB/s. That is why Apple Silicon over-indexes on large-model inference despite having fewer raw teraflops than high-end Nvidia cards.
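To make the capacity argument concrete, here is a minimal sketch of the arithmetic behind "does this model fit." The bits-per-weight figure and the ~10% overhead factor are ballpark assumptions (typical for Q4-class GGUF quantization plus runtime buffers), not measured values.

```python
# Rough sketch: estimate a quantized model's memory footprint and check
# whether it fits in a given memory pool. Bits-per-weight and the overhead
# factor are ballpark assumptions, not measured values.

GiB = 1024**3

def model_footprint_gb(params_billion: float, bits_per_weight: float,
                       overhead: float = 1.10) -> float:
    """Weight bytes plus ~10% for KV cache and runtime overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / GiB

def fits(params_billion: float, bits_per_weight: float, pool_gb: float) -> bool:
    return model_footprint_gb(params_billion, bits_per_weight) <= pool_gb

# Llama-class 70B at ~4.5 bits/weight (a typical Q4 GGUF average)
q4_70b = model_footprint_gb(70, 4.5)
print(f"70B @ Q4 ≈ {q4_70b:.0f} GB")                                  # ≈ 40 GB
print("fits 32GB VRAM (RTX 5090):", fits(70, 4.5, 32))                # False
print("fits 256GB unified (M5 Ultra, leaked):", fits(70, 4.5, 256))   # True
```

The same function explains the table further down: a 70B model that is a non-starter on a single 32GB card fits in a 256GB unified pool with room for a second model alongside it.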
Ports, Displays, and the Rest of the Box
The current M3 Ultra Mac Studio ships with six Thunderbolt 5 ports that each hit 120Gb/s, two USB-A ports, an SDXC card slot, a 10 Gigabit Ethernet jack, and an HDMI 2.1 port. Every leak points to the M5 Ultra keeping the same physical chassis and port layout, so treat those as baseline for the 2026 refresh.
Display support is serious. The M3 Ultra Mac Studio drives up to eight external displays at 6K/60Hz, or four displays at 8K/60Hz. That ceiling is not likely to change with the M5 Ultra; if anything, higher memory bandwidth and more GPU cores should extend high-resolution performance headroom. For AI workflows specifically, that means you can have Ollama, LM Studio, a browser with model docs, and two Claude or ChatGPT windows open across multiple 5K displays without skipping a beat.
What 80 GPU Cores and Neural Accelerators Mean for Local AI
This is where the M5 Ultra changes the conversation. Apple’s Machine Learning Research team published benchmark data showing that the base M5 is up to 4x faster on time-to-first-token than the M4 for local LLM inference using MLX. The same research shows the base M5 running a dense 14B model with TTFT under 10 seconds, and a 30B mixture-of-experts model with TTFT under 3 seconds. Sustained generation improved 19-27% over the M4, driven largely by the jump to 153 GB/s memory bandwidth on the base chip.
Now scale that up. The M5 Ultra has roughly 7x the memory bandwidth of the base M5 and roughly 8x the GPU cores, each with a neural accelerator. That is a different class of machine.
What you can actually run
Real benchmarks from the current M5 Max give us a floor to extrapolate from. On an M5 Max with 128GB of memory, Llama 3.1 70B at 4-bit quantization runs at roughly 22 tokens per second, Qwen-27B at ~25 tok/s, and 14B-class models blow past 55 tok/s, per third-party tests. The M5 Ultra should hit meaningfully higher numbers on the same models thanks to more GPU cores and nearly 80% more bandwidth than the Max.
Here is what that looks like for common local AI models, with rough M5 Ultra projections:
| Model | Size (Q4) | Fits in 256GB | Expected throughput |
|---|---|---|---|
| Llama 3.3 70B (Q4) | ~42GB | Yes, huge headroom | 30-45 tok/s |
| Qwen 3 72B (Q4) | ~44GB | Yes | 28-40 tok/s |
| Mistral Small 4 (FP16) | ~46GB | Yes | 60-80 tok/s |
| Gemma 4 27B (FP16) | ~54GB | Yes | 40-55 tok/s |
| GLM-5.1 (Q4) | ~55GB | Yes | 30-45 tok/s |
| DeepSeek V3.2 671B (Q4 MoE) | ~350GB | No, exceeds 256GB | N/A at full precision |
| DeepSeek V3.2 (Q2 MoE) | ~180GB | Yes, tight | 15-25 tok/s |
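Where do projections like "30-45 tok/s" come from? Single-stream LLM decoding is roughly memory-bandwidth bound, so a first-order estimate scales a measured M5 Max number by the (leaked, unconfirmed) bandwidth ratio. The sketch below does exactly that; the scaling-efficiency factor is an assumption to model imperfect scaling across the two dies, not a measured figure.

```python
# Back-of-envelope projection: scale measured M5 Max throughput by the
# Ultra/Max bandwidth ratio. The Ultra bandwidth is a leaked figure and
# the efficiency factor is an assumption — these are estimates, not benchmarks.

M5_MAX_BW_GBS = 614     # shipping M5 Max
M5_ULTRA_BW_GBS = 1100  # leaked, unconfirmed

def project_tok_s(measured_max_tok_s: float,
                  scaling_efficiency: float = 1.0) -> float:
    """Scale an M5 Max measurement by the bandwidth ratio.
    scaling_efficiency < 1.0 models imperfect scaling across two dies."""
    return measured_max_tok_s * (M5_ULTRA_BW_GBS / M5_MAX_BW_GBS) * scaling_efficiency

# Llama 3.1 70B Q4: ~22 tok/s measured on M5 Max (third-party tests)
optimistic = project_tok_s(22, 1.0)    # perfect scaling
pessimistic = project_tok_s(22, 0.8)   # 80% scaling efficiency
print(f"70B Q4 projection: {pessimistic:.0f}-{optimistic:.0f} tok/s")
```

That lands in the low-to-high 30s, which is why the table's 30-45 tok/s range is plausible but should be treated as an extrapolation until real M5 Ultra hardware is benchmarked.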
What the numbers don’t show is the practical feel. Being able to run a full 70B model without streaming it from the cloud means your prompts do not leave the device, latency is measured in milliseconds, and you never hit an API rate limit. The memory headroom also makes batched inference practical, and batching is what closes the gap between a tinkerer setup and something you can actually build products on. For indie developers and small teams, the M5 Ultra would be the first Apple box where batched local LLM serving becomes viable without buying a datacenter GPU.
If you want a practical walk-through of the current chips, we cover running open-source models on M5 Macs in detail.
The Software Stack: MLX, Ollama, and LM Studio
Hardware is only half the story. The software ecosystem on Apple Silicon caught up dramatically in the last six months. Three tools matter for local AI on an M5 Ultra.
MLX is Apple’s own machine-learning framework, built to exploit unified memory and the Neural Accelerators in every GPU core. It is the fastest option on Apple Silicon and consistently runs 20-87% faster than llama.cpp for models under 14B parameters. MLX is the right choice for developers who want to build on top of Apple Silicon and tune performance.
Ollama is the easiest tool for spinning up a local model. On March 31, 2026, Ollama adopted MLX as its backend on Apple Silicon, which gave existing Ollama users an automatic ~90% speedup on many models. You install Ollama, pull a model with one command, and you have a local API running. For most readers, Ollama is the right starting point.
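As a sketch of what "a local API" means in practice: once Ollama is running, it listens on localhost port 11434 by default, and any language can call it over HTTP. The model name below is an assumption (whatever you pulled with `ollama pull`), and the `generate` call is not invoked here because it requires a live server.

```python
# Minimal sketch of calling Ollama's local REST API from Python.
# Assumes Ollama is installed and a model has been pulled beforehand;
# the model name and default port are assumptions your setup may override.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks for one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_payload("llama3.3:70b", "Summarize UltraFusion in one sentence.")
print(json.dumps(payload, indent=2))
# generate(...) needs a running Ollama server, so it is not called here.
```

The point of this pattern is that your scripts, editors, and agents all hit the same local endpoint, with no API key and no per-token billing.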
LM Studio is the most beginner-friendly option. It’s a GUI app that lets you download, run, and chat with models without touching the command line. It supports MLX too, and it runs on the same chips. For people who just want to try local AI without learning a CLI, LM Studio is the fastest path.
If you are choosing between them, start with LM Studio if you have never run a local model, move to Ollama once you want to hit your model from a script, and pick MLX directly if you are building a product.
M5 Ultra vs RTX 5090 for Local AI
This is the comparison most prospective buyers actually want. A maxed-out Mac Studio is not cheap, but neither is a proper workstation PC with an Nvidia RTX 5090. Here is how they actually stack up for running large local models.
The RTX 5090 has a list price of $1,999 and real-world street prices between $2,500 and $3,800 for board-partner cards. It ships with 32GB of GDDR7 VRAM at roughly 1800 GB/s bandwidth, pulls up to 575 watts under load, and needs a CPU, motherboard, PSU, case, and cooling to actually run. A real 5090 workstation build lands at $4,000 to $6,000 all-in.
The M5 Ultra Mac Studio is expected to start around $4,299 to $4,499 and climb past $10,000 fully loaded, but you get the full system in one box at ~190W peak draw, plus macOS, plus the MLX stack out of the box.
On raw speed the RTX 5090 still wins for models that fit in VRAM. On Llama 3.1 70B at 4-bit quantization, a dual-5090 setup hits roughly 100 tokens per second. An M5 Max on the same model sits around 22 tok/s, and the M5 Ultra should land somewhere in the 30-45 tok/s range. If you only care about maximum throughput on one specific model that fits in 32GB of VRAM, Nvidia is faster.
Memory is the Biggest Difference
The catch is memory. Llama 3.3 70B at Q4 needs about 42GB, which does not fit on a single RTX 5090 at all. You either split the model across two 5090s, offload layers to CPU RAM (which tanks speed), or quantize more aggressively (which tanks quality). An M5 Ultra with 256GB of unified memory loads the whole model without compromise, and you still have room to run another model at the same time. For any model larger than 32B, or any workflow that needs multiple models loaded simultaneously, Apple Silicon wins on capability even when Nvidia wins on raw speed.
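The split-across-GPUs math is worth making explicit. A hedged sketch, where the per-card reservation for runtime buffers and fragmentation is an assumption rather than a measured figure:

```python
# Sketch: how many 32GB cards a model needs when split across GPUs,
# versus a single unified pool. The per-card reservation is an assumption
# (runtime buffers, fragmentation), not a measured figure.
import math

def gpus_needed(model_gb: float, vram_gb: float = 32.0,
                reserved_gb: float = 2.0) -> int:
    """Cards required to hold the weights, reserving a slice per card."""
    usable = vram_gb - reserved_gb
    return math.ceil(model_gb / usable)

print(gpus_needed(42))                               # Llama 3.3 70B Q4 -> 2 cards
print(gpus_needed(180))                              # DeepSeek Q2 MoE  -> 6 cards
print(gpus_needed(42, vram_gb=256, reserved_gb=8))   # unified pool     -> 1
```

Every extra card adds cost, power, and inter-GPU traffic; the unified pool sidesteps all three, which is the capability argument in one function.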
The honest rule of thumb: RTX 5090 for the fastest tokens per second on mid-size models; M5 Ultra for running the biggest models, multiple models, or agentic workflows on a single machine. Both make sense depending on what you’re building.
M5 Ultra vs M3 Ultra vs M5 Max: Who Should Wait
The buying decision depends on what you actually do with a Mac Studio.
If you need a Mac Studio immediately for very large uncompressed models or extreme 3D scenes, the M3 Ultra at 256GB is the best you can buy today, and it ships now. The M5 Ultra should match that memory ceiling and deliver meaningfully more bandwidth and AI compute when it arrives.
If you care about AI inference speed per dollar, wait for the M5 Ultra. The jump from an older Neural Engine to per-GPU-core Neural Accelerators is the biggest Apple Silicon AI upgrade since the M1. The bandwidth lift from 819 GB/s to roughly 1100 GB/s is another ~35% on top.
If your workload fits in 128GB of memory and you want something portable, the M5 Max MacBook Pro already shipped in March 2026. It has the same per-core AI architecture as the Ultra will. You are leaving Ultra-tier bandwidth on the table, but most users never saturate it. Our best MacBook for AI guide breaks down who that fits.
If you want a cheap entry point into local AI, the current M4 Mac mini is still a capable starter machine, and Apple is expected to refresh it with M5 and M5 Pro chips around WWDC 2026. Either way, a Mac mini costs a fraction of a Mac Studio. See our Mac mini for AI breakdown.
Price Expectations
No leak gives a hard price yet. For context, the current M3 Ultra Mac Studio starts at $3,999 with 28 CPU cores, 60-core GPU, 96GB of unified memory, and 1TB of storage. A maxed-out build with 32-core CPU, 80-core GPU, 256GB memory, and 16TB storage climbs well past $10,000. Macworld and TechRepublic both flag that RAM and storage cost pressure could push the M5 Ultra starting price higher.
A reasonable expectation is an M5 Ultra Mac Studio starting at $4,299 to $4,499 for a baseline configuration with 64-96GB of memory, climbing past $10,000 fully loaded at 256GB. This is speculation until Apple announces.
Three configurations worth considering
For most buyers, three configurations hit the sweet spot. The entry Ultra with roughly 96GB of memory and 1TB of storage is the right pick if you want Ultra-tier bandwidth and GPU headroom without paying for memory you will not use. The AI-optimized Ultra with 128GB of memory and 2TB of storage is the right pick for developers running 70B models and batched inference, with room to load multiple quantized models simultaneously. The maxed Ultra with 256GB of memory and 4TB+ of storage is for teams running mixture-of-experts models at scale or building AI-native products locally. Wait for Apple’s pricing before pulling the trigger on any of these.
What We Still Don’t Know
A lot of the picture is still blurry. Nobody has a credible official benchmark for the M5 Ultra yet, so every tokens-per-second number in this article is either an extrapolation from M5 Max results or sourced from analyst projections. Apple’s final pricing has not leaked, and the DRAM shortage could either push prices higher than the $4,299-$4,499 estimate or force Apple to cut memory options to hold the price line. The UltraFusion interconnect is assumed to carry over from prior generations, but Apple has not formally confirmed that for M5. It is also an open question whether the 512GB memory option returns in a future revision if DRAM supply eases.
The one area with no useful leaks at all is AI-specific features beyond raw hardware. Whether Apple ships an MLX 2.0 with Ultra-specific optimizations, whether Ollama or LM Studio will get first-party Apple tooling support, and whether Apple’s rumored Siri chatbot in iOS 27 will run partly on-device are all unknowns. WWDC 2026 should answer most of these.
macOS 27 and Why Apple Silicon Is the Only Game Now
The M5 Ultra Mac Studio will ship with macOS 27, the first macOS release to require Apple Silicon exclusively. At WWDC 2025, Apple confirmed that macOS Tahoe 26 is the last version to support Intel Macs. From macOS 27 onward, the entire Mac AI feature set, including Apple Intelligence 2.0, the rumored Siri chatbot, Visual Intelligence upgrades, and the MLX acceleration stack, is Apple-Silicon-only.
If you are still on a 2019 MacBook Pro or a 2020 iMac, you are at the end of the road. Migrating to Apple Silicon is now mandatory if you want to run current AI features, not a nice-to-have. The M5 Ultra Mac Studio is the high end of that path, but any modern M-series Mac gets you into the tent.
Local Models on M5 Ultra, Cloud Frontier Models via Fello AI
The M5 Ultra Mac Studio is going to be the best piece of local AI hardware you can buy for general-purpose work. It handles big open-source models, long contexts, and batched inference. What it cannot do is magically make Anthropic’s flagship Claude model, OpenAI’s newest ChatGPT, or Google’s Gemini run on your desk. Those are closed-weights models that only their providers host.
This is where Fello AI comes in. Fello AI runs natively on Apple Silicon and gives you Claude, ChatGPT, Gemini, Grok, DeepSeek, and Perplexity in a single menu-bar app for $9.99/month. One subscription, every major frontier model, routed through to the provider. The pairing is clean: open-source models run locally on your M5 Ultra via MLX and Ollama, frontier closed-weights models route through Fello AI when you need the best quality on a specific task. Fello AI holds a 4.7-star rating with 25,000+ reviews across the App Store and Mac App Store.
If you want model-specific Mac clients, we also maintain installation guides for Claude on Mac, ChatGPT for Mac, Gemini desktop for Mac, and Grok on Mac.
Should You Wait for the M5 Ultra Mac Studio?
If you already own an M3 Ultra Mac Studio and you need it for work, there is no rush. The performance lift for video editing and 3D is meaningful but incremental. For AI inference, it is a bigger jump, and probably worth waiting if you can hold out until the fall.
If you are on an older Intel Mac Pro, an M1 Ultra, or a five-year-old iMac Pro, the M5 Ultra Mac Studio is the upgrade that finally brings you into the modern AI era. macOS 27 locks Intel out, so the upgrade is not optional. Wait for the announcement at WWDC 2026 before buying anything, even if it means living with your current machine for a few more months.
For everyone in between, the cleanest move is an M5 Max MacBook Pro today for AI work on the go, plus Fello AI for frontier cloud models, and revisit the Mac Studio question after WWDC. Our best AI models guide is the place to start if you want to know which cloud model is winning which benchmark this month.
FAQ
When is the M5 Ultra Mac Studio coming out?
Apple has not confirmed a date. The most credible window is a reveal at WWDC 2026 (June 8-12), with shipping in summer 2026. Bloomberg’s Mark Gurman reports that RAM shortages could push the launch to around October 2026.
How many GPU cores does the M5 Ultra have?
The leaked spec is 80 GPU cores, matching the maxed-out M3 Ultra on core count but adding the new GPU Neural Accelerators that sit inside every core of the M5 generation. CPU count is expected to reach 32-36 cores.
Can the M5 Ultra Mac Studio run 70B local LLMs?
Yes, comfortably. A 4-bit quantized Llama 70B takes about 42GB, which fits easily within the rumored 256GB unified memory. Combined with ~1100 GB/s memory bandwidth and per-core Neural Accelerators, it should deliver 30-45 tokens per second using MLX, Ollama, or LM Studio.
How much will the M5 Ultra Mac Studio cost?
Apple has not announced pricing. The current M3 Ultra Mac Studio starts at $3,999. Analysts expect the M5 Ultra to start slightly higher due to RAM costs, likely in the $4,299-$4,499 range for a baseline configuration, climbing past $10,000 fully loaded.
Is the Mac Studio M5 Ultra better than a PC with an RTX 5090 for local AI?
It depends on the workload. The RTX 5090 is faster on models that fit in 32GB of VRAM, reaching around 100 tok/s on Llama 70B Q4 in dual-GPU setups. The M5 Ultra wins on capability: its 256GB of unified memory lets it load models the 5090 cannot fit, and it runs at a fraction of the power draw in a single quiet box.
What is the difference between the Neural Engine and the new GPU Neural Accelerators?
The Neural Engine is a dedicated fixed-function block, separate from the CPU and GPU, optimized for low-power inference. The GPU Neural Accelerators introduced with the M5 generation are matrix-multiply units embedded in every GPU core, co-located with the rest of the GPU pipeline. They deliver much higher peak throughput for LLM inference because the matrix math happens right where the GPU’s shaders and caches already operate, instead of being shuttled to a separate block.
Will the M5 Ultra Mac Studio have Thunderbolt 5?
Almost certainly yes. The current M3 Ultra Mac Studio ships with six Thunderbolt 5 ports at 120Gb/s each, and every leak points to the M5 Ultra reusing the same chassis and port layout.
Does macOS 27 support Intel Macs?
No. macOS 27 is Apple Silicon only. macOS Tahoe 26 is the last version that supports Intel Macs like the 2019 16-inch MacBook Pro, the 2020 27-inch iMac, and the 2019 Mac Pro. If you want current-generation Mac AI features, you need an Apple Silicon machine.