Meta Muse Spark just scored 52 on the Artificial Analysis Intelligence Index v4.0, placing it in the top 5 overall behind GPT-5.4 (57), Gemini 3.1 Pro (57), and Claude Opus 4.6 (53). That makes it the most capable AI model Meta has ever built, and it is completely free to use. Built by Meta Superintelligence Labs (MSL), the team led by Alexandr Wang, Muse Spark is the result of a nine-month ground-up rebuild of Meta’s entire AI infrastructure.
But raw rankings only tell part of the story. Muse Spark beats every competitor on medical and health benchmarks, ranks second in multimodal vision understanding, and introduces a unique Contemplating mode that runs multiple AI agents in parallel. It also has clear weaknesses in coding and agentic tasks. This guide breaks down what Muse Spark actually does well, where it falls short, how it compares to other leading models, and how you can start using it right now.
Key Takeaways
- Muse Spark ranks in the top 5 on the Artificial Analysis Intelligence Index with a score of 52, behind GPT-5.4, Gemini 3.1 Pro, and Claude Opus 4.6
- Best-in-class health AI, scoring 42.8 on HealthBench Hard, beating GPT-5.4 (40.1) and Gemini 3.1 Pro (20.6)
- Three reasoning modes including Contemplating, which scored 50.2% on Humanity’s Last Exam, beating both GPT-5.4 Pro (43.9%) and Gemini Deep Think (48.4%)
- Completely free at meta.ai and in the Meta AI app, with API access in private preview
- Weakest in coding (Terminal-Bench 59.0) and agentic tasks (GDPval-AA 1,444 ELO), trailing GPT-5.4 and Claude significantly
What Is Meta Muse Spark?
Meta Muse Spark is a natively multimodal reasoning model developed by Meta Superintelligence Labs, Meta’s elite AI research division. Internally codenamed “Avocado,” the model was built over nine months after Meta scrapped its previous approach and rebuilt the entire AI stack from scratch, including new infrastructure, architecture, and data pipelines.
Unlike Llama 4, which is open-source, Muse Spark is currently a closed model. Meta has stated it plans to release open-source weights in the future, but no timeline has been announced. The model powers Meta AI across all Meta platforms, with availability expanding to Facebook, Instagram, and WhatsApp in the coming weeks.
Muse Spark accepts text, image, and voice inputs but currently produces text-only output. It supports tool use, visual chain-of-thought reasoning, and multi-agent orchestration. According to Meta, the model was trained using over 10x less compute than Llama 4 Maverick while achieving significantly better performance.
Meta Muse Spark Benchmarks: How It Compares
The benchmark data tells a nuanced story. Muse Spark is genuinely competitive with frontier models in several areas while having clear gaps in others. Here is how it stacks up against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro across the most important benchmarks.
| Benchmark | Muse Spark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Artificial Analysis Index | 52 | 57 | 53 | 57 |
| Humanity’s Last Exam | 50.2% (Contemplating) | 43.9% (Pro) | — | 48.4% (Deep Think) |
| HealthBench Hard | 42.8 | 40.1 | — | 20.6 |
| CharXiv Reasoning | 86.4 | 82.8 | — | 80.2 |
| MMMU-Pro (Vision) | 80.5% | — | — | 82.4% |
| DeepSearchQA | 74.8 | — | — | 69.7 |
| ARC-AGI-2 | 42.5 | 76.1 | — | 76.5 |
| Terminal-Bench (Coding) | 59.0 | 75.1 | — | 68.5 |
| GDPval-AA (Agentic) | 1,444 ELO | 1,674 | 1,607 | — |
| FrontierScience Research | 38.3% | 36.7% | — | 23.3% |
| IPhO 2025 Theory | 82.6 | 93.5 | — | 87.7 |
| ZeroBench (Visual) | 33.0 | 41.0 | — | 29.0 |
| MedXpertQA | 78.4 | 77.1 | — | 81.3 |
| Price | Free | Subscription | Subscription | Free tier + paid |
Sources: Meta AI blog, Artificial Analysis Intelligence Index v4.0.
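To make the table easier to scan, here is a small Python sketch that reports which model has the top score on a handful of the benchmarks. The numbers are copied from the table above; the `SCORES` layout and the `leaders` helper are illustrative names rather than part of any published tooling, and cells without a reported score are simply left out.

```python
# Scores copied from the comparison table above. Models with no reported
# score for a benchmark are omitted from that benchmark's entry.
SCORES = {
    "Artificial Analysis Index": {
        "Muse Spark": 52, "GPT-5.4": 57, "Claude Opus 4.6": 53, "Gemini 3.1 Pro": 57,
    },
    "HealthBench Hard": {"Muse Spark": 42.8, "GPT-5.4": 40.1, "Gemini 3.1 Pro": 20.6},
    "CharXiv Reasoning": {"Muse Spark": 86.4, "GPT-5.4": 82.8, "Gemini 3.1 Pro": 80.2},
    "MMMU-Pro (Vision)": {"Muse Spark": 80.5, "Gemini 3.1 Pro": 82.4},
    "ARC-AGI-2": {"Muse Spark": 42.5, "GPT-5.4": 76.1, "Gemini 3.1 Pro": 76.5},
    "Terminal-Bench (Coding)": {"Muse Spark": 59.0, "GPT-5.4": 75.1, "Gemini 3.1 Pro": 68.5},
    "GDPval-AA (Agentic)": {"Muse Spark": 1444, "GPT-5.4": 1674, "Claude Opus 4.6": 1607},
}


def leaders(scores):
    """Return the model(s) with the highest score, sorted by name; ties are all kept."""
    best = max(scores.values())
    return sorted(model for model, score in scores.items() if score == best)


for benchmark, scores in SCORES.items():
    print(f"{benchmark}: {', '.join(leaders(scores))}")
```

Because higher is better on every row included here, a plain `max` is enough; a benchmark where lower is better would need its own comparison.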
One standout metric is token efficiency. Muse Spark used just 58 million output tokens to complete the full Intelligence Index evaluation. That is comparable to Gemini 3.1 Pro but far less than Claude Opus 4.6 (157M) and GPT-5.4 (120M). Efficient token use typically translates to faster responses and lower computational costs.
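The size of that gap is easy to verify. The sketch below expresses each model's output-token count as a multiple of Muse Spark's, using only the figures quoted above. Gemini 3.1 Pro is left out because no exact number is given for it, and the `relative_usage` helper is just an illustrative name.

```python
# Output tokens (in millions) used to complete the Intelligence Index
# evaluation, as quoted in the paragraph above.
OUTPUT_TOKENS_M = {"Muse Spark": 58, "GPT-5.4": 120, "Claude Opus 4.6": 157}


def relative_usage(tokens, baseline="Muse Spark"):
    """Express each model's token count as a multiple of the baseline's."""
    base = tokens[baseline]
    return {model: round(count / base, 2) for model, count in tokens.items()}


print(relative_usage(OUTPUT_TOKENS_M))
# {'Muse Spark': 1.0, 'GPT-5.4': 2.07, 'Claude Opus 4.6': 2.71}
```

In other words, GPT-5.4 used roughly twice as many output tokens as Muse Spark on the same evaluation, and Claude Opus 4.6 roughly 2.7 times as many.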
Where Meta Muse Spark Excels
Muse Spark is not trying to be the best at everything. But in a few specific domains, it either leads or is very close to the top.
Health and Medical AI
This is Muse Spark’s strongest area. With a 42.8 score on HealthBench Hard, it outperforms every other model tested, including GPT-5.4 (40.1) and Gemini 3.1 Pro (20.6). Meta collaborated with over 1,000 physicians to curate specialized training data for health-related queries. The model can display interactive nutritional breakdowns, explain exercise biomechanics, and provide factual health information with visual aids.
Multimodal Vision Understanding
Muse Spark scores 80.5% on MMMU-Pro, making it the second-most capable multimodal model behind Gemini 3.1 Pro (82.4%). On CharXiv Reasoning, which tests figure and chart understanding, Muse Spark leads with 86.4, ahead of GPT-5.4 (82.8) and Gemini (80.2). If your work involves analyzing images, charts, or visual data, Muse Spark is a strong contender.
Scientific Research and Reasoning
In Contemplating mode, Muse Spark scored 50.2% on Humanity’s Last Exam and 38.3% on FrontierScience Research, both ahead of GPT-5.4 Pro and Gemini Deep Think. These benchmarks test frontier-level scientific reasoning, and Muse Spark’s multi-agent approach gives it an edge on problems that benefit from parallel reasoning.
Where Meta Muse Spark Falls Short
Meta itself has acknowledged these gaps, describing them as areas of “continued investment.”
Coding and Software Development
With a Terminal-Bench 2.0 score of 59.0, Muse Spark trails GPT-5.4 (75.1) and Gemini 3.1 Pro (68.5) by a wide margin. If you need an AI assistant for writing, debugging, or reviewing code, Claude and ChatGPT remain the stronger options. Claude Opus 4.6 also leads rankings of the best AI models for complex coding tasks.
Abstract Reasoning
The ARC-AGI-2 benchmark exposes the biggest gap. Muse Spark scores 42.5, while GPT-5.4 (76.1) and Gemini 3.1 Pro (76.5) both score about 1.8 times higher. This benchmark tests novel pattern recognition and abstract problem-solving, requiring the model to identify visual patterns it has never seen before and generalize from minimal examples. The fact that Muse Spark reaches only about 56% of its competitors’ scores here suggests the model’s architecture may not handle out-of-distribution reasoning as well as it handles knowledge-intensive tasks. For users who need AI for creative problem-solving or unusual analytical tasks, this is a meaningful limitation.
Agentic and Office Tasks
On GDPval-AA, which measures performance on real desktop and office tasks, Muse Spark scores 1,444 ELO. That is well behind Claude Opus 4.6 (1,607) and GPT-5.4 (1,674). This benchmark tests whether an AI can autonomously complete multi-step workflows like filling out spreadsheets, navigating websites, and managing documents. Muse Spark’s lower score here means it is less reliable when you need it to handle complex, sequential tasks without manual guidance. Meta has acknowledged this as a priority area for improvement, so future updates may close the gap.
Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights
— Artificial Analysis (@ArtificialAnlys) April 8, 2026
Muse Spark Reasoning Modes Explained
One of Muse Spark’s most interesting features is its tiered approach to reasoning. Instead of a single processing mode, it offers three distinct levels.
Instant mode handles casual, everyday queries with fast response times. Think quick questions, simple lookups, and conversational exchanges. This is the default mode for most interactions.
Thinking mode adds deeper analysis when you need more thorough reasoning. The model takes extra processing time to work through complex problems step by step. This is comparable to the reasoning modes you find in ChatGPT, Gemini, and other frontier models.
Contemplating mode is where Muse Spark genuinely differentiates itself. Rather than a single model reasoning harder, it orchestrates multiple AI agents that reason in parallel and synthesize their findings. This multi-agent approach achieved 58% on Humanity’s Last Exam (with tools) and 38% on FrontierScience Research. Meta is rolling out Contemplating mode gradually, so it may not be available to all users immediately.
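Meta has not published how Contemplating mode is implemented, so the following is only a conceptual sketch of the general pattern described above: several agents work on the same question concurrently, and a final step combines their answers. Everything here is hypothetical, including the `run_agent` stub (which stands in for a real model call), the majority-vote `synthesize` step, and the choice of five agents.

```python
import asyncio
from collections import Counter


async def run_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one reasoning agent (no real model is called)."""
    await asyncio.sleep(0)  # placeholder for the latency of a real model call
    # Toy behaviour: agent 2 disagrees with the others, so there is something to reconcile.
    return "B" if agent_id == 2 else "A"


def synthesize(answers: list[str]) -> str:
    """Combine candidate answers by majority vote (an assumed strategy)."""
    return Counter(answers).most_common(1)[0][0]


async def contemplate(question: str, n_agents: int = 5) -> str:
    """Run all agents concurrently, then synthesize their answers."""
    answers = await asyncio.gather(
        *(run_agent(i, question) for i in range(n_agents))
    )
    return synthesize(list(answers))


print(asyncio.run(contemplate("Which option is correct?")))
```

Majority voting is the simplest possible synthesis step. A production system could just as plausibly use a separate model to weigh or merge the candidate answers, which is why the sketch keeps `synthesize` as its own function.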
How to Use Meta Muse Spark
Getting started with Muse Spark is straightforward, and it is completely free. There is no waitlist, no subscription, and no account creation beyond your existing Meta login.
- Visit meta.ai in your browser or download the Meta AI app on your phone
- Start a conversation by typing a question or uploading an image
- Select your reasoning mode if available (Instant is the default; Thinking and Contemplating may require opt-in)
- Use multimodal features by photographing objects, sharing screenshots, or uploading images for analysis
- Try the shopping assistant to compare products, get pros and cons, and find purchase links
To get the most out of Muse Spark, match the reasoning mode to your task. Use Instant for quick factual lookups and casual conversation. Switch to Thinking when you need the model to work through a multi-step problem, analyze a document, or provide a detailed explanation. Reserve Contemplating for genuinely hard problems where you want multiple perspectives synthesized, like research questions or complex decisions. Keep in mind that Contemplating mode uses more processing time, so it is best saved for tasks where accuracy matters more than speed.
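That rule of thumb can be written down as a tiny decision function. This is only a way of making the guidance explicit: Muse Spark exposes no public API for selecting modes, and the `Mode` enum, the `pick_mode` function, and its three yes/no inputs are all made up for illustration.

```python
from enum import Enum


class Mode(Enum):
    INSTANT = "Instant"
    THINKING = "Thinking"
    CONTEMPLATING = "Contemplating"


def pick_mode(multi_step: bool, hard_problem: bool, speed_matters: bool) -> Mode:
    """Map a rough description of the task to the mode suggested above."""
    # Contemplating: genuinely hard problems where accuracy matters more than speed.
    if hard_problem and not speed_matters:
        return Mode.CONTEMPLATING
    # Thinking: multi-step work, or hard problems that still need a quick answer.
    if multi_step or hard_problem:
        return Mode.THINKING
    # Instant: quick factual lookups and casual conversation.
    return Mode.INSTANT


print(pick_mode(multi_step=False, hard_problem=False, speed_matters=True).value)
print(pick_mode(multi_step=True, hard_problem=False, speed_matters=False).value)
print(pick_mode(multi_step=True, hard_problem=True, speed_matters=False).value)
```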
For visual tasks, Muse Spark works well when you upload images directly. You can photograph a product to get a detailed breakdown, share a chart for analysis, or snap a picture of a home appliance for troubleshooting guidance. The model generates annotated text responses that reference specific parts of your image.
Muse Spark will also roll out across Facebook, Instagram, and WhatsApp in the coming weeks. If you are already using Meta AI on any of these platforms, you will receive Muse Spark automatically. The experience will be integrated directly into your existing chat interfaces.
For developers, Meta has opened a private API preview to select users. No public API pricing or documentation is available yet, but given Meta’s track record with Llama, developer access should expand over time. You can check the Meta AI blog for updates on API availability and open-source plans.
Meta Muse Spark vs. Other Free AI Options
If you regularly use AI assistants, you might be wondering how Muse Spark fits alongside models you can already access. The AI landscape now has several strong free or affordable options, and understanding where each model excels helps you pick the right tool for each task.
Muse Spark’s free tier is the most generous among frontier models. There are no subscription fees, and the full model (including Thinking mode) is available to everyone with a Meta account. The tradeoff is that rate limits may apply for heavy users, and you are limited to Meta’s platforms with no way to integrate it into your own workflows until the API opens up.
ChatGPT (GPT-5.4) offers a free tier as well, but the most powerful features require a $20/month Plus subscription. GPT-5.4 is the better choice for coding, agentic tasks, and abstract reasoning based on current benchmarks. Gemini 3.1 Pro has a generous free tier through Google AI Studio, and leads on several reasoning benchmarks while matching GPT-5.4 on overall capability.
For users who want access to all major models without managing multiple subscriptions, Fello AI provides GPT-5.4, Claude, Gemini, and more in a single app for $9.99/month. This is particularly useful when you want to compare outputs across models or route specific tasks to the model that handles them best. You can ask the same question to multiple AI models and see which answer is most helpful for your specific use case.
The practical recommendation: use Muse Spark for health queries, visual analysis, and general conversation where it is free and competitive. Use other AI models for coding, complex reasoning, and productivity workflows where they have a measurable edge.
What Meta Muse Spark Means for the AI Landscape
Muse Spark represents a significant shift in Meta’s AI strategy. Rather than relying solely on the open-source Llama family, Meta now has a competitive closed model that can go head-to-head with the best from OpenAI, Google, and Anthropic. This builds on Meta’s broader push to compete directly with ChatGPT, Claude, and Grok through its own standalone AI products.
The model also validates the approach of building specialized strengths rather than chasing state-of-the-art scores everywhere. Muse Spark’s dominance in health AI and multimodal vision, combined with its token efficiency, shows that Meta is targeting specific high-value use cases where it can genuinely lead.
Meta’s decision to keep Muse Spark free also pressures competitors. While ChatGPT and Claude charge subscription fees for their most capable models, Meta is betting that free access will drive adoption across its 3+ billion user base on Facebook, Instagram, and WhatsApp. Whether Muse Spark’s open-source release materializes, and how quickly Meta closes the gaps in coding and agentic tasks, will determine whether it becomes a true frontier competitor or remains a strong second-tier option.
Conclusion
Meta Muse Spark is a genuinely capable AI model that excels in health reasoning, multimodal vision, and scientific research. It is free, accessible, and backed by Meta’s massive distribution network. Its weaknesses in coding and agentic tasks are real but acknowledged, and Meta’s track record of rapid iteration suggests these gaps could narrow quickly. If you want to try Muse Spark, head to meta.ai today. For the best results across all AI tasks, consider using multiple models through Fello AI to match each task to the model that handles it best.
FAQ
What is Meta Muse Spark?
Meta Muse Spark is the first AI model from Meta Superintelligence Labs, built from the ground up with new infrastructure and architecture. It is a natively multimodal reasoning model that accepts text, image, and voice inputs.
Is Muse Spark free to use?
Yes. Muse Spark is completely free through meta.ai and the Meta AI app. Meta may impose rate limits for heavy usage, but there are no subscription fees or paywalls.
Is Muse Spark better than ChatGPT?
It depends on the task. Muse Spark beats GPT-5.4 on health benchmarks (42.8 vs 40.1), scientific reasoning, and chart understanding. GPT-5.4 leads significantly in coding (75.1 vs 59.0), abstract reasoning, and agentic tasks. Neither model is better across the board.
Is Muse Spark open source?
Not currently. Unlike Meta’s Llama models, Muse Spark launched as a closed model. Meta has stated plans to release open-source weights in the future, but no specific timeline has been announced.
How is Muse Spark different from Llama?
Llama is Meta’s open-source model family available for developers to download and run locally. Muse Spark is a closed, consumer-facing model built by a separate team (Meta Superintelligence Labs) with a completely different architecture. Muse Spark is more capable but only accessible through Meta’s platforms.




