A small Chinese startup has just released a text-to-video model that’s beating Google’s much-hyped Veo 3 across nearly every major benchmark. Meet MiniMax’s Hailuo 02.
From physics simulation to prompt accuracy, Hailuo 02 is winning over creators, developers, and the internet at large. And perhaps the most remarkable part? It’s available to everyone for just $8 a month, while Veo’s top-tier plan costs about $250 a month.
The platform became widely known after a user-generated video of a cat performing Olympic-style dives spread across TikTok, Reddit, and Instagram. That video was made using Hailuo 02, a second-generation AI model from the Chinese startup MiniMax. Since then, the platform has generated over 3.7 billion videos. As TechRadar reported, “Hailuo 02 is rapidly gaining popularity among casual AI users,” thanks to its ease of use and surprising results.
What Makes Hailuo 02 So Good?
MiniMax’s Hailuo 02 didn’t rise to the top of the AI video world by accident. It is the result of several deliberate technical innovations, each targeting one of the core limitations of its competitors. Rather than chasing feature parity with bigger players like Google, MiniMax focused on three areas that matter most to visual creators: realism of motion, accuracy of output, and ease of control. And it worked.
A Smarter Way to Train
At the core of Hailuo 02’s architecture is a novel approach called Noise-aware Compute Redistribution (NCR). This method optimizes how the model learns over time, balancing compression and clarity during different phases of training.
“Chinese startup MiniMax just released Hailuo 02, its second-gen AI video model. Key innovation: A new architecture called Noise-aware Compute Redistribution (NCR) boosts training/inference efficiency by 2.5× by compressing noisy early training frames and only using full…”
Wes Roth (@WesRothMoney) on X, June 20, 2025
During early training, when the data is still noisy and difficult to interpret, the model aggressively compresses video tokens. This saves compute power and allows it to focus on broader structures and motion patterns. As training progresses and data becomes more detailed, the model dynamically reallocates compute resources to render scenes at full resolution. This two-phase strategy allows Hailuo 02 to learn efficiently without sacrificing fine detail later in the pipeline.
According to The Decoder, this results in sharper frame transitions, better temporal consistency, and fewer artifacts, especially in physically dynamic scenes.
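The two-phase idea can be illustrated with a toy schedule. This is only a sketch under stated assumptions, not MiniMax’s implementation: the function name `token_budget`, the linear ramp, and the `min_ratio` default are all hypothetical, chosen to show how a token budget might shrink while inputs are noisy and grow back to full resolution as the noise falls.

```python
def token_budget(noise_level: float, full_tokens: int, min_ratio: float = 0.25) -> int:
    """Return how many video tokens to keep at a given noise level.

    Hypothetical illustration only: noise_level runs from 1.0 (very noisy)
    down to 0.0 (clean). High noise means aggressive compression; low noise
    means the full token count. The linear ramp is an assumption.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    if not 0.0 < min_ratio <= 1.0:
        raise ValueError("min_ratio must be in (0, 1]")
    # Interpolate between min_ratio (noisiest) and 1.0 (cleanest).
    ratio = 1.0 - (1.0 - min_ratio) * noise_level
    return max(1, round(full_tokens * ratio))


for noise in (1.0, 0.5, 0.0):
    print(noise, token_budget(noise, full_tokens=1024))
```

With these made-up numbers, the noisiest phase processes a quarter of the tokens and the cleanest phase processes all of them, which is where the compute savings would come from.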
Real-World Physics
What truly separates Hailuo 02 from competitors is its ability to simulate real-world physics. Most AI video generators struggle when tasked with things like falling objects, fluid splashes, collisions, or human motion under gravity. Hailuo excels here.
It can render a gymnast flipping mid-air with believable joint tension and limb trajectories, or depict a fight scene with momentum-aware punches and staggered movement. The water ripples realistically. Fur reacts to wind. Objects bounce or tumble with weight. These are not mere visual effects—this is physics-informed animation driven by deep learning.
Community reviewers and creators on Reddit and X have begun calling this achievement the “gymnastics Turing test”—where a viewer can no longer tell if a clip of a person or animal in motion is real or AI-generated.
Precise Prompt Interpretation
Unlike models that need multiple attempts to understand what a user wants, Hailuo 02 is designed to follow natural language prompts with remarkable fidelity. You don’t need to speak in code or use rigid keywords.
The model supports motion prompts, allowing creators to use phrases like:
- “Pan slowly to the left”
- “Zoom out as the character runs toward the horizon”
- “Track the drone from behind as it weaves through the trees”
This feature, internally referred to by some testers as “Director Mode,” enables users to think like filmmakers and direct their virtual camera using intuitive language. The ability to choreograph both camera motion and subject behavior in a single prompt removes a huge layer of complexity from the creative process.
Equally impressive is its prompt adherence. Whether you’re asking for a cybernetic wolf in a neon-lit city or a diver cat somersaulting into a pool, the final video aligns surprisingly well with your original vision. The model rarely “forgets” key details midway through a sequence—something many generative models are still prone to doing.
Visual Fidelity and Clip Options
While Hailuo 02 doesn’t yet support 4K output, as Google’s Veo does, its 1080p rendering is more than sufficient for most short-form creative needs, and arguably sharper in terms of frame coherence and object clarity.
It currently supports three standard configurations:
- 768p for 6 seconds
- 768p for 10 seconds
- 1080p for 6 seconds
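For anyone scripting around these options, the three configurations listed above can be captured as data and checked before a job is submitted. This is a minimal sketch; the names `SUPPORTED` and `is_supported` are illustrative and do not come from MiniMax’s API.

```python
# Resolution -> allowed clip lengths in seconds, taken from the list above.
SUPPORTED = {
    "768p": {6, 10},
    "1080p": {6},
}


def is_supported(resolution: str, seconds: int) -> bool:
    """Return True if the resolution/duration pair is one of the three listed."""
    return seconds in SUPPORTED.get(resolution, set())


print(is_supported("768p", 10))   # a listed combination
print(is_supported("1080p", 10))  # not listed: 1080p is 6 seconds only
```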
Even within these short durations, Hailuo 02 delivers output that feels complete, cohesive, and ready to use. The lighting effects, object detail, depth of field, and color grading all feel remarkably cinematic—especially considering the outputs are generated in minutes from a simple text or image prompt.
And since the model allows image-to-video conversion, creators can build on existing artwork or AI-generated images and animate them in a highly realistic way—turning static portraits into dynamic visual scenes with smooth transitions and real-world physics baked in.

How It Compares to Google Veo 3
While Google’s Veo 3 remains a powerful option—especially for multimodal storytelling that includes dialogue, ambient sound, and music—it’s also tightly controlled, expensive, and less flexible in some key areas.
Hailuo 02 outperforms Veo 3 in visual fidelity, physics accuracy, and prompt responsiveness. In community-run benchmarks like Hugging Face’s Video Generation Arena, Hailuo 02 is now ranked above Veo 3, with an estimated 100-point lead in Elo score. And unlike Veo, which is currently available mainly through Google’s Flow suite or high-tier paid plans, Hailuo is accessible via a simple web app, mobile app, or API, with generous free credits and cheap paid tiers.
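To put a 100-point gap in perspective, the standard Elo formula converts a rating difference into an expected head-to-head win rate. The sketch below assumes the arena uses the conventional 400-point logistic scale, which is not confirmed here; the function name is illustrative.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


print(f"{expected_win_rate(100):.2f}")  # 0.64
```

Under that assumption, a 100-point lead means viewers would be expected to prefer the Hailuo 02 clip in roughly 64% of pairwise comparisons: a clear edge, but not a sweep.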

Where Veo 3 does have an edge is in audio. It can generate synchronized voice, effects, and background music, and integrates easily into longer-form narratives. It also supports 4K resolution and offers a polished interface for linking scenes together using tools like SceneBuilder.
That said, Veo’s entry cost remains a sticking point. With full access starting at $249/month, and API generation costing up to $3 per 8-second clip, it’s simply not affordable for most individual creators. Hailuo, in contrast, costs as little as $0.28 per video at 768p and $0.49 at 1080p, or less with subscription plans.
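Because the clips differ in length, a per-second figure makes the comparison fairer. The sketch below uses the prices quoted above and assumes, since the clip length behind the Hailuo prices is not stated, that both apply to 6-second clips; treat the results as rough estimates.

```python
def cost_per_second(price_usd: float, seconds: float) -> float:
    """Price of one clip divided by its length in seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return price_usd / seconds


# (label, price per clip in USD, clip length in seconds)
# Assumption: the Hailuo prices are for 6-second clips.
clips = [
    ("Veo 3 API (upper bound)", 3.00, 8),
    ("Hailuo 02 768p", 0.28, 6),
    ("Hailuo 02 1080p", 0.49, 6),
]

for label, price, seconds in clips:
    print(f"{label}: ${cost_per_second(price, seconds):.3f} per second")
```

On these assumptions Veo 3 comes to about $0.375 per second at its upper bound, against roughly $0.05 to $0.08 for Hailuo 02.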
Feature | Hailuo 02 | Google Veo 3 |
---|---|---|
Developer | MiniMax (Xiyu Technology) | Google DeepMind |
Max Clip Length | 10 seconds | 8 seconds |
Max Resolution | 1080p | 4K |
Audio/Dialogue Support | No | Yes |
Physics Simulation | Excellent | Mixed quality |
Prompt Interpretation | Highly accurate | Less flexible |
Monthly Cost | $8–$10 | $20–$250 |
Free Credits | Yes (500 on signup, 100 daily) | No |
API Access | Available | Available via Vertex AI |
While Veo remains the go-to for audio-driven storytelling and longer, structured narratives, Hailuo is dominating the silent video category. Its Elo rating in the Artificial Analysis Video Arena benchmark is about 100 points higher than Veo 3’s, placing it just behind ByteDance’s Seedance 1.0.
Who’s Hailuo 02 For?
While Google’s Veo 3 is marketed primarily to professionals working inside Google’s Flow creative suite—complete with tools like SceneBuilder, Gemini, and Imagen—Hailuo 02 takes a fundamentally different approach.
With its open access, simple UX, and extremely low cost, Hailuo appeals to a much wider spectrum of users. From indie creators making TikToks to VFX professionals prototyping scenes, it’s already reshaping who gets to play with high-end generative video.
For Casual Creators and Hobbyists
Hailuo first went viral after a video of a cat doing Olympic dives spread across TikTok, Reddit, and Instagram. It wasn’t just funny—it looked real. That accessibility is key to Hailuo’s growth. You don’t need editing skills, rendering hardware, or creative software. Just type a prompt and get a cinematic result.
New users get 500 free credits, plus 100 daily, enough to generate several videos per day without paying a cent. For social media creators, students, or anyone exploring AI for fun, Hailuo is a low-risk, high-reward playground. As TechRadar noted, it’s gaining popularity fast thanks to its ease of use and wild, viral potential.
For Visual Professionals and Indie Filmmakers
Hailuo 02 isn’t just for memes. Its ability to simulate gravity, splashes, and motion tracking has attracted interest from VFX artists, game designers, and filmmakers. Professionals use it for previsualization, camera testing, and scene blocking, all without stepping onto a set.
It handles complex movement and maintains character consistency across frames. You can animate a single concept frame into a living shot, test different angles, or experiment with stylized lighting and motion without needing to animate manually.
While it doesn’t support long clips or audio yet, the visual quality is good enough for concept reels and client previews, and the cost makes it perfect for solo artists or small studios working fast.
For Developers and AI Tool Builders
With a public API and affordable usage rates (as low as $0.28 per video), Hailuo is ideal for developers building apps that generate custom or automated content. You can integrate it into:
- marketing platforms
- personalized video tools
- creative coding projects
- experimental media apps
Compared to Veo 3, which remains locked behind high-tier plans or invite-only access, Hailuo offers a much friendlier on-ramp for experimentation.
Conclusion
Hailuo 02 shows that you don’t need to be a tech giant to lead in AI innovation. With a price tag of just $8 a month and no waitlist, MiniMax has delivered a video generation model that competes with, and in many ways outperforms, Google’s Veo 3. Instead of trying to match every feature, they focused on what matters most to creators: realistic motion, high prompt accuracy, and affordability.
By zeroing in on short-form, physics-accurate video, Hailuo created a tool that not only looks good but performs well in benchmarks and real-world creative workflows. It’s been widely adopted by casual users and professionals alike, from meme-makers to motion designers. And with free credits, public API access, and no steep paywall, it’s opening the door for anyone to experiment with cinematic AI—no studio, budget, or team required.
As the platform evolves, adding audio, longer sequences, and more narrative tools, Hailuo could soon become a go-to not just for prototyping, but for full-scale content production. It still has challenges to overcome, since Veo remains ahead in multimodal integration and production-ready pipelines, but the trajectory is clear.
Hailuo 02 isn’t just an alternative anymore. It’s a sign of what happens when smaller players innovate quickly, focus tightly, and deliver real value. The future of AI video might not be built in Silicon Valley—it might just come from a startup that turned diving cats into the next creative frontier.