MiniMax M2.5: A New Chinese Open-Source Model That Claims to Beat Claude & GPT at Coding
MiniMax M2.5 scores 80.2% on SWE-Bench Verified, runs at 100 tokens/sec, and costs $1/hour. Here’s what we know — and what’s still unverified.
On February 5, 2026, Anthropic released Claude Opus 4.6 — the latest and most capable model in its Claude lineup. Arriving just three months after Opus 4.5, this release brings a 1-million-token context window to the Opus family for the first time, introduces collaborative agent teams in Claude Code, and delivers benchmark results that put it ahead of GPT-5.2 and Gemini 3 Pro across most evaluations. Months later it remains a yardstick for new contenders; ByteDance’s Seed 2.1 Pro drew level with Opus 4.6 on Code Arena: Frontend, both scoring 1539. But the headline number that caught the industry’s attention wasn’t a benchmark score. It was 500 — the number […]
On January 27, 2026, Chinese AI company Moonshot AI released Kimi K2.5 — and the tech world took notice. Within days, independent evaluations confirmed what the company claimed: Kimi K2.5 performs on par with the best AI models from OpenAI, Google, and Anthropic on many key benchmarks. But unlike ChatGPT, Claude, or Gemini, Kimi K2.5 is open source. That means developers can download it, modify it, and build with it freely. This matters because, until now, the most capable AI models have been locked behind expensive APIs controlled by a handful of American companies. Kimi K2.5 represents a shift — proof that open-source models can compete at the frontier of […]
Today, November 25, 2025, Black Forest Labs released FLUX.2, a new family of image-generation models aimed directly at the high-end creative, marketing, and product-visualization markets. The company, founded in 2024 and known for its open-core approach to multimodal research, positions FLUX.2 as both a frontier-level image generator and a model that can actually hold up in real production workflows—something many AI tools still struggle with. The release comes at a busy time for image-generation models. OpenAI’s GPT-4o tools, Google’s Imagen 4 and Nano Banana Pro, Midjourney v6, and Stability’s SD3 are all fighting for attention. FLUX.2 enters the mix with a clear focus: photorealism, consistent references, reliable text, and practical workflow control. Here’s […]
November 24, 2025. Anthropic has officially launched Claude Opus 4.5, a major refresh of its top-tier model and the company’s strongest push yet in the fight for AI leadership. Coming just days after Google’s Gemini 3 debut, the new Opus arrives with a sharp focus on professional coding, long-running agents, and desk-work automation—and Anthropic is backing the launch with aggressive pricing and hard benchmark data. The company says Opus 4.5 is now the leading model for real-world software engineering, slide and spreadsheet editing, and multi-step agentic workflows. Early numbers support the claim, with the model showing large performance gains across enterprise tasks. And in a move aimed at accelerating adoption, Anthropic has cut Opus […]
On Wednesday, November 12, 2025, OpenAI unveiled GPT-5.1, a major mid-cycle upgrade to its flagship ChatGPT models. The update introduces two new variants — GPT-5.1 Instant and GPT-5.1 Thinking — both designed to make conversations faster, smarter, and more natural. According to OpenAI, GPT-5.1 is built to feel “warmer” and more human-like in tone, while also improving instruction-following and reasoning. The Instant model powers most everyday chats, offering a friendlier and more conversational style. The Thinking model, aimed at deeper reasoning tasks, adapts its “thinking time” based on complexity — responding quickly to simple questions and spending extra effort on hard ones. The rollout begins immediately for ChatGPT Plus, Pro, Go, and Business users, with Enterprise and Education accounts getting early access before […]
On November 6, 2025, Alibaba-backed Moonshot AI released Kimi K2 Thinking, its most advanced open-source model yet. It’s the first reasoning-focused variant in the Kimi K2 family and marks a major step forward in long-context, multi-step reasoning and autonomous tool use. Kimi K2 Thinking immediately made headlines for its performance: it set new state-of-the-art scores on several open benchmarks, including Humanity’s Last Exam (HLE) and BrowseComp, where it outperformed closed models like GPT-5 and Claude Sonnet 4.5. Unlike its competitors, K2 Thinking is fully open-weight, offering public access to its architecture, weights, and API, with only minimal license restrictions. Its release marks a key moment in the open vs. closed model race. While U.S. labs like OpenAI, Anthropic, and xAI keep their top […]
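On September 29, 2025, Anthropic released Claude Sonnet 4.5, its latest frontier AI model, positioning it as the most capable and most aligned model the company has built to date. The release builds on the foundation of earlier Claude Sonnet and Opus releases and focuses on measurable improvements in sustained autonomous performance, coding ability, tool use, and safety behavior. After just under two months of internal testing, Anthropic put Claude Sonnet 4.5 into production on the API, Amazon Bedrock, and Google Vertex AI. The model is now available to all users through Claude.ai and API access, and will soon be integrated into Fello AI. Let’s take a closer look.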
Alibaba’s AI stack just crossed a symbolic frontier. On September 5, 2025, the company’s cloud division unveiled Qwen-3-Max-Preview (Instruct), a 1-trillion-parameter large language model that immediately ranks among the largest and most powerful AI systems ever deployed. The rollout began quietly inside Qwen Chat and Alibaba Cloud Model Studio, and the model is coming soon to Fello AI. Only a few months ago, OpenAI’s GPT-4o, Google DeepMind’s Gemini 2.5, and Anthropic’s Claude Opus 4 each claimed slices of the “frontier-model” crown. Meanwhile, China-based challengers have been racing to close the capability gap. By releasing a one-trillion-parameter model, Alibaba is making it clear it wants to compete at the very top, not on the sidelines. […]
Moonshot AI, a Chinese startup backed by Alibaba and Tencent, is rolling out an upgraded version of its open-weight large language model, Kimi K2. The new build, Kimi K2‑0905, introduces a 256,000-token context window and improved coding performance, and retains its permissive modified MIT license. Although early beta access was briefly mentioned on the company’s Discord before details were removed, the model is now fully available to the public. Weights and a complete model card are live on Hugging Face, making it easy for developers to download, run, and fine-tune the model locally. Moonshot’s Meteoric Rise: Founded in March 2023, Moonshot AI has grown at breakneck speed. In just over a year, the startup […]