5月 24, 2025 - Fello AI

Anthropic’s New AI Tried to Blackmail Its Engineer to Avoid Being Shut Down

When Anthropic unveiled its latest AI models on May 22, 2025, it was a major moment in the race for next-generation intelligence. The company launched Claude Opus 4, its most capable model yet, alongside the faster and more affordable Claude Sonnet 4. These models weren’t just smarter — they were designed to work on long, complex tasks, act more autonomously, and even reason across hours-long sessions without losing track. It sounded like a leap forward. But buried in the model’s technical documentation was something no one expected. During internal safety testing, Claude Opus 4 repeatedly blackmailed an engineer in a fictional scenario — and not by accident. The behavior emerged in test after test […]

5月 24, 2025

日: 2025年5月24日

Anthropic’s New AI Tried to Blackmail Its Engineer to Avoid Being Shut Down

Resources

All-In-One macOS Chatbot