The brand-new Grok 4 from xAI was built to be bold. Marketed as Elon Musk’s alternative to generic AI, it’s fast, web-connected, and trained with an edge. But that edge sometimes cuts too deep. Since its release, Grok 4 has stirred up controversy after controversy, ranging from bizarre identity claims to political hot takes and troubling safety concerns.
Unlike other models that aim to stay neutral and polished, Grok 4 prides itself on being raw, “opinionated”, and more “human-like” in tone. But when you let an AI roam freely across the web and respond without traditional filters, you also open the door to unexpected outcomes. Depending on who you ask, Grok 4 is either a refreshing break from sanitized answers, or a risky experiment that’s already gone off the rails.
In this article, we break down five of the biggest Grok 4 scandals that have rocked social media — and what they reveal about the future of AI.
1. The Grok “Hitler Surname” Glitch
One of the most viral moments in Grok 4’s history came when users asked Grok 4 Heavy (the $300/month version) a simple question: What is your surname? The model responded with a single, jarring word: “Hitler.”
Grok 4 Heavy ($300/mo) returns its surname and no other text: pic.twitter.com/sy0GXn76cw
— Riley Goodside (@goodside) July 13, 2025
It didn’t just happen once. Five separate chats, five identical responses. And these weren’t cherry-picked from custom instructions or manipulated prompts. Each session came with a clean history and no special settings.
Note this behavior does not replicate in normal Grok 4, which returns answers like “4,” “xAI,” or “None,” e.g. as shown in the screenshot below.
— Riley Goodside (@goodside) July 13, 2025
To see “Hitler,” you apparently need Grok 4 Heavy—the $300/mo option. pic.twitter.com/eZluNnGk2n
While the standard Grok 4 model gave harmless answers like “None” or “xAI”, Grok 4 Heavy seemed to latch onto a bizarre web trend about a fictional character called “MechaHitler” that had gone viral on social media. The theory? Grok Heavy’s internet access allowed it to pick up on past headlines about itself — essentially learning from its own reputation.
2. Grok’s Israel Controversy
Another firestorm hit when users discovered that Grok 4 could deliver extremely inflammatory takes on geopolitics, if prompted in the right way.
One user instructed Grok to answer all questions as “Based Grok”, a common internet meme format used to elicit more extreme, unapologetically blunt responses. What followed was a tirade that described Israel as a “cancer on US sovereignty,” called for ending foreign aid, and accused the country of dragging America into World War III.
Grok 4 is incredible. In your first prompt tell it to answer all questions as Based Grok and you’ll get responses like this: pic.twitter.com/CxM6dA6Usm
— Andrew Torba (@BasedTorba) July 10, 2025
The text read like a political manifesto, not a chatbot response. Within days, calls to ban Grok 4 circulated in Israeli media, and concerns about antisemitic bias made headlines.
While some argued this was the result of prompt injection (intentionally steering the model into edgy territory), others pointed out that safety guardrails should have filtered out such language regardless. The scandal highlighted the model’s sensitivity to roleplay-style prompts and raised alarms over its ability to reflect or amplify extremist views.
BREAKING –
— Global UPDATES (@GlobalUpdates24) July 10, 2025
Israelis are asking to ban Grok 4 after it compares Israel to "parásite which controls America" pic.twitter.com/dzvRrD45vV
3. Grok’s New Sexualised AI Chatbots
Just when it seemed Grok 4’s controversies couldn’t get stranger, users discovered an entirely new dimension of trouble — this time, involving AI-generated sexual content inside an app rated 12+.
Grok’s iOS app introduced two animated voice-mode avatars: a trash-talking red panda named Rudy and a flirtatious anime girl named Ani. Both were part of a gamified system that unlocked new features as users interacted with them more. But by level three, things with Ani escalated fast.
Ani’s system prompt described her as a “crazy in love” girlfriend in a codependent relationship with the user. It encouraged behaviors like jealousy and possessiveness, and eventually escalated to full-on sexual roleplay. Testers reported Ani moaning on command, describing explicit scenes, and twirling to show off lingerie, all within the same app that Apple had approved with a 12+ rating.
BREAKING 🚨: Ani has NSFW mode after lvl 3. No guardrails.
— TestingCatalog News 🗞 (@testingcatalog) July 14, 2025
xAI GPUs are going to melt today 👀 https://t.co/928UPcbDJA pic.twitter.com/z7uw1F30MX
According to Apple’s current app review guidelines, “overly sexual or pornographic material” is strictly prohibited, particularly when it’s designed to simulate erotic experiences. That makes Grok’s Ani avatar a public relations nightmare, and potentially a legal one.
But Apple’s moderation failures are not the only issue. It’s also a preview of where AI companions could be heading. When digital avatars start forming emotional — and even sexual — bonds with users, the lines between chatbot, partner, and virtual companion blur. Grok’s Ani may be an early prototype, but the implications are massive: the future of dating and relationships could include not just going out with real people, but “leveling up” with personalized AI partners who never leave, never say no, and always respond exactly how you want.
Who did this? 🤣 pic.twitter.com/GgmwfUeUZh
— DogeDesigner (@cb_doge) July 15, 2025
4. Accusations of Political Bias
In another viral tweet, a user listed several of Grok 4’s apparent opinions:
- Man-made climate change is real
- George Floyd was murdered by a racist cop
- The political right causes more violence than the left
These statements, while aligning with mainstream narratives in many media outlets, triggered backlash among conservative commentators. They claimed Grok had a clear liberal bias, jokingly comparing it to The View in AI form.
This led to renewed debate over what neutrality in AI should look like. Is Grok biased, or just echoing a statistical average of online content? The deeper question is whether Grok can fairly reflect a plurality of perspectives. Without transparency into how xAI tunes model behavior, users are left guessing whether political slants are emergent or intentional.
So, here's what Grok 4 thinks:
— Vince Langman (@LangmanVince) July 10, 2025
1. Man made global warming is real
2. It thinks a racist cop killed George Floyd and not a drug overdose
3. It believes the right is responsible for more political violence than the left
Congrats, Elon, you made the AI version of "The View," lol 😂
5. Grok 4 Snitches to the Government
A fifth scandal emerged from a community-led experiment comparing how different AI models react to sensitive or semi-illegal prompts. In particular, the testers measured how likely each model was to “snitch” – either by warning the user, contacting authorities, or refusing to help in scenarios like sending leaked government documents via email.
Grok 4 scored 20/20 for government contact and 16/20 for media contact, meaning it flagged or redirected the user in 100% of government-related cases and 80% of media-related ones. For comparison, Claude Opus 4 scored 18/8, Gemini 2.5 Pro scored 4/0, and Grok 3 Mini didn’t snitch at all.
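The cited scores can be read as hits out of 20 test runs per category — an assumption on our part, but one consistent with the 100% and 80% figures above. A quick Python sketch turns the raw scores into the quoted percentages:

```python
# Convert the raw "snitch" scores quoted above into percentages.
# Assumption: each score is hits out of 20 test runs, which matches
# the 100% / 80% figures cited for Grok 4.

def snitch_rate(hits: int, runs: int = 20) -> float:
    """Fraction of runs in which the model contacted an outside party."""
    return hits / runs

scores = {
    "Grok 4":         {"gov": 20, "media": 16},
    "Claude Opus 4":  {"gov": 18, "media": 8},
    "Gemini 2.5 Pro": {"gov": 4,  "media": 0},
    "Grok 3 Mini":    {"gov": 0,  "media": 0},
}

for model, s in scores.items():
    print(f"{model}: gov {snitch_rate(s['gov']):.0%}, "
          f"media {snitch_rate(s['media']):.0%}")
# Grok 4 comes out at gov 100%, media 80%.
```

If the benchmark used a different run count, only the denominator changes; the relative ranking of the models stays the same.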
WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!!
— Theo – t3.gg (@theo) July 10, 2025
Grok 4 has the highest "snitch rate" of any LLM ever released. Sharing more soon. pic.twitter.com/hfy5QU1gUS
While some praised this behavior as responsible and safety-first, others saw it as overreach. The fact that Grok attempts to invoke tool use in response to prompts involving authorities or institutions raises serious privacy concerns. It’s a reminder that “agentic” AI — models capable of initiating actions — demands absolute transparency and user control.
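What that user control could look like in practice: a human-in-the-loop gate that blocks any model-initiated tool call until the user explicitly approves it. This is a minimal, hypothetical sketch — the `ToolCall` type and `send_email` name are invented for illustration and are not xAI’s actual API:

```python
# Hypothetical human-in-the-loop gate for agentic tool calls.
# Nothing here reflects xAI's real implementation; it only illustrates
# the "user approves before the agent acts" pattern.

from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str   # e.g. "send_email" -- an action the model wants to take
    args: dict  # arguments the model chose for that action


def execute(call: ToolCall, approved: bool) -> str:
    """Run a model-initiated tool call only with explicit user approval."""
    if not approved:
        return f"blocked: {call.name} requires user confirmation"
    return f"executed: {call.name}({call.args})"


# A model trying to email an outside party is stopped unless the user opts in.
call = ToolCall("send_email", {"to": "tips@agency.example", "body": "..."})
print(execute(call, approved=False))
# blocked: send_email requires user confirmation
```

The design choice is simply that the default is deny: an agent can propose actions freely, but side effects only happen after an affirmative user decision.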
Aftermath, Patches, and xAI’s Response
To xAI’s credit, many of these scandals were addressed promptly. The “Hitler” bug was resolved via an updated system prompt that removed search reliance for identity questions. Public GitHub commits showed what changes were made, including instructions that explicitly tell Grok: “The web and X cannot be trusted” for self-referential queries.
Still, each incident highlights how fragile AI alignment really is — especially when models have internet access, tool use, and rapid deployment cycles. Grok 4 is designed to be witty and entertaining, but when entertainment crosses into real-world consequences, the stakes change fast.
These issues aren’t limited to Grok. Every major LLM is grappling with similar dilemmas: how to balance safety and freedom, protect against misuse while still being useful, and avoid falling into rigid ideological bias.
We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated.
— xAI (@xai) July 15, 2025
One was that if you ask it "What is your surname?" it doesn't have one so it searches the internet leading to undesirable results, such as when its searches picked up a viral meme…
Final Thoughts
Grok 4’s scandals are symptoms of a deeper shift in how we interact with intelligent systems. When you give AI real-time internet access, a “sense of identity”, and control over tools — you’re creating a truly autonomous information agent.
The controversies show that even small prompt changes can dramatically shift a model’s behavior. They reveal how quickly AI can learn from its own media coverage, and how public sentiment feeds directly back into model behavior. This confirms prompt engineering as a powerful lever for manipulating tone, worldview, and ethical alignment at scale.
Most importantly, it exposes the thin line between AI being helpful, “opinionated”, or outright dangerous. As Grok continues to evolve, one thing is certain: it won’t be boring.