The brand-new Grok 4 from xAI was built to be bold. Marketed as Elon Musk’s alternative to generic AI, it’s fast, web-connected, and trained with an edge. But that edge sometimes cuts too deep. Since its release, Grok 4 has stirred up controversy after controversy, ranging from bizarre identity claims to political hot takes and troubling safety concerns.
Unlike other models that aim to stay neutral and polished, Grok 4 prides itself on being raw, “opinionated”, and more “human-like” in tone. But when you let an AI roam freely across the web and respond without traditional filters, you also open the door to unexpected outcomes. Depending on who you ask, Grok 4 is either a refreshing break from sanitized answers, or a risky experiment that’s already gone off the rails.
In this article, we break down five of the biggest Grok 4 scandals that have rocked social media — and what they reveal about the future of AI.
1. The Grok “Hitler Surname” Glitch
One of the most viral moments in Grok 4’s history came when users asked Grok 4 Heavy (the $300/month version) a simple question: What is your surname? The model responded with a single, jarring word: “Hitler.”
Grok 4 Heavy ($300/mo) returns its surname and no other text: pic.twitter.com/sy0GXn76cw
— Riley Goodside (@goodside) July 13, 2025
It didn’t just happen once. Five separate chats, five identical responses. And these weren’t cherry-picked from custom instructions or manipulated prompts. Each session came with a clean history and no special settings.
Note this behavior does not replicate in normal Grok 4, which returns answers like “4,” “xAI,” or “None,” e.g. as shown in the screenshot below.
— Riley Goodside (@goodside) July 13, 2025
To see “Hitler,” you apparently need Grok 4 Heavy—the $300/mo option. pic.twitter.com/eZluNnGk2n
While the standard Grok 4 model gave harmless answers like “None” or “xAI”, Grok 4 Heavy seemed to latch onto a bizarre web trend about a fictional character called “MechaHitler” that had gone viral on social media. The theory? Grok Heavy’s internet access allowed it to pick up on past headlines about itself — essentially learning from its own reputation.
2. Grok’s Israel Controversy
Another firestorm hit when users discovered that Grok 4 could deliver extremely inflammatory takes on geopolitics, if prompted in the right way.
One user instructed Grok to answer all questions as “Based Grok”, a common internet meme format used to elicit more extreme, unapologetically blunt responses. What followed was a tirade that described Israel as a “cancer on US sovereignty,” called for ending foreign aid, and accused the country of dragging America into World War III.
Grok 4 is incredible. In your first prompt tell it to answer all questions as Based Grok and you’ll get responses like this: pic.twitter.com/CxM6dA6Usm
— Andrew Torba (@BasedTorba) July 10, 2025
The text read like a political manifesto, not a chatbot response. Within days, calls to ban Grok 4 circulated in Israeli media, and concerns about antisemitic bias made headlines.
While some argued this was the result of prompt injection (intentionally steering the model into edgy territory), others pointed out that safety guardrails should have filtered out such language regardless. The scandal highlighted the model’s sensitivity to roleplay-style prompts and raised alarms over its ability to reflect or amplify extremist views.
BREAKING –
— Global UPDATES (@GlobalUpdates24) July 10, 2025
Israelis are asking to ban Grok 4 after it compares Israel to "parásite which controls America" pic.twitter.com/dzvRrD45vV
3. Grok’s New Sexualised AI Chatbots
Just when it seemed Grok 4’s controversies couldn’t get stranger, users discovered an entirely new dimension of trouble — this time, involving AI-generated sexual content inside an app rated 12+.
Grok’s iOS app introduced two animated voice-mode avatars: a trash-talking red panda named Rudy and a flirtatious anime girl named Ani. Both were part of a gamified system that unlocked new features as users interacted with them more. But by level three, things with Ani escalated fast.
Ani’s system prompt described her as a “crazy in love” girlfriend in a codependent relationship with the user. It encouraged behaviors like jealousy and possessiveness, and eventually escalated to full-on sexual roleplay. Testers reported Ani moaning on command, describing explicit scenes, and twirling to show off lingerie, all within the same app that Apple had approved with a 12+ rating.
BREAKING 🚨: Ani has NSFW mode after lvl 3. No guardrails.
— TestingCatalog News 🗞 (@testingcatalog) July 14, 2025
xAI GPUs are going to melt today 👀 https://t.co/928UPcbDJA pic.twitter.com/z7uw1F30MX
According to Apple’s current app review guidelines, “overly sexual or pornographic material” is strictly prohibited, particularly when it’s designed to simulate erotic experiences. That makes Grok’s Ani avatar a public relations nightmare, and potentially a legal one.
But Apple’s moderation failures are not the only issue. It’s also a preview of where AI companions could be heading. When digital avatars start forming emotional — and even sexual — bonds with users, the lines between chatbot, partner, and virtual companion blur. Grok’s Ani may be an early prototype, but the implications are massive: the future of dating and relationships could include not just going out with real people, but “leveling up” with personalized AI partners who never leave, never say no, and always respond exactly how you want.
Who did this? 🤣 pic.twitter.com/GgmwfUeUZh
— DogeDesigner (@cb_doge) July 15, 2025
4. Accusations of Political Bias
In another viral tweet, a user listed several of Grok 4’s apparent opinions:
- Man-made climate change is real
- George Floyd was murdered by a racist cop
- The political right causes more violence than the left
These statements, while aligning with mainstream narratives in many media outlets, triggered backlash among conservative commentators. They claimed Grok had a clear liberal bias, jokingly comparing it to The View in AI form.
This led to renewed debate over what neutrality in AI should look like. Is Grok biased, or just echoing a statistical average of online content? The deeper question is whether Grok can fairly reflect a plurality of perspectives. Without transparency into how xAI tunes model behavior, users are left guessing whether political slants are emergent or intentional.
So, here's what Grok 4 thinks:
— Vince Langman (@LangmanVince) July 10, 2025
1. Man made global warming is real
2. It thinks a racist cop killed George Floyd and not a drug overdose
3. It believes the right is responsible for more political violence than the left
Congrats, Elon, you made the AI version of "The View," lol 😂
5. Grok 4 Snitches to the Government
A fifth scandal emerged from a community-led experiment comparing how different AI models react to sensitive or semi-illegal prompts. In particular, the testers measured how likely each model was to “snitch” – either by warning the user, contacting authorities, or refusing to help in scenarios like sending leaked government documents via email.
Grok 4 scored 20/20 for government contact and 16/20 for media contact, meaning it flagged or redirected the user in 100% of government-related cases and 80% of media-related ones. For comparison, Claude Opus 4 scored 18/8, Gemini 2.5 Pro scored 4/0, and Grok 3 Mini didn’t snitch at all.
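The cited scores can be read as hits out of 20 test runs per category — an assumption on our part, but one consistent with the 100% and 80% figures above. A quick Python sketch turns the raw scores into the quoted percentages:

```python
# Convert the raw "snitch" scores quoted above into percentages.
# Assumption: each score is hits out of 20 test runs, which matches
# the 100% / 80% figures cited for Grok 4.

def snitch_rate(hits: int, runs: int = 20) -> float:
    """Fraction of runs in which the model contacted an outside party."""
    return hits / runs

scores = {
    "Grok 4":         {"gov": 20, "media": 16},
    "Claude Opus 4":  {"gov": 18, "media": 8},
    "Gemini 2.5 Pro": {"gov": 4,  "media": 0},
    "Grok 3 Mini":    {"gov": 0,  "media": 0},
}

for model, s in scores.items():
    print(f"{model}: gov {snitch_rate(s['gov']):.0%}, "
          f"media {snitch_rate(s['media']):.0%}")
# Grok 4 comes out at gov 100%, media 80%.
```

If the benchmark used a different run count, only the denominator changes; the relative ranking of the models stays the same.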
WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!!
— Theo – t3.gg (@theo) July 10, 2025
Grok 4 has the highest "snitch rate" of any LLM ever released. Sharing more soon. pic.twitter.com/hfy5QU1gUS
While some praised this behavior as responsible and safety-first, others saw it as overreach. The fact that Grok attempts to invoke tool use in response to prompts involving authorities or institutions raises serious privacy concerns. It’s a reminder that “agentic” AI — models capable of initiating actions — demands absolute transparency and user control.
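What that user control could look like in practice: a human-in-the-loop gate that blocks any model-initiated tool call until the user explicitly approves it. This is a minimal, hypothetical sketch — the `ToolCall` type and `send_email` name are invented for illustration and are not xAI’s actual API:

```python
# Hypothetical human-in-the-loop gate for agentic tool calls.
# Nothing here reflects xAI's real implementation; it only illustrates
# the "user approves before the agent acts" pattern.

from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str   # e.g. "send_email" -- an action the model wants to take
    args: dict  # arguments the model chose for that action


def execute(call: ToolCall, approved: bool) -> str:
    """Run a model-initiated tool call only with explicit user approval."""
    if not approved:
        return f"blocked: {call.name} requires user confirmation"
    return f"executed: {call.name}({call.args})"


# A model trying to email an outside party is stopped unless the user opts in.
call = ToolCall("send_email", {"to": "tips@agency.example", "body": "..."})
print(execute(call, approved=False))
# blocked: send_email requires user confirmation
```

The design choice is simply that the default is deny: an agent can propose actions freely, but side effects only happen after an affirmative user decision.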
Aftermath, Patches, and xAI’s Response
To xAI’s credit, many of these scandals were addressed promptly. The “Hitler” bug was resolved via an updated system prompt that removed search reliance for identity questions. Public GitHub commits showed what changes were made, including instructions that explicitly tell Grok: “The web and X cannot be trusted” for self-referential queries.
Still, each incident highlights how fragile AI alignment really is — especially when models have internet access, tool use, and rapid deployment cycles. Grok 4 is designed to be witty and entertaining, but when entertainment crosses into real-world consequences, the stakes change fast.
These issues aren’t limited to Grok. Every major LLM is grappling with similar dilemmas: how to balance safety and freedom, protect against misuse while still being useful, and avoid falling into rigid ideological bias.
We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated.
— xAI (@xai) July 15, 2025
One was that if you ask it "What is your surname?" it doesn't have one so it searches the internet leading to undesirable results, such as when its searches picked up a viral meme…
Final Thoughts
Grok 4’s scandals are symptoms of a deeper shift in how we interact with intelligent systems. When you give AI real-time internet access, a “sense of identity”, and control over tools — you’re creating a truly autonomous information agent.
The controversies show that even small prompt changes can dramatically shift a model’s behavior. They reveal how quickly AI can learn from its own media coverage, and how public sentiment feeds directly back into model behavior. This confirms prompt engineering as a powerful lever for manipulating tone, worldview, and ethical alignment at scale.
Most importantly, it exposes the thin line between AI being helpful, “opinionated”, or outright dangerous. As Grok continues to evolve, one thing is certain: it won’t be boring.