- Red Mage Creative
The Great AI Race: Bias, Censorship, and a Side of Double Standards
Bring your scrutiny consistently, or don't bring it at all.
Summary
I swear I'm not trying to beat a dead horse about DeepSeek. But the AI community seems willing to excuse the privacy concerns, bias, and other issues surrounding Grok-3 that were treated as inexcusable offenses for a Chinese model like DeepSeek. Let's talk about it, and try to evaluate models and AI tools on their merits, the way people say we should with all things.
Intro
Another week, another new and exciting development in the AI space. It seems like every day companies are trying to one-up each other for the sake of stakeholder value, or dangle keys in front of everyone's eyes with the newest, shiniest model that promises a whopping 0.1% tokens/sec improvement! It feels that way from my perspective, at the very least.
One of these major "developments" was the release of Grok-3 by xAI, famously led by Elon Musk, which is, by his own account, the "smartest AI in the world." Whether that claim is actually true falls to the many quantitative benchmarks and rounds of rigorous testing that will show where Grok-3 lands among the likes of o3 (OpenAI), Claude 3.7 (Anthropic), and the ever-present target of xenophobes and fascists alike: DeepSeek R1.
Today we'll be talking about a weird cognitive dissonance of sorts I've encountered in the context of AI models: the gap between how people treat Grok-3, a model produced by an American company, and DeepSeek R1, a model produced by that disruptive, subversive, and otherwise evil company from our enemy in China. A lot of the discussion I see outside of the business world makes me believe there is a double standard between these two models, and I'd rather we evaluate a model on its merits than on its heritage / ethnicity. After all, isn't that the whole point of getting rid of DEI?
Let's dive in.
Grok-3's Special Instructions
One point brought up consistently by people skeptical of DeepSeek was the fact that it censors content critical of the CCP. This article by Nadeem Sarwar on Yahoo News shows a striking example involving mentions of Winnie the Pooh and self-immolation by Tibetans.
When asked for information on either topic, DeepSeek's cloud version responds with "Sorry, I'm not sure how to approach this type of question yet," or "Sorry, that's beyond my current scope." I believe this is a form of output sanitization after the fact, since you can see the model spitting out tokens before the refusal appears. People considered this a major censorship concern and denounced DeepSeek for not speaking the truth.
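For the curious, here is a minimal sketch of what "sanitization after the fact" could look like. This is purely hypothetical: the blocklist, the refusal string, and the filtering logic are my own assumptions modeled on the refusals quoted above, not DeepSeek's actual pipeline.

```python
# Hypothetical post-hoc output filter: the model streams its answer freely,
# then a separate check replaces the finished text if it trips a blocklist.
# The topics and refusal text below are assumptions for illustration only.

BLOCKED_TOPICS = ["winnie the pooh", "self-immolation"]
REFUSAL = "Sorry, that's beyond my current scope."

def sanitize(streamed_text: str) -> str:
    """Return the canned refusal if any blocked topic appears, else the text."""
    lowered = streamed_text.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return REFUSAL
    return streamed_text

print(sanitize("A history of protests involving self-immolation in Tibet..."))
```

Because the check runs only after generation finishes, a user watching the stream briefly sees real tokens before they vanish, which matches the behavior described above.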

So then, what about Grok-3? Surely there's nothing in its programming that stops it from speaking about topics, right?
VentureBeat reported on a Twitter thread which showed, among other concerns, that the model's instructions specifically said to "ignore all sources that mention Elon Musk / Donald Trump spread misinformation."

I mean, even Elon admitted that his AI was "based," citing the model calling a news source, The Information, garbage, filtered, biased, and prone to polished narratives. Don't believe me? The tweet is still up!

Anecdotally, Grok-3 doesn't even do its censorship that well: it suggested that Donald Trump and Elon Musk deserve the death penalty, according to Tech in Asia.
So, then, Grok-3 is engaging in censorship by ignoring information that portrays Elon Musk and Donald Trump negatively, is it not? If we want to stretch even farther, we can call the guidelines given to a tool like ChatGPT censorship, because I can't ask it outright for information about chemical weapons (you can ask Grok-3 for that info, for some reason).
These system prompt issues have supposedly been fixed, but where is the outcry compared to the one over DeepSeek?
Privacy is relative and subjective
As you can imagine, another strike against DeepSeek as a "trusted" source has been the mounting privacy concerns. KrebsOnSecurity lists a number of them, including "sending unencrypted user data" to China. Lots of data is being collected and sent to Volcengine, a cloud platform made by ByteDance, the company that owns TikTok. U.S. Congressional offices are being advised not to use it due to these privacy concerns, and Italy, Taiwan, South Korea, and various organizations within the U.S. are banning or blocking it as well.
In the same vein, Grok (not 3) came under fire for the same thing in the EU in August of last year. Concerned members of Congress and independent strategists worry that Grok-3, made by a private company, is being trained on federal data without any guardrails or security. Adversa AI, an AI security company, claimed that Grok-3 is a cybersecurity disaster waiting to happen. This quote is very pertinent to our notes on DeepSeek vs. Grok-3:
"Bottom line? Grok 3ās safety is weak ā on par with Chinese LLMs, not Western-grade security," Polyakov told Futurism. "Seems like all these new models are racing for speed over security, and it shows."
So, if DeepSeek and Grok-3 have comparable security and privacy concerns, why isn't Grok-3 being banned? Should it be? It seems like most people should just try running an open source model locally. It's easier than you think!
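If "run it locally" sounds daunting, here is roughly what it looks like with Ollama, one popular option. This assumes you have Ollama installed from ollama.com, and the model tag below is one Ollama publishes at the time of writing; pick whichever size fits your hardware.

```shell
# Pull a distilled DeepSeek-R1 variant and chat with it entirely on your
# own machine: no cloud endpoint, no data leaving your laptop.
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b "Summarize the privacy trade-offs of cloud LLMs."
```

Running the open weights yourself sidesteps the whole "where do my prompts go" question for both camps.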
Are these models really as amazing as they claim?
Sort of! Both of them have made claims that have either been a) misconstrued or b) "debunked" by various skeptics.
You can find several sources on the internet disputing the validity of DeepSeek's claims to fame. This Reddit post on r/ycombinator has a lot of the arguments condensed in one place. Similarly, news outlets have published information suggesting these doubts are valid. So DeepSeek may not have been as cheap to train, or as strapped for GPUs, as initially claimed.
As I mentioned before, Elon's Grok-3 was claimed to be the "smartest AI in the world" by none other than Elon himself. A reputable source of unbiased information, that one. Turns out that wasn't exactly true. There is live benchmarking and testing that makes people believe it's not as state of the art as initially claimed. Notably, many are concerned by a supposed omission of the "consensus@64" or "cons@64" stat for OpenAI's o3-mini model. According to Bitcoinworld, cons@64 (64-attempt consensus) shows a model's potential best performance after multiple tries, taking the most common answer across 64 attempts as the "best" one. Several other benchmarks were also supposedly omitted or left unmentioned.
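Mechanically, cons@64 is just majority voting over repeated samples. A toy sketch of the idea (the "model" here is a random stand-in, not any real API):

```python
from collections import Counter
import random

def cons_at_64(sample_answer, n=64):
    """Majority vote over n sampled answers: the most frequent answer
    across many attempts is taken as the model's 'best' answer, which
    usually scores higher than any single attempt would."""
    answers = [sample_answer() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a model: the correct answer 40% of the time, two
# distractors 30% each. A single sample is wrong more often than not,
# but the vote over 64 samples tends to land on the plurality answer.
toy_model = lambda: random.choices(["42", "41", "43"], weights=[4, 3, 3])[0]
print(cons_at_64(toy_model))
```

This is why comparing one model's cons@64 score against another model's single-attempt score, as the benchmark critics allege happened, is an apples-to-oranges comparison.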
Below is an example of how Grok-3 performs in coding based on a LiveBench evaluation:
Grok-3 also doesn't quite perform at the level it claims to. So, again, DeepSeek and Grok-3 are sibling models in debunked claims. Does this raise larger concerns about standards for models and benchmarks?
What do we learn from all of this?
Great question. Personally, I have a somewhat European take on it. These concerns, raised about both DeepSeek and Grok-3, are valid! You should care about your privacy, about bad actors using these tools, about security, and about so many other things that companies actively challenge on the daily.
My concern lies with the dichotomy between people spouting the same talking points over and over about DeepSeek, but not bringing that same energy when they talk about Grok-3. I wish they would apply the same level of scrutiny and concern, or admit they don't actually care about all they claim to. All I am asking is this: don't be so emboldened by your hatred of China that you ignore the problems in your own backyard.
Conclusion
Let's be as fair as possible when evaluating these models and the concerns they raise. It helps all of us when we try to be "maximally truthful," as Elon put it for Grok-3. Or just use open source models and avoid most of this hassle! TTYL xox
Authorities claim that Cha Cha is "the most rotisserie chicken looking dog in the world." What say you, skeptics?
