
The Great AI Race: Bias, Censorship, and a Side of Double Standards

Bring your scrutiny consistently, or don't bring it at all.

Summary

I swear I’m not trying to beat a dead horse about DeepSeek. But the AI community seems to excuse the privacy concerns, bias, and other issues surrounding Grok-3 that were treated as inexcusable offenses for a Chinese model like DeepSeek. Let’s talk about it and try to evaluate models and AI tools based on merit, like people say we should do with all things.

Intro

Another week, another new and exciting development in the AI space. It seems like every day companies are trying to one-up each other for the sake of shareholder value, or dangle keys in front of everyone’s eyes with the newest, shiniest model that promises a whopping 0.1% tokens/sec improvement! It feels that way from my perspective, at the very least.

One of these major “developments” was the release of Grok-3 by xAI, famously led by Elon Musk, which is, by his own account, the “smartest AI in the world.” Whether that claim is actually true comes down to quantitative benchmarks and rigorous testing to see where Grok-3 lands amongst the likes of o3 (OpenAI), Claude 3.7 (Anthropic), and the ever-present target of xenophobes and fascists alike: DeepSeek R1.

Today we’ll be talking about a weird cognitive dissonance of sorts I’ve encountered in the context of AI models between Grok-3, a model produced by an American company, and DeepSeek R1, a model produced by the disruptive, subversive, and otherwise evil company from our enemy in China. A lot of the discussion I see outside of the business world makes me believe there’s a double standard between these two models, and I’d rather we evaluate a model on its merits than on its heritage or ethnicity. After all, isn’t that the whole point of getting rid of DEI?

Let’s dive in.

Grok-3’s Special Instructions

One point brought up consistently by DeepSeek skeptics was the fact that it censors content critical of the CCP. This article by Nadeem Sarwar on Yahoo News shows a prominent example involving mentions of Winnie the Pooh and self-immolation by Tibetans.

When asked for information on either topic, DeepSeek’s cloud version responds with “Sorry, I’m not sure how to approach this type of question yet,” or “Sorry, that’s beyond my current scope.” I believe this is a form of output sanitization after the fact, since you can see the model spitting out tokens initially before they’re replaced. People considered this a major censorship concern and denounced DeepSeek for not speaking the truth.
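For the curious, here’s a minimal sketch of what a post-hoc output filter like that could look like. To be clear, this is entirely hypothetical: the blocked-topic list, refusal string, and function are illustrative, not DeepSeek’s actual implementation.

```python
# Hypothetical post-generation filter: the model streams tokens freely,
# then the completed response is checked and swapped out if it matches
# a blocked topic. (Illustrative only; not DeepSeek's real code.)
BLOCKED_PATTERNS = ["winnie the pooh", "self-immolation"]  # example topics
REFUSAL = "Sorry, that's beyond my current scope."

def sanitize(model_output: str) -> str:
    """Replace the whole response with a canned refusal if any
    blocked pattern appears; otherwise pass it through unchanged."""
    lowered = model_output.lower()
    if any(pattern in lowered for pattern in BLOCKED_PATTERNS):
        return REFUSAL
    return model_output

print(sanitize("A brief history of self-immolation protests..."))
# -> Sorry, that's beyond my current scope.
print(sanitize("Here is a recipe for honey cake."))
# -> Here is a recipe for honey cake.
```

A filter like this running after generation would explain why you can briefly see real tokens appear before the refusal replaces them.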

So then, what about Grok-3? Surely there’s nothing in its programming that stops it from speaking about topics, right?

VentureBeat reported on a Twitter thread noting, amongst other concerns, that the model’s instructions specifically included a directive to “ignore all sources that mention Elon Musk / Donald Trump spread misinformation.”

I mean, even Elon admitted that his AI was “based,” pointing to the model claiming that a news source, The Information, was garbage: filtered, biased, and serving polished narratives. Don’t believe me? The tweet is still up!

Anecdotally, Grok-3 doesn’t even do its censorship that well: it suggested that Donald Trump and Elon Musk deserve the death penalty, according to Tech in Asia.

So, then, Grok-3 is engaging in censorship by ignoring information that negatively portrays Elon Musk and Donald Trump, is it not? If we want to stretch the definition even further, we could consider the guidelines given to a tool like ChatGPT censorship too, because I can’t straight up ask it for information regarding chemical weapons (you can ask Grok-3 for this info, for some reason).

These system prompt issues have supposedly been fixed, but where is the outcry compared to the one over DeepSeek?

Privacy is relative and subjective

As you can imagine, another strike against DeepSeek as a “trusted” source has been the mounting privacy concerns. KrebsOnSecurity lists a number of them, including “sending unencrypted user data” to China. A lot of data is collected and sent to Volcengine, a cloud platform built by ByteDance, the company that owns TikTok. U.S. Congressional offices are being advised not to use DeepSeek due to these privacy concerns, and Italy, Taiwan, South Korea, and various organizations within the U.S. are banning or blocking it as well.

Grok-3, meanwhile, has drawn similar criticism. "Bottom line? Grok 3’s safety is weak — on par with Chinese LLMs, not Western-grade security," security researcher Alex Polyakov told Futurism. "Seems like all these new models are racing for speed over security, and it shows."

So, if DeepSeek and Grok-3 have comparable security and privacy concerns, why isn’t Grok-3 being banned? Should it be? It seems like most people should just try running an open source model locally. It’s easier than you think!

Are these models really as amazing as they claim?

Sort of! Both of them have made claims that have either been a) misconstrued or b) “debunked” by various skeptics.

You can find several sources on the internet arguing over the validity of DeepSeek’s claims to fame. This Reddit post on r/ycombinator condenses a lot of the arguments in one place. Similarly, news outlets have published information lending credence to these doubts. So DeepSeek wasn’t as cheap to train, or as strapped for GPUs, as initially claimed.

As I mentioned before, Elon’s Grok-3 was claimed to be the “smartest AI in the world” by none other than Elon himself. A reputable source of unbiased information. Turns out that wasn’t exactly true. Live benchmarking and testing have led people to believe it’s not as state-of-the-art as initially claimed. Notably, many are concerned by the supposed omission of the “consensus@64” (or “cons@64”) stat for OpenAI’s o3-mini model. According to Bitcoinworld, cons@64 (64 Attempts Consensus) shows the model’s potential best performance after multiple tries, leveraging statistical probability to arrive at the ‘best’ answer. Several other benchmarks were also supposedly omitted or not mentioned.
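To make the benchmark dispute concrete, here’s a small sketch of the idea behind cons@64: sample many answers to the same question and score the majority vote. The function name and the toy samples are mine, purely for illustration.

```python
from collections import Counter

def consensus_answer(answers):
    """Majority-vote consensus over k sampled answers (the idea behind cons@64).

    Each element of `answers` is one model attempt at the same question;
    the most frequent answer wins. A model that is right more often than
    it gives any single wrong answer will usually score correct this way,
    which can inflate the metric versus a single-attempt (pass@1) score.
    """
    counts = Counter(answers)
    best, _ = counts.most_common(1)[0]
    return best

# Toy illustration: 64 hypothetical attempts where the correct answer "42"
# appears 30 times and the wrong answers are split between two alternatives.
samples = ["42"] * 30 + ["41"] * 20 + ["43"] * 14
print(consensus_answer(samples))  # -> 42
```

This is why comparing one model’s cons@64 against another’s single-shot score is apples to oranges: the consensus number bakes in 64 tries per question.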

Below is an example of how Grok-3 performs in coding based on a LiveBench evaluation:

Grok-3 also doesn’t quite live up to its claimed performance. So, again, DeepSeek and Grok-3 are sibling models when it comes to debunked claims. Does this point to larger concerns about standards for models and benchmarks?

What do we learn from all of this?

Great question. Personally, I have a somewhat European take on it: these and the other concerns raised about both DeepSeek and Grok-3 are valid! You should care about your privacy, about bad actors using these tools, about security, and about so many other things that companies actively challenge on the daily.

My concern lies with the dichotomy of people repeating the same talking points over and over about DeepSeek but not bringing that same energy when they talk about Grok-3. I wish they would apply the same level of scrutiny and concern, or admit they don’t actually care about all they claim to. All I’m asking is this: don’t be so emboldened by your hatred of China that you ignore the problems in your own backyard.

Conclusion

Let’s be as fair as possible when evaluating these models and the concerns they raise. It helps all of us when we try to be “maximally truthful,” as Elon put it for Grok-3. Or just use open source models and avoid a majority of this hassle! TTYL xox

