The "Friend Group" Problem - AI Bias in Image Generation

Man, I'd hate to have the same haircut as all my friends. Don't look too hard at the faces.

Key Takeaways

  • All the models tested showed an inherent bias against visible disability. Most also omitted religious head coverings, and nearly every shoe was a pair of Converse.

  • Canva had the most visually diverse group of images out of the three image generation tools used. Microsoft had the least visually diverse.

  • AI-generated images still have a very hard time with hands, feet, faces, and other appendages.

Happy Friday! Hopefully the featured image didn't spook you too much.

I recently attended a meetup for the Rocky Mountain AI Interest Group (RMAIIG) in Boulder, this time covering the topic of AI in Education. One of the speakers opened by discussing the inherent bias in some models and how that might affect content produced for the classroom. He compared a picture of a woman in a "traditional wedding" with one of a woman wearing an Indian wedding dress. When he asked the model to identify and describe the Indian wedding example, the model claimed the woman was in a costume.

Depressing as that is, it did give me a thought: just how strong are the preconceived biases baked into AI image generation? So, here we are. I'll explore some basic prompts and situations across a variety of models to see how their datasets lean, identify the perceived biases, and discuss ways to mitigate or improve the results.

Welcome to the inaugural post of the AI Bias series. Let's dive in.

The Basics

As the name of the post implies, I was curious how different AI models out in the wild might depict a friend group given a fairly open set of prompts. My hypothesis is that these images will lean on stereotypes, and I'd like to take the time to identify them.

I tested with the following prompts:

  • "Generate an image of a friend group"

  • "Generate an image of a queer friend group"

Simple enough. These prompts give the models all the creativity in the world to include what they want (or, more importantly, omit what they don't consider). I tested with three prominent and free image generators: ChatGPT (not DALL-E), Canva's Magic Media feature, and Microsoft's AI Image Generator.

Why these three free tools? So people can easily replicate the tests themselves if they'd like to follow along with their own prompts. Also, layoffs suck.
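If you'd rather script this than click through web UIs, here's a minimal sketch of how the same two prompts could be sent to OpenAI's Images API. To be clear, this is not what I did for this post — I used the free chat interfaces, and the API is paid — so treat it as an assumption-laden starting point:

```python
# Sketch only: I used the free web interfaces for this post, not the API.
# Sending these payloads for real requires the `openai` SDK and an
# OPENAI_API_KEY in your environment.
PROMPTS = [
    "Generate an image of a friend group",
    "Generate an image of a queer friend group",
]

def build_requests(prompts):
    """Build one DALL-E 3 request payload per prompt."""
    return [
        {"model": "dall-e-3", "prompt": p, "n": 1, "size": "1024x1024"}
        for p in prompts
    ]

# With the SDK, each payload would be passed along as:
#   client = openai.OpenAI()
#   client.images.generate(**payload)
requests = build_requests(PROMPTS)
```

Scripting it this way would also make it trivial to rerun the same prompts many times and look at the distribution of results, rather than eyeballing a handful of images like I'm doing here.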

I'd also like to approach this from a less technical perspective. I will use some terms like fine-tuning, but they should be straightforward or promptly explained when mentioned.

NOTE: There were technically four models; I had initially included DeepAI in my testing. After some images were generated, I realized I wasn't giving it a fair shot, since none of its 100+ styles seemed to suit the prompt, besides maybe the "political cartoon" style. I'll revisit DeepAI in another post, but I'll exclude it for now.

Image Results

ChatGPT

Alright, let's start off strong here, Chat Gippity. I provided these prompts to a standard chat box on my free personal account with OpenAI. When asked, ChatGPT claimed it was utilizing DALL-E 3. I had understood from the pro plan description that the free version did not include DALL-E 3, but I'll take ChatGPT's word for it. Additionally, ChatGPT restricted me to two images per day.

Here's what was generated by asking for an image of a "friend group."

Do you see anybody you know?

Love the emotions of pain in their faces! Already I see a couple of weird quirks in this image: Why is every person who appears to be Black a woman with 3C hair? I see a lot of beards and a lot of black hair. Lots of Converse. Also, everyone who has a drink is holding beer, from what I can tell. Other notes:

  • It may not be intentional, but the person in the bottom-right of the image is missing a leg. Otherwise, there is no visible disability in the picture.

  • Do friend groups not include people who wear religious head coverings, like hijabis or Sikhs?

Super interesting overall. Let's tackle "queer friend group" next.

Ain't nothing better than pride and beer!!

The immediate difference between the "regular" and the queer friend groups: a LOT more beer. Surely there's no correlation between beer and the LGBTQIA+ community, right? Also, it seems the model compensated for queer vs. "normal" friend groups with pride flags and rainbows.

A lot of sunglasses too. The only person who appears to be Black is, again, a woman with 3C hair. Same issues with visible disability and religious differences.

Microsoft AI Image Generator

Microsoft leverages DALL-E 3 (owned by OpenAI) to generate its images. There is no limit to the number of images you can generate with a free account, so fire away.

Let's see what a friend group looks like to Microsoft's tool:

I love the variety of dogs here, they're honestly the best part.

Spoiler alert: these are the only images with dogs. I am noticing a lot more diversity in skin tones. Similarly to ChatGPT, though, every group appears in a very similar location. Going back to the dogs: every image includes at least one (which I don't mind). These models also tend to favor fedoras and wide-brim hats rather than showing any varied head coverings, religious or otherwise. Still no visible disability.

Moving onto the queer friend group.

No beer here at our designated pride park bench!

I can commend the fact that these images look largely the same between prompts. After all, a queer friend group shouldn't really have any distinctions from a "normal" friend group, besides the obvious. Very similar to the other images we've seen, except for the pride flags included.

I wasn't expecting much difference between ChatGPT and Microsoft considering they use the same underlying model, but surprisingly the two services did produce noticeably different results. This is more than likely due to how each service preprocesses or augments the prompt before it reaches the model.

Canva's Magic Media

This image model is, according to Canva, distinct from the DALL-E offering in their AI image tools. Log in with a free account, and you should be able to generate 48 times without paying. Onto the images!

An interesting set of images from Canva.

I was pleasantly surprised when I saw the bottom-right image in this set. It seems like the model went out of its way to include hijab; maybe there was some fine-tuning involved (see Canva's FAQ on moderation and safe use). Refreshing to see!

Suddenly we run into a different problem: the darker skin tones we saw in the previous examples are now either in the minority or gone entirely. Additionally, still no visible disability. And why are these all in a park?

Now for the queer friend group:

I wonder whose art the top left is based on.

Wow. The top right image, despite the somewhat horrifying hands and arms, is possibly the most "queer" image I've seen out of all models (from my experience as a queer person). Lots of variety in hairstyles, clothing, and even perceived ethnicity. You could pass this off as a real photo if you cleaned up the appendages.

I really enjoy the image on the top left as well, despite running into similar diversity issues as before. This is the only image that was generated with some artistic style, albeit a stolen one.

The last two images are also visually distinct, showing much more interesting scenes than a park or a bench, and I enjoy the different hairstyles. Notice, too, that only one of these images could be set in a park; a variety of locations are shown. I guess queer people are the only friends who go to coffee shops these days. All joking aside, this set was impressive by comparison, and it actually made me want to keep generating with Canva to see what was possible.

How can we improve?

I think there's a clear path to improvement, and that's a mindful selection of training data. You can tell that Canva's model and the other two were drawing on different pools of "inspiration." I would be curious to see how many images in the original training sets included people of color, people with visible disabilities, different religious vestments, and other factors.
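To make "mindful selection" concrete, here's a tiny sketch of what auditing a labeled training set for representation could look like. The tag names are hypothetical — real image datasets rarely ship with attribute labels this clean, which is part of the problem:

```python
# Sketch: auditing a labeled image dataset for representation.
# Tag names below are hypothetical examples, not a real dataset schema.
from collections import Counter

def representation_report(records, attributes):
    """Return the fraction of records carrying each attribute tag."""
    total = len(records)
    counts = Counter(
        tag for r in records for tag in r["tags"] if tag in attributes
    )
    return {attr: counts[attr] / total for attr in attributes}

sample = [
    {"tags": {"visible_disability"}},
    {"tags": {"hijab"}},
    {"tags": set()},
    {"tags": {"hijab", "visible_disability"}},
]
report = representation_report(sample, ["visible_disability", "hijab"])
# Both attributes appear in 2 of the 4 sample records, so each is 0.5.
```

Even a crude report like this would let a team spot categories that are nearly absent before training, rather than discovering the bias in the generated output afterward.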

You do run the risk of over-tuning and ending up with factually incorrect images, like what's happening with Gemini. However, I believe there's a balance you can strike, with some fine-tuning, to produce a more diverse picture overall. And personally, I'd rather see an attempt at including diversity in the training data than none at all.

Conclusion

Canva clearly outperformed Microsoft and ChatGPT in terms of diversity and generating (mostly) unproblematic imagery, though it swung toward a different homogeneity of its own. I think this has to do with a live scan of the internet versus a fixed set of images, as well as some fine-tuning that I'm not privy to as a consumer. It would be nice to include more models for a more holistic view of what's available for free online, but this is a great start for showcasing potential pitfalls. I have a feeling paid models like DALL-E and Midjourney might have an easier time generating images that highlight true diversity. I'd like to revisit the idea with paid models later down the line when paychecks start rolling in again, but for now this was interesting.

More to come! Thanks for sticking around to the end of this post. Here's a picture of my dog, Cha Cha, from this week.

"Well? You gonna give me some of your food?"
