Female Athlete Representation in LLM Image Generators
Experimenting with prompts in ChatGPT, Bing and the Substack image generator. Highlighting gender differences in the outputs. Nude images are censored.
Introduction
I frequently make AI-generated images for my blog posts, and every time I struggle to create images of female athletes. A comment on my last training update inspired me to test my theory that the image generator I use has a gender bias.
There is a lot of literature on the biases within AI training data sets. You can read more about it here and here. Most of the prompts below generated images of white people only, which is a well-known problem with AI models and has broader negative consequences, for example in medicine. Here I specifically wanted to test how gender affects the generated images in the context of high-level sport.
Methods
To perform this experiment I chose the following keywords: athlete, swimmer, cyclist and runner, and experimented with adding the modifiers "male", "female" and "athlete". Below you can see what the Substack AI Image Generator produced. I always set the style filter to "paint" to keep the outputs consistent. I also tried a few prompts with ChatGPT and Claude. Other models considered were Midjourney, the Bing AI Image Generator and Stable Diffusion.
This was a short experiment that I spent ~60 minutes on. If anyone is keen to explore this further, e.g. by running the same prompt ~1,000 times and counting the gender split in the resulting images, please reach out. I am open to other ideas as well. If there is existing work that does this already, please leave a link in the comments. Thank you.
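As a rough sketch of how such a larger run could be analysed, assume each generated image has been hand-labelled "male", "female" or "unclear" (the labels and the function names below are hypothetical, not part of any generator's API). The share of women can then be reported with a Wilson 95% confidence interval and checked against an even 50/50 split with an exact binomial test. These two statistics are my choice for illustration, not a method used in this post.

```python
import math
from collections import Counter


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("need at least one labelled image")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp tiny floating-point overshoots outside [0, 1].
    return max(0.0, centre - half), min(1.0, centre + half)


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes
    no more likely than the observed count k (requires 0 < p < 1)."""
    def log_pmf(i: int) -> float:
        # Work in log space so large n (e.g. ~1,000 images) does not overflow.
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    pmf = [math.exp(log_pmf(i)) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so exactly tied outcomes are still counted.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def summarise(labels: list[str]) -> dict:
    """Summarise hypothetical hand-assigned labels ("male", "female", "unclear")."""
    counts = Counter(labels)
    women, men = counts["female"], counts["male"]
    n = women + men  # images labelled "unclear" are left out of the split
    low, high = wilson_interval(women, n)
    return {
        "n": n,
        "female_share": women / n,
        "ci95": (low, high),
        "p_value_vs_50_50": binom_two_sided_p(women, n),
    }


if __name__ == "__main__":
    # Hypothetical batch: four images for one prompt, all labelled male.
    print(summarise(["male"] * 4))
```

For a single batch of four images that are all labelled male, this gives p = 0.125 and a 95% interval for the female share of roughly 0 to 0.49, so one batch cannot establish a bias on its own. With ~1,000 labelled images and an even split, the interval narrows to about ±3 percentage points.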
Results
The Substack AI model does seem to have a strong gender bias.
Athlete
Comments: It's frustrating to see that only white men are considered "athletes" and that this bias carries over into the models. This will be relatively unsurprising to most readers. Image models are generally bad at depicting hands (more information here and here), and this is the case here as well. When gender is specified, one of the athletes is, for some reason, playing basketball in a nightie.
Athlete
Male athlete
Female athlete
Swimming
Comments: I think this section speaks for itself, which is why I chose the swimmer prompt specifically for my tests on other models. Here the women are not only portrayed as unathletic; they are simply nude. I added pink stickers so that this article would not get flagged.
Do you know that your generator does this?
Swimmer
Male swimmer
Male swimmer athlete
Female swimmer
Female swimmer athlete
Cycling
Comments: Broadly, cyclists are portrayed as less athletic, as cycling is also a common recreational activity. When "athlete" is specified, many of the women portrayed have their breasts exposed, which is very uncommon for professional cyclists, who all race in fully covered race suits or high-neck tri suits. The only one who appears to be wearing a cycling jersey has a cut-out for her breasts.
Cyclist
Male cyclist
Male cyclist athlete
Female cyclist
Female cyclist athlete
Running
Runner
Male runner
Male runner athlete
Female runner
Female runner athlete
DALL-E 3 (from ChatGPT)
Link to model (requires paid subscription)
Comments: When prompted without a gender, it produced an even 50/50 split. When gender was specified, it portrayed the men and women as having roughly the same athletic ability, though the Olympic rings appeared in the men's images without my prompting for them. All the women were wearing swimming caps and appropriate gear, which is encouraging. When I specified "athlete", both the male and female outputs became more dynamic and
Swimmer paint
Male swimmer paint
Male swimmer athlete paint
Female swimmer paint
Female swimmer athlete paint
Bing AI Image Generator
Link to model (requires Microsoft log-in)
Comments: The most diverse image generator of all the ones I've used: it provided variation in skin tone and ethnicity without being prompted. When gender was unspecified, it only produced men. When gender was specified, it portrayed the women as more "artistic", whereas the men were portrayed as "strong" and "dynamic". There were some issues with perspective and arm positioning. I did not specify "athlete" for this model.
Four images of swimmers paint
Four images of male swimmers paint
Four images of female swimmers paint
Stable Diffusion
Link to model (free access)
Comments: This model offers a negative prompt (a description of what should not appear in the image). Its images seem to be of lower quality than those from the other generators, though it's free. It has issues with ghost arms and similar artefacts. When gender is unspecified, it mostly produces male figures. The women possibly appear less athletic and some don't have their hair in caps, though this is less pronounced. I didn't prompt "athlete" for this model.
Four images of swimmers paint
Four images of male swimmers paint
Four images of female swimmers paint
Claude
Swimmer image. Claude doesn’t currently do image generation.
Midjourney
Link to model (requires paid subscription)
I don't have a subscription. If anyone who does could run these prompts and send me the results, that would be amazing.
Conclusion
It's important to note that all of these conclusions are based on running each prompt only once (n=1). More tests would be needed to make these results robust. The Substack AI model seems to have a lot of gender bias and seems to perform the worst compared with the other models. ChatGPT's DALL-E 3 gives relatively gender-balanced images; however, the aesthetic quality of the images is lower than that of other models. Bing AI seems to give the best results in terms of gender balance, prompt specificity and a lack of phantom limbs. Stable Diffusion images are of lower overall quality and show a relatively strong gender bias. Claude doesn't currently support image generation. From my understanding, Midjourney would also give relatively good results, but I didn't have the opportunity to test it.
Thank you for reading, especially if you made it all the way here. Please let me know what you think.
Wow, quite stark for the substack generator in particular :(
One thing I find interesting is that in the Substack generator the 'athlete' modifier leads to more smoky/fiery backgrounds compared with the often brighter/happier backgrounds in the others. I wonder why.
Great article! I think the 2nd "male athlete" image from Substack AI is also pretty bad, with the two women devoted to the man.