AI Study Finds Men Are Four Times More Represented In Literature Than Women – USC Viterbi



Characters like Jo March and Dana Franklin have been rare in literature.

Now, researchers from the USC Viterbi School of Engineering have used AI technologies to conclude that male characters are four times more common in literature than female characters like Jo and Dana.

Mayank Kejriwal, director of research at USC’s Information Sciences Institute (ISI), drew on ongoing work on implicit gender bias and his own expertise in natural language processing (NLP). While many published studies analyze the qualitative aspects of female representation in literature and media, Kejriwal’s research plays to his strengths: collecting quantitative data with existing machine learning algorithms.

To produce these results, Kejriwal and co-author Akarsh Nagaraj drew their data from the Project Gutenberg corpus, which contains 3,000 books in English, a further attempt to alleviate scholarly bias. The books ranged in genre from adventure and science fiction to mystery and romance, and came in a variety of forms, including novels, short stories, and poetry.

Akarsh Nagaraj, MS ’21, study co-author and machine learning engineer at Meta, helped uncover the literary 4:1 male-female imbalance.

“Gender bias is very real, and when we see four times fewer women in literature, it has a subliminal impact on people who consume culture,” said Kejriwal, a research assistant professor in the Daniel J. Epstein Department of Industrial and Systems Engineering. “We quantitatively revealed in an indirect way the prejudices that persist in the culture.”

Nagaraj noted that their methods and results gave them a better understanding of biases in society and their implications. “Books are a window into the past, and the writing of these authors gives us insight into how people view the world and how it has changed.”

Men everywhere… and main characters

The study describes several methods for measuring female prevalence in literature. The researchers used Named Entity Recognition (NER), a leading NLP method, to extract gender-specific characters. “One of the ways we define this is by looking at the number of female pronouns in a book versus male pronouns,” Kejriwal said. The other technique is to count the female characters who are main characters.

This allowed the research team to determine if the male characters were central to the story.
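The pronoun comparison Kejriwal describes can be illustrated with a minimal Python sketch. It is not the study's actual pipeline, which this article does not detail: the pronoun lists, the function names, and the sample sentence below are all assumptions made for illustration, and a real analysis would also need NER and coreference handling.

```python
import re
from collections import Counter

# Illustrative pronoun lists (an assumption; the study's own word lists
# are not given in the article).
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def pronoun_counts(text):
    """Return (male, female) pronoun counts for a piece of text."""
    # Lowercase and split into alphabetic tokens so "He" and "he" match.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    male = sum(counts[p] for p in MALE_PRONOUNS)
    female = sum(counts[p] for p in FEMALE_PRONOUNS)
    return male, female


def male_to_female_ratio(text):
    """Ratio of male to female pronouns; infinity if no female pronouns."""
    male, female = pronoun_counts(text)
    return male / female if female else float("inf")


# Hypothetical sample text, not taken from the corpus.
sample = "He said his brother knew him well. She smiled."
print(pronoun_counts(sample))
print(male_to_female_ratio(sample))
```

Applied to the full text of a book, a ratio well above 1 would indicate that male pronouns dominate, which is the kind of imbalance the study reports across its corpus.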

The results of the study also showed that the gap between male and female characters decreases under female authorship. “It clearly showed us that in this era, women would represent themselves much more than a male writer,” Nagaraj said.

The team’s varied methods for measuring female representation in literature were not without limitations, however, particularly when authors are neither male nor female. “When we published the paper on the dataset, reviewers criticized that we ignored non-dichotomous genders,” Kejriwal said. “But we agreed with them, in a way. We think it’s completely deleted, and we won’t be able to find much [transgender individuals or non-dichotomous individuals].”

Difficult dichotomies

Kejriwal acknowledged that AI tools able to identify when a plural word such as “they” refers to a non-dichotomous individual do not yet exist. Yet the study’s findings set the framework for addressing these social issues and creating technologies that can close these gaps.

The study also provides a blueprint for future work on quantifying the qualitative results they uncovered through the study methodologies. Without the inherent bias of human-designed surveys, NLP technology also allowed them to find associations of adjectives with gendered traits, deepening their understanding of bias and its pervasiveness in society.

“Even with misattributions, the words associated with women were adjectives like ‘weak’, ‘lovable’, ‘pretty’ and sometimes ‘stupid’,” Nagaraj said. “For the male characters, words describing them included ‘leadership’, ‘power’, ‘strength’ and ‘politics’.”

Although the team ultimately did not quantify this facet of the study, the difference in how male and female characters are described opens the door to a more comprehensive investigation of word associations with gender.

“Our study shows us that the real world is complex but there are benefits for all the different groups in our society who participate in cultural discourse,” Kejriwal said. “When we do that, we tend to have a more realistic view of society.”

Kejriwal hopes the study will highlight the importance of interdisciplinary research, in this case the use of AI technology to expose pressing social issues and inequalities that can then be addressed. Stakeholders with specialized training, including computer scientists, can offer tools to process data and answer questions, and policy makers can use this data to implement change.

Posted on April 22, 2022

Last updated on April 22, 2022

