Analyzing the Extent to which Gender Bias Exists in News Articles Using Natural Language Processing Techniques

Authors

  • Nihita Guda Broad Run High School

DOI:

https://doi.org/10.47611/jsrhs.v12i1.3865

Keywords:

natural language processing, artificial intelligence, news, ethics, machine learning, biases, gender bias

Abstract

Prior studies have shown the existence of gender bias in job postings, performance reviews, and letters of recommendation. However, little research has examined the presence of gender bias in mainstream news sources and how it varies across publications. Given the rapid pace of news dissemination, human editing is not effective enough to address biases, and even computer programs that parse news articles for specific words and references fall short of detecting undertones and implicit references, which is why sophisticated techniques such as Artificial Intelligence (AI) are necessary. In this study, I used Natural Language Processing (NLP) methods, implemented as a series of Python programs, to analyze how biases in news content vary in type, variety, and intensity. I used over 500,000 news articles from 15 publications, spanning over four years, to build and train my algorithm. Using Word2Vec, a popular NLP method, I concluded that right-leaning publications are more likely to exhibit misogynistic content biased against women; however, the method fell short of identifying many forms of objectification, such as benevolent sexism. Similarly, using VADER, a Python sentiment analysis tool, I determined that mere metrics of positive, negative, and neutral sentiment are not sufficient to detect occurrences of gender bias. To gauge the breadth of sexist language effectively, I used the LIWC text analysis program, which calculates the percentage of words in a given text that fall into one or more of over 80 linguistic, psychological, and topical categories indicating various social, cognitive, and affective processes. With statistical evidence, my study concluded that implicit gender bias occurs across all publications but is more prevalent in right-leaning ones.
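The Word2Vec analysis described above rests on comparing word vectors along a gender direction in the embedding space. A minimal sketch of that idea, using hand-made toy vectors rather than embeddings trained on the study's news corpus (the words, dimensions, and values here are illustrative assumptions, not the study's data):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional embeddings (illustrative only; real Word2Vec vectors
# are learned from a corpus and typically have 100-300 dimensions).
vectors = {
    "he":       np.array([ 1.0, 0.1, 0.0]),
    "she":      np.array([-1.0, 0.1, 0.0]),
    "engineer": np.array([ 0.6, 0.8, 0.1]),
    "nurse":    np.array([-0.6, 0.8, 0.1]),
}

# A gender axis: the difference between a male- and a female-anchored vector.
gender_axis = vectors["he"] - vectors["she"]

def gender_lean(word):
    """Positive = the word sits closer to 'he'; negative = closer to 'she'."""
    return cosine(vectors[word], gender_axis)

print(gender_lean("engineer"))  # positive in this toy example
print(gender_lean("nurse"))     # negative in this toy example
```

In a corpus exhibiting bias, occupation or trait words drift toward one end of such an axis; as the abstract notes, this kind of geometric test surfaces overtly gendered associations but can miss subtler patterns like benevolent sexism.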

References or Bibliography

Allen, C., & Hospedales, T. (n.d.). Analogies explained: Towards understanding word embeddings. Retrieved June 1, 2022, from https://arxiv.org/pdf/1901.09813.pdf

Beri, A. (2020, May 27). Sentimental analysis using VADER. Medium. Retrieved May 31, 2022, from https://towardsdatascience.com/sentimental-analysis-using-vader-a3415fef7664

D'Ignazio, C., & Klein, L. F. (2020). Data feminism. The MIT Press.

Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.

Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the 8th International Conference on Weblogs and Social Media, ICWSM 2014.

Madera, J. M., Hebl, M. R., Dial, H., Martin, R., & Valian, V. (2018). Raising doubt in letters of recommendation for academia: Gender differences and their impact. Journal of Business and Psychology, 34(3), 287–303. https://doi.org/10.1007/s10869-018-9541-1

Mastari, L., Spruyt, B., & Siongers, J. (2019). Benevolent and hostile sexism in social spheres: The impact of parents, school and romance on Belgian adolescents' sexist attitudes. Frontiers in Sociology, 4(47). https://doi.org/10.3389/fsoc.2019.00047

Nittle, N. (2021, April 28). The language of gender bias in performance reviews. Stanford Graduate School of Business. Retrieved May 31, 2022, from https://www.gsb.stanford.edu/insights/language-gender-bias-performance-reviews

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.

Salmen, A., & Dhont, K. (2020). Hostile and benevolent sexism: The differential roles of human supremacy beliefs, women's connection to nature, and the dehumanization of women. Group Processes & Intergroup Relations, 24(7), 1053–1076. https://doi.org/10.1177/1368430220920713

Vatsal. (2022, May 20). Word2Vec explained. Medium. Retrieved May 31, 2022, from https://towardsdatascience.com/word2vec-explained-49c52b4ccb71

Published

02-28-2023

How to Cite

Guda, N. (2023). Analyzing the Extent to which Gender Bias Exists in News Articles Using Natural Language Processing Techniques. Journal of Student Research, 12(1). https://doi.org/10.47611/jsrhs.v12i1.3865

Section

HS Research Projects