Bias Detection in Media: An NLP-Based Approach using Corpus Statistics and Sentence Embeddings

Authors

  • Neeraj Gummalam, Henry M. Gunn High School
  • Clayton Greenberg

DOI:

https://doi.org/10.47611/jsrhs.v13i2.6587

Keywords:

Bias, Topic Analysis, Corpus Statistics, Sentence Embeddings

Abstract

In this paper, we implement a Natural Language Processing (NLP) solution for binary classification that categorizes a sentence as biased or unbiased. Detecting bias in the media is challenging, but it can help readers identify which sources portray bias. The general approach to classifying a sentence as biased or unbiased involves representing words and sentences with probabilistic models or pretrained vectorization models. Our final model contained only probabilistic data about the connection between words, sentences, and each class. We used Pointwise Mutual Information (PMI) and Term Frequency-Inverse Document Frequency (TF-IDF) as heuristics for finding the relationship between sentences and the biased and unbiased classes. We also leveraged Google’s Universal Sentence Encoder (USE) to capture the meaning of the sentences. Our results revealed a possible limitation of USE’s training data for bias detection. Through topic analysis, we identified which topics are characterized by minimal bias, and we used these findings to contextualize the model’s performance.
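The abstract does not specify how the PMI and TF-IDF statistics are turned into a biased/unbiased decision, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes lowercase whitespace tokenization, add-k smoothed word counts for PMI, TF-IDF computed by pooling each class's sentences into a single "document", and a sentence being assigned to whichever class its words score higher for. The helper names (`tokenize`, `class_counts`, `pmi_table`, `tfidf_table`, `predict`) and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Hypothetical helper: lowercase + whitespace split, no punctuation handling.
    return sentence.lower().split()


def class_counts(sentences, labels):
    """Count how often each word occurs in the sentences of each class."""
    counts = {label: Counter() for label in sorted(set(labels))}
    for sentence, label in zip(sentences, labels):
        counts[label].update(tokenize(sentence))
    return counts


def pmi_table(counts, k=1.0):
    """PMI(w, c) = log[P(w, c) / (P(w) P(c))], from add-k smoothed token counts."""
    classes = list(counts)
    vocab = set().union(*counts.values())
    total = sum(sum(cnt.values()) for cnt in counts.values()) + k * len(vocab) * len(classes)
    class_total = {c: sum(counts[c].values()) + k * len(vocab) for c in classes}
    table = {}
    for w in vocab:
        word_total = sum(counts[c][w] + k for c in classes)
        for c in classes:
            joint = counts[c][w] + k
            table[(w, c)] = math.log(joint * total / (word_total * class_total[c]))
    return table


def tfidf_table(counts):
    """TF-IDF where each class's pooled sentences form one 'document' (an assumption)."""
    classes = list(counts)
    doc_freq = Counter(w for c in classes for w in counts[c])  # number of classes containing w
    table = {}
    for c in classes:
        length = sum(counts[c].values())
        for w, n in counts[c].items():
            table[(w, c)] = (n / length) * math.log(len(classes) / doc_freq[w])
    return table


def predict(sentence, table, classes):
    """Pick the class with the highest summed word score; missing (word, class) pairs add 0."""
    tokens = tokenize(sentence)
    scores = {c: sum(table.get((w, c), 0.0) for w in tokens) for c in classes}
    return max(classes, key=lambda c: scores[c])


if __name__ == "__main__":
    # Made-up toy sentences for illustration, not data from the paper.
    sentences = [
        "the radical senator pushed a disastrous bill",
        "the senator introduced a bill on tuesday",
        "critics slammed the reckless plan",
        "the committee released the plan",
    ]
    labels = ["biased", "unbiased", "biased", "unbiased"]
    counts = class_counts(sentences, labels)
    classes = list(counts)
    for name, table in [("PMI", pmi_table(counts)), ("TF-IDF", tfidf_table(counts))]:
        # prints "PMI biased" then "TF-IDF biased"
        print(name, predict("a reckless and disastrous plan", table, classes))
```

With only two classes, this class-as-document IDF is log 2 for a word seen in just one class and 0 for a word shared by both, so the TF-IDF variant behaves as a coarse exclusivity filter. PMI instead grades how much more (or less) often a word co-occurs with a class than independence would predict, which is why shared words like "the" can still lean toward one class.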

References or Bibliography

Bail, C.A.; Argyle, L.P.; Brown, T.W.; Bumpus, J.P.; Chen, H.; Hunzaker, M.F.; Lee, J.; Mann, M.; Merhout, F.; Volfovsky, A. "Exposure to opposing views on social media can increase political polarization," Proceedings of the National Academy of Sciences of the United States of America 115(37), 9216–9221 (2018). https://doi.org/10.1073/pnas.1804840115

Nadeem, M.U.; Raza, S. "Detecting Bias in News Articles using NLP Models," Stanford CS224N Custom Project, Stanford University. https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1224/reports/custom_116661041.pdf

Cer, D.; Yang, Y.; Kong, S.; Hua, N.; Limtiaco, N.; St. John, R.; Constant, N.; Guajardo-Cespedes, M.; Yuan, S.; Tar, C.; Strope, B.; Kurzweil, R. "Universal Sentence Encoder for English," Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 169–174, Brussels, Belgium, Association for Computational Linguistics (2018). https://doi.org/10.48550/arXiv.1803.11175

Chao, Z.; Molitor, D.; Needell, D.; Porter, M.A. "Inference of Media Bias and Content Quality Using Natural-Language Processing," arXiv:2212.00237 (2022). https://doi.org/10.48550/arXiv.2212.00237

Spinde, T.; Rudnitckaia, L.; Sinha, K.; Hamborg, F.; Gipp, B.; Donnay, K. "MBIC–A media bias annotation dataset including annotator characteristics," Proceedings of iConference 2021 (2021). https://doi.org/10.48550/arXiv.2105.11910

Published

05-31-2024

How to Cite

Gummalam, N., & Greenberg, C. (2024). Bias Detection in Media: An NLP-Based Approach using Corpus Statistics and Sentence Embeddings. Journal of Student Research, 13(2). https://doi.org/10.47611/jsrhs.v13i2.6587

Issue

Vol. 13 No. 2

Section

HS Research Projects