Bias Detection in Media: An NLP-Based Approach using Corpus Statistics and Sentence Embeddings
DOI: https://doi.org/10.47611/jsrhs.v13i2.6587
Keywords: Bias, Topic Analysis, Corpus Statistics, Sentence Embeddings
Abstract
In this paper, we implement a Natural Language Processing (NLP) solution for binary classification that categorizes a sentence as biased or unbiased. Detecting bias in today's media is challenging, but a working detector can help readers identify which sources portray bias. The general approach to classifying a sentence as biased or unbiased involves representing words and sentences using probabilities or pretrained vectorization models. Our final model used only probabilistic data about the connection between words, sentences, and each class. We used Pointwise Mutual Information (PMI) and Term Frequency-Inverse Document Frequency (TF-IDF) as heuristics for measuring the relationship between sentences and the biased and unbiased classes. We also leveraged Google's Universal Sentence Encoder (USE) to capture the meaning of the sentences. Our results revealed a possible limitation in USE's training data with respect to bias detection. Through topic analysis, we uncovered which topics are characterized by minimal bias, and we used these findings to contextualize the model's performance.
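As an illustration of the PMI heuristic mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes sentence-level probability estimates, whitespace tokenization, and a four-sentence toy corpus invented for this example; the function names `word_class_pmi` and `sentence_score` are hypothetical.

```python
import math
from collections import Counter


def word_class_pmi(sentences, labels):
    """Return {(word, label): PMI} estimated from labelled sentences.

    PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ), where each probability is
    the fraction of sentences involved (a word counts once per sentence).
    Only (word, label) pairs that co-occur at least once are stored.
    """
    n = len(sentences)
    word_counts = Counter()
    class_counts = Counter(labels)
    joint_counts = Counter()
    for text, label in zip(sentences, labels):
        for word in set(text.lower().split()):
            word_counts[word] += 1
            joint_counts[(word, label)] += 1

    pmi = {}
    for (word, label), joint in joint_counts.items():
        p_joint = joint / n
        p_word = word_counts[word] / n
        p_class = class_counts[label] / n
        pmi[(word, label)] = math.log2(p_joint / (p_word * p_class))
    return pmi


def sentence_score(text, pmi, label):
    """Average PMI of a sentence's words with one class.

    Assumption of this sketch: unseen (word, label) pairs contribute 0.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(pmi.get((w, label), 0.0) for w in words) / len(words)


# Toy corpus, invented for illustration only.
sentences = [
    "the senator shamelessly dodged the question",
    "the senator answered the question",
    "critics slammed the disastrous bill",
    "the bill passed on tuesday",
]
labels = ["biased", "unbiased", "biased", "unbiased"]

pmi = word_class_pmi(sentences, labels)
biased_score = sentence_score("critics slammed the bill", pmi, "biased")
unbiased_score = sentence_score("critics slammed the bill", pmi, "unbiased")
```

In this toy corpus, words that appear equally often in both classes (such as "the") receive a PMI of zero, while words seen only in biased sentences receive a positive PMI with the biased class. A classifier could compare the two per-class scores, or feed them in as features alongside TF-IDF weights; how the paper combines these signals is not stated in the abstract.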
References or Bibliography
Bail, C.A.; Argyle, L.P.; Brown, T.W.; Bumpus, J.P.; Chen, H.; Hunzaker, M.F.; Lee, J.; Mann, M.; Merhout, F.; Volfovsky, A. Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences of the United States of America 115(37), 9216–9221 (2018). https://www.doi.org/10.1073/pnas.1804840115
Nadeem, M.U.; Raza, S. "Detecting Bias in News Articles using NLP Models," Stanford CS224N Custom Project, Stanford University. https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1224/reports/custom_116661041.pdf
Cer, D.; Yang, Y.; Kong, S.; Hua, N.; Limtiaco, N.; St. John, R.; Constant, N.; Guajardo-Cespedes, M.; Yuan, S.; Tar, C.; Strope, B.; Kurzweil, R. Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174, Brussels, Belgium. Association for Computational Linguistics (2018). https://doi.org/10.48550/arXiv.1803.11175
Chao, Z.; Molitor, D.; Needell, D.; Porter, M.A. "Inference of Media Bias and Content Quality Using Natural-Language Processing," arXiv:2212.00237 (2022). https://doi.org/10.48550/arXiv.2212.00237
Spinde, T.; Rudnitckaia, L.; Sinha, K.; Hamborg, F.; Gipp, B.; Donnay, K. "MBIC–A media bias annotation dataset including annotator characteristics." In Proceedings of iConference 2021 (2021). https://doi.org/10.48550/arXiv.2105.11910
Copyright (c) 2024 Neeraj Gummalam, Clayton Greenberg
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.