Bias Detection in Media: An NLP-Based Approach using Corpus Statistics and Sentence Embeddings

Neeraj Gummalam; Clayton Greenberg

doi:10.47611/jsrhs.v13i2.6587

Authors

Neeraj Gummalam, Henry M. Gunn High School
Clayton Greenberg

DOI:

https://doi.org/10.47611/jsrhs.v13i2.6587

Keywords:

Bias, Topic Analysis, Corpus Statistics, Sentence Embeddings

Abstract

In this paper, we implement a Natural Language Processing (NLP) solution for binary classification that categorizes a sentence as biased or unbiased. Detecting bias in today’s media is challenging, but a reliable detector can help readers identify which sources exhibit bias. The general approach to this classification task represents words and sentences using either probabilities or pretrained vectorization models. Our final model contained only probabilistic data about the connection between words, sentences, and each class. We used Pointwise Mutual Information (PMI) and Term Frequency-Inverse Document Frequency (TF-IDF) as heuristics for the relationship between sentences and the biased and unbiased classes. We also leveraged Google’s Universal Sentence Encoder (USE) to capture the meaning of the sentences. Our results revealed a possible limitation in USE’s training data with respect to bias detection. Through topic analysis, we uncovered which topics are characterized by minimal bias, and we used these findings to contextualize the model’s performance.
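To make the PMI heuristic concrete, the following is a minimal sketch of how word-to-class PMI could be estimated from labeled sentences. It is not the authors' implementation: the function name `pmi_table`, the whitespace tokenization, the sentence-level co-occurrence counting, and the four toy sentences are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def pmi_table(sentences, labels):
    """Return {(word, label): PMI in bits} from sentence-level counts.

    Hypothetical helper: a word "co-occurs" with a label when it appears
    at least once in a sentence carrying that label.
    """
    n = len(sentences)
    word_counts = Counter()
    label_counts = Counter(labels)
    joint_counts = Counter()

    for text, label in zip(sentences, labels):
        # Use a set so each word counts once per sentence.
        for word in set(text.lower().split()):
            word_counts[word] += 1
            joint_counts[(word, label)] += 1

    table = {}
    for (word, label), joint in joint_counts.items():
        p_joint = joint / n
        p_word = word_counts[word] / n
        p_label = label_counts[label] / n
        # PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) )
        table[(word, label)] = math.log2(p_joint / (p_word * p_label))
    return table


# Toy data, invented for this example only.
sentences = [
    "the senator shamelessly dodged the question",
    "the senator answered the question",
    "critics shamelessly attacked the bill",
    "the bill passed on tuesday",
]
labels = ["biased", "unbiased", "biased", "unbiased"]

pmi = pmi_table(sentences, labels)
print(pmi[("shamelessly", "biased")])  # 1.0
print(pmi[("the", "biased")])          # 0.0
```

In this toy corpus, "shamelessly" appears only in biased sentences and so receives a positive PMI with the biased class, while "the" appears in every sentence and carries no class information. Pairs that never co-occur are simply absent from the table; a real system would need a smoothing or default-value policy for them.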

References or Bibliography

Bail, C.A.; Argyle, L.P.; Brown, T.W.; Bumpus, J.P.; Chen, H.; Hunzaker, M.F.; Lee, J.; Mann, M.; Merhout, F.; Volfovsky, A. Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences of the United States of America 115(37), 9216–9221 (2018). https://www.doi.org/10.1073/pnas.1804840115

Nadeem, M.U.; Raza, S. "Detecting Bias in News Articles using NLP Models," Stanford CS224N Custom Project, Stanford University. https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1224/reports/custom_116661041.pdf

Cer, D.; Yang, Y.; Kong, S.; Hua, N.; Limtiaco, N.; St. John, R.; Constant, N.; Guajardo-Cespedes, M.; Yuan, S.; Tar, C.; Strope, B.; Kurzweil, R. Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174, Brussels, Belgium. Association for Computational Linguistics (2018). https://doi.org/10.48550/arXiv.1803.11175

Chao, Z.; Molitor, D.; Needell, D.; Porter, M.A. "Inference of Media Bias and Content Quality Using Natural-Language Processing," arXiv:2212.00237 (2022). https://doi.org/10.48550/arXiv.2212.00237

Spinde, T.; Rudnitckaia, L.; Sinha, K.; Hamborg, F.; Gipp, B.; Donnay, K. "MBIC–A media bias annotation dataset including annotator characteristics." In Proceedings of iConference 2021 (2021). https://doi.org/10.48550/arXiv.2105.11910
