Comparing Bag-of-Words, SBERT, and GPT-3 for Bias Detection

Max Luo (Lynbrook High School); Clayton Greenberg

DOI: https://doi.org/10.47611/jsr.v13i2.2471


Keywords:

Bias, BERT, SBERT, OpenAI, GPT-3

Abstract

This project aims to detect bias in media by training machine learning models to recognize biased sentences. We used a dataset of 3,700 sentences, each annotated by multiple experts, and compared three approaches: bag-of-words, SBERT, and GPT-3. For the bag-of-words and SBERT models, we generated a prototype vector for each class and classified each sentence by assigning it to the class whose prototype had the highest cosine similarity to the sentence's vector. For GPT-3, we used the OpenAI API's fine-tuning function to train a model on the dataset, using each sentence as the prompt and its class label as the completion. The bag-of-words, SBERT, and GPT-3 models achieved F-scores of 0.614, 0.819, and 0.838, respectively. We conclude that GPT-3 is the most accurate of the three models, while SBERT is the best suited for real-world applications.
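The prototype-and-cosine classification described in the abstract can be sketched with standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes a count-based bag-of-words vocabulary built from the training sentences, and it takes each class prototype to be the element-wise mean of that class's training vectors; the abstract does not say how the prototypes were computed. The labels `biased` and `non-biased` and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep simple word tokens (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())


def bow_vector(text, vocab):
    # Count-based bag-of-words vector over a fixed vocabulary; unseen words are ignored.
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def build_prototypes(sentences, labels, vocab):
    # Assumption: a class prototype is the element-wise mean of that class's vectors.
    totals = {}
    sizes = Counter(labels)
    for sentence, label in zip(sentences, labels):
        vec = bow_vector(sentence, vocab)
        acc = totals.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: [x / sizes[label] for x in acc] for label, acc in totals.items()}


def classify(sentence, prototypes, vocab):
    # Pick the class whose prototype has the highest cosine similarity to the sentence.
    vec = bow_vector(sentence, vocab)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))


if __name__ == "__main__":
    # Toy training data with hypothetical labels; not drawn from the paper's dataset.
    train = [
        ("The senator shamelessly lied to voters", "biased"),
        ("The radical mob destroyed the city", "biased"),
        ("The senate passed the bill on Tuesday", "non-biased"),
        ("The city council met to discuss the budget", "non-biased"),
    ]
    sentences = [s for s, _ in train]
    labels = [y for _, y in train]
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    prototypes = build_prototypes(sentences, labels, vocab)
    print(classify("The mob shamelessly attacked", prototypes, vocab))  # prints: biased
```

For the SBERT variant, the same prototype and cosine steps would apply with the bag-of-words vectors replaced by sentence embeddings (for example, vectors from the sentence-transformers library's `SentenceTransformer.encode`); which pretrained model the authors used is not stated here. On a dataset of thousands of sentences, scikit-learn's `CountVectorizer` and `cosine_similarity` would be a more efficient way to do the same computation.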

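Using each sentence as the prompt and its class as the completion corresponds to the prompt/completion JSONL training files used in OpenAI's original (now legacy) GPT-3 fine-tuning workflow. The sketch below only prepares such a file and makes no API calls. The separator `\n\n###\n\n`, the leading space in each completion, the `\n` stop sequence, and the label strings are assumptions based on OpenAI's legacy data-formatting advice; the abstract does not state which were used.

```python
import json

# Hypothetical separator and stop sequence; the paper does not say which were used.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = "\n"


def to_finetune_record(sentence, label):
    # The sentence becomes the prompt and the class label becomes the completion.
    return {
        "prompt": sentence + PROMPT_SEPARATOR,
        # A leading space before the label follows the legacy tokenization advice.
        "completion": " " + label + STOP_SEQUENCE,
    }


def write_jsonl(pairs):
    # One JSON object per line, the layout the legacy prompt/completion format expects.
    return "\n".join(json.dumps(to_finetune_record(s, y)) for s, y in pairs) + "\n"


if __name__ == "__main__":
    # Toy examples with hypothetical labels; not drawn from the paper's dataset.
    examples = [
        ("The radical mob destroyed the city", "biased"),
        ("The senate passed the bill on Tuesday", "non-biased"),
    ]
    print(write_jsonl(examples), end="")
```

At inference time, the same separator would need to be appended to each sentence, and the stop sequence passed to the completion request, so that the fine-tuned model sees inputs in the format it was trained on.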

