High Performing Explanatory Fake News Classification on Longer Texts

Authors

  • Chelsea She, Adlai E. Stevenson High School
  • Clayton Greenberg

DOI:

https://doi.org/10.47611/jsrhs.v12i4.5505

Keywords:

fake news detection, natural language processing, artificial intelligence

Abstract

After misinformation became prevalent in 2020, the research community prioritized building state-of-the-art (SOTA) fake news detectors. However, these models did little to change user attitudes toward misinformation. We therefore aim to increase trust between users and AI fake news detectors by implementing an explanatory moderator. We started with two research questions: (1) can current fake news detectors designed for short texts perform well on long texts such as ordinary news articles, and (2) can we create a fake news detector that achieves performance comparable to SOTA fake news detectors while presenting its classifications in explainable visualizations? To address the first research question, we chose WELFake, a dataset containing news articles from four different news platforms. To establish a comparable SOTA reference point, we ran preliminary models on WELFake: a Majority Class Baseline, a Random Forest Classifier with bag of words, and the third-place model from the AAAI 2021 Shared Task: COVID-19 Fake News Detection in English competition. Lastly, we addressed the second research question by manually fine-tuning a BERT model so that we could access its attention weights and visualize them through BertViz. Our manually fine-tuned BERT model outperformed the comparable SOTA Two-Fold Four-Model ensemble, reaching 99.99% test accuracy. We conclude that current SOTA fake news detectors made for short texts can reach the same level of accuracy on long texts, and that explanatory fake news detectors can be comparable to current SOTA models.
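As an illustration of the simplest preliminary model named in the abstract, the following is a minimal sketch of a majority class baseline, together with a plain bag-of-words count of the kind a Random Forest classifier could take as input. It is not the authors' code: the function names, the whitespace tokenization, and the toy labels are assumptions made for illustration, and no WELFake data is used.

```python
from collections import Counter


def fit_majority_class(train_labels):
    """Return the most frequent label seen in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, gold_labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


def bag_of_words(text):
    """Count lowercased, whitespace-separated tokens (hypothetical tokenization)."""
    return Counter(text.lower().split())


# Toy, made-up labels (not taken from WELFake); 0 and 1 are two arbitrary classes.
train_labels = [1, 1, 0, 1, 0]
test_labels = [1, 0, 1, 1]

# The baseline ignores the article text and always predicts the majority label.
majority = fit_majority_class(train_labels)
predictions = [majority] * len(test_labels)
baseline_accuracy = accuracy(predictions, test_labels)
```

A baseline like this gives the floor that any trained classifier should beat: on a roughly balanced dataset it scores near 50%, so accuracies far above that reflect what the model learned from the text.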

References or Bibliography

Ahmed, H., Traore, I., & Saad, S. (2018). Detecting opinion spams and fake news using text classification. Security and Privacy, 1(1), e9. https://doi.org/10.1002/spy2.9

Kirchner, J., & Reuter, C. (2020). Countering fake news: A comparison of possible solutions regarding user acceptance and effectiveness. Proceedings of the ACM on Human-computer Interaction, 4(CSCW2), 1-27. https://doi.org/10.1145/3415211

Li, X., Xia, Y., Long, X., Li, Z., & Li, S. (2021). Exploring Text-Transformers in AAAI 2021 Shared Task: COVID-19 Fake News Detection in English. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_11

Vo, N., & Lee, K. (2021). Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 965–975). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eacl-main.83

Patwa, P. et al. (2021). Fighting an Infodemic: COVID-19 Fake News Dataset. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_3

Shu, K., Cui, L., Wang, S., Lee, D., & Liu, H. (2019, July). dEFEND: Explainable fake news detection. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 395-405). https://doi.org/10.1145/3292500.3330935

Shu, K., Mahudeswaran, D., Wang, S., Lee, D., & Liu, H. (2020). FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media. Big data, 8(3), 171–188. https://doi.org/10.1089/big.2020.0062

Szczepański, M., Pawlicki, M., Kozik, R. et al. (2021). New explainability method for BERT-based model in fake news detection. Sci Rep 11, 23705. https://doi.org/10.1038/s41598-021-03100-6

Verma, P. K., Agrawal, P., Amorim, I., & Prodan, R. (2021). WELFake: word embedding over linguistic features for fake news detection. IEEE Transactions on Computational Social Systems, 8(4), 881-893. https://doi.org/10.1109/TCSS.2021.3068519

Vig, J. (2019, May). BertViz: A tool for visualizing multihead self-attention in the BERT model. In ICLR workshop: Debugging machine learning models (Vol. 23).

Published

11-30-2023

How to Cite

She, C., & Greenberg, C. (2023). High Performing Explanatory Fake News Classification on Longer Texts. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.5505

Section

HS Research Projects