Using Machine Learning Algorithms to Detect Fake News
DOI:
https://doi.org/10.47611/jsrhs.v11i4.3446
Keywords:
Fake News, Yellow Journalism, Artificial Intelligence, Machine Learning, Support Vector Machine, Latent Dirichlet Allocation, Gradient Boosting, Boosting, Naive Bayes, Classification
Abstract
Fake news is a growing threat in the modern world. A major reason fake news is so dangerous and effective is that it is difficult to distinguish from real news; if fake news could be detected accurately, its negative impact could be significantly reduced. Previous studies have found that fake news differs substantially from real news in the words used and in the structure of the text, which suggests that the two can be told apart automatically. One possible method of detecting fake news is Machine Learning, which uses artificial intelligence to detect patterns within the text of fake and real news articles. In this paper, we test the capability of Machine Learning algorithms to detect fake news using four models: SVM, Multinomial NB, Gradient Boosting, and Gradient Boosting with LDA. We find that all four models achieve a success rate of over 90%, with the LDA+Gradient Boosting model performing best and Multinomial NB performing worst. We also attempt to determine the topics that fake news tends to cover and find that fake news is often about politics. While the models proved successful, we recommend that future testing be done on other datasets with greater variety in news sources.
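The abstract names Multinomial NB as one of the four classifiers, but this page does not show how it was implemented. As a rough illustration only, the sketch below trains a multinomial Naive Bayes classifier with Laplace smoothing on bag-of-words counts. It is not the authors' code: the four-sentence corpus, the function names (tokenize, train_multinomial_nb, predict), and the smoothing value alpha=1.0 are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not the paper's dataset).
TRAIN = [
    ("shocking secret plot exposed by insiders", "fake"),
    ("you will not believe this shocking scandal", "fake"),
    ("senate passes budget bill after debate", "real"),
    ("central bank holds interest rates steady", "real"),
]


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


def train_multinomial_nb(examples, alpha=1.0):
    """Estimate log priors and Laplace-smoothed word log likelihoods per class."""
    class_docs = Counter()   # number of documents per class
    word_counts = {}         # class label -> Counter of word frequencies
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)

    total_docs = sum(class_docs.values())
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label, counts in word_counts.items():
        # Smoothed denominator: class word total plus alpha per vocabulary word.
        denom = sum(counts.values()) + alpha * len(vocab)
        model["log_prior"][label] = math.log(class_docs[label] / total_docs)
        model["log_like"][label] = {
            w: math.log((counts[w] + alpha) / denom) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, prior in model["log_prior"].items():
        score = prior
        for tok in tokenize(text):
            if tok in model["vocab"]:  # words never seen in training are skipped
                score += model["log_like"][label][tok]
        scores[label] = score
    return max(scores, key=scores.get)


model = train_multinomial_nb(TRAIN)
prediction_fake = predict(model, "shocking scandal exposed")
prediction_real = predict(model, "senate debate on interest rates")
```

On a corpus this small the predictions only show the mechanics; they say nothing about the success rates reported in the abstract, which would require the paper's dataset and preprocessing.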
Copyright (c) 2022 Cody; Nicole Lantz
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.