Twitter Sentiment Analysis Using Machine Learning

Authors

  • Srimayi Gupta, Mountain House High School
  • Padmavathy Jawahar

DOI:

https://doi.org/10.47611/jsrhs.v13i2.6819

Keywords:

artificial intelligence, Twitter, sentiment analysis, natural language processing, machine learning, sklearn

Abstract

In an age of social media, online forums, and chats, cyberbullying is a prevalent issue. On Twitter (now X), approximately 500 million tweets are shared per day (Antonakaki et al., 2021). It is the job of moderators to ensure that these tweets follow standard community guidelines. However, the sheer number of tweets makes it difficult to review them manually and confirm that they comply. Sentiment analysis and machine learning algorithms can be used to classify these texts automatically as positive or negative. Such models are typically far more efficient than manual review and may achieve higher accuracy in identifying hate speech on Twitter. In this paper, we explore the use of five classical machine learning algorithms to classify tweets as neutral, racist, or sexist. We compare model performance on raw tweet data against performance on tweets pre-processed through data cleanup. Furthermore, we highlight two methods for dealing with imbalanced datasets to improve prediction rates. Overall, we achieved 96% accuracy in classifying tweets into their respective labels.
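
The two preparation steps named in the abstract, data cleanup and handling of class imbalance, can be illustrated with a minimal sketch. The abstract does not say which cleanup rules or which two balancing methods the authors used, so everything below is an assumption for illustration: the function names `clean_tweet` and `random_oversample` are hypothetical, the cleanup rules (lowercasing, removing URLs, @mentions, and non-letter characters) are one common choice, and random oversampling is only one of several possible balancing techniques. The example tweets are placeholders, not data from the paper.

```python
import random
import re
from collections import Counter

# Assumed cleanup rules; the paper's actual preprocessing steps may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Placeholder data: three neutral tweets and one stand-in per minority class.
tweets = [
    "Great game tonight! https://example.com",
    "@user1 Loved the new update :)",
    "Nice weather today #sunny",
    "placeholder text standing in for a racist tweet",
    "placeholder text standing in for a sexist tweet",
]
labels = ["neutral", "neutral", "neutral", "racist", "sexist"]

cleaned = [clean_tweet(t) for t in tweets]
balanced_texts, balanced_labels = random_oversample(cleaned, labels)
print(Counter(balanced_labels))
```

In practice, oversampling of this kind is normally applied only to the training split, so that duplicated examples do not leak into the evaluation data and inflate the reported accuracy.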

Author Biography

Padmavathy Jawahar

AP Computer Science Principles teacher at Mountain House High School.

References or Bibliography

Antonakaki, D., Fragopoulou, P., & Ioannidis, S. (2021). A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks. Expert Systems with Applications, 164, 114006. https://doi.org/10.1016/j.eswa.2020.114006

Giachanou, A., & Crestani, F. (2016). Like It or Not. ACM Computing Surveys, 49(2), 1–41. https://doi.org/10.1145/2938640

1.1. Linear Models — scikit-learn 0.22.2 documentation. (n.d.). Scikit-Learn.org. https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

1.6. Nearest Neighbors — scikit-learn 0.21.3 documentation. (2019). Scikit-Learn.org. https://scikit-learn.org/stable/modules/neighbors.html

Scikit-learn. (2019). 1.9. Naive Bayes — scikit-learn 0.21.3 documentation. Scikit-Learn.org. https://scikit-learn.org/stable/modules/naive_bayes.html

1.11. Ensemble methods. (n.d.). Scikit-Learn.org. https://scikit-learn.org/stable/modules/ensemble.html#random-forests

Google Developers. (2019, March 5). Classification: Precision and Recall - Google Developers. https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall

Nothman, J., Qin, H., & Yurchak, R. (2018). Stop Word Lists in Free Open-source Software Packages. Proceedings of Workshop for NLP Open Source Software (NLP-OSS). https://doi.org/10.18653/v1/w18-2502

Willett, P. (2006). The Porter stemming algorithm: then and now. Program, 40(3), 219–223. https://doi.org/10.1108/00330330610681295

Khyani, D., & B S, S. (2021). An Interpretation of Lemmatization and Stemming in Natural Language Processing. Journal of University of Shanghai for Science and Technology, 22, 350–357.

sklearn.feature_extraction.text.TfidfTransformer — scikit-learn 0.23.1 documentation. (n.d.). Scikit-Learn.org. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html

Vargas, V., Aranda, J., Costa, R., Pereira, P., & Luis, J. (2022). Imbalanced data preprocessing techniques for machine learning: a systematic mapping study. 65(1), 31–57. https://doi.org/10.1007/s10115-022-01772-8

Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Müller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., Vanderplas, J., Joly, A., Holt, B., & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project.

Published

05-31-2024

How to Cite

Gupta, S., & Jawahar, P. (2024). Twitter Sentiment Analysis Using Machine Learning. Journal of Student Research, 13(2). https://doi.org/10.47611/jsrhs.v13i2.6819

Section

HS Research Projects