Twitter Sentiment Analysis Using Machine Learning

Srimayi Gupta; Padmavathy Jawahar

doi:10.47611/jsrhs.v13i2.6819

Authors

Srimayi Gupta, Mountain House High School
Padmavathy Jawahar

DOI:

https://doi.org/10.47611/jsrhs.v13i2.6819

Keywords:

artificial intelligence, Twitter, sentiment analysis, natural language processing, machine learning, sklearn

Abstract

In an age of social media, online forums, and chats, cyberbullying is a prevalent issue. On Twitter (now X), approximately 500 million tweets are shared per day (Antonakaki et al., 2021). Moderators are responsible for ensuring that these tweets follow community guidelines, but the sheer volume makes it difficult to review them manually. Sentiment analysis and machine learning algorithms can classify these texts automatically as positive or negative. Such models are far more efficient than manual sorting and may identify hate speech on Twitter with higher accuracy. In this paper, we explore the use of five classical machine learning algorithms to classify tweets as neutral, racist, or sexist. We compare model performance on raw tweet data against performance on tweets pre-processed through data cleanup. Furthermore, we highlight two methods for dealing with imbalanced datasets to improve prediction rates. Overall, we achieved 96% accuracy in classifying tweets into the different labels.
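The two preparation steps named in the abstract, data cleanup and handling of class imbalance, can be illustrated with a minimal Python sketch. The abstract does not say which cleanup rules or which two balancing methods were used, so everything here is an assumption made for illustration: the helper names `clean_tweet` and `random_oversample`, the tiny stop-word list, and the choice of random oversampling as one possible balancing method.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "rt"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions and non-letters, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only (also drops '#')
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many examples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_samples, out_labels


if __name__ == "__main__":
    print(clean_tweet("RT @user: Check THIS out!! https://t.co/abc #Wow"))
    _, balanced = random_oversample(
        ["t1", "t2", "t3", "t4"], ["neutral", "neutral", "neutral", "racist"]
    )
    print(Counter(balanced))
```

In a setup like this, the cleaned tokens would typically be turned into TF-IDF features before training a classifier, and oversampling would be applied to the training split only so that duplicated tweets do not leak into the evaluation data.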

Author Biography

Padmavathy Jawahar

AP Computer Science Principles teacher at Mountain House High School.

References or Bibliography

Antonakaki, D., Fragopoulou, P., & Ioannidis, S. (2021). A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks. Expert Systems with Applications, 164, 114006. https://doi.org/10.1016/j.eswa.2020.114006

Giachanou, A., & Crestani, F. (2016). Like It or Not. ACM Computing Surveys, 49(2), 1–41. https://doi.org/10.1145/2938640

1.1. Linear Models — scikit-learn 0.22.2 documentation. (n.d.). Scikit-Learn.org. https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

1.6. Nearest Neighbors — scikit-learn 0.21.3 documentation. (2019). Scikit-Learn.org. https://scikit-learn.org/stable/modules/neighbors.html

Scikit-learn. (2019). 1.9. Naive Bayes — scikit-learn 0.21.3 documentation. Scikit-Learn.org. https://scikit-learn.org/stable/modules/naive_bayes.html

1.11. Ensemble methods. (n.d.). Scikit-Learn.org. https://scikit-learn.org/stable/modules/ensemble.html#random-forests

Google Developers. (2019, March 5). Classification: Precision and Recall - Google Developers. https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall

Nothman, J., Qin, H., & Yurchak, R. (2018). Stop Word Lists in Free Open-source Software Packages. Proceedings of Workshop for NLP Open Source Software (NLP-OSS). https://doi.org/10.18653/v1/w18-2502

Willett, P. (2006). The Porter stemming algorithm: then and now. Program, 40(3), 219–223. https://doi.org/10.1108/00330330610681295

Khyani, D., & B S, S. (2021). An Interpretation of Lemmatization and Stemming in Natural Language Processing. Shanghai Ligong Daxue Xuebao/Journal of University of Shanghai for Science and Technology, 22, 350–357.

sklearn.feature_extraction.text.TfidfTransformer — scikit-learn 0.23.1 documentation. (n.d.). Scikit-Learn.org. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html

Vargas, V., Aranda, J., Costa, R., Pereira, P., & Luis, J. (2022). Imbalanced data preprocessing techniques for machine learning: a systematic mapping study. 65(1), 31–57. https://doi.org/10.1007/s10115-022-01772-8

Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Müller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., Vanderplas, J., Joly, A., Holt, B., & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project.
