Using Machine Learning Techniques to Predict United States House of Representatives Elections
DOI: https://doi.org/10.47611/jsrhs.v12i3.4697

Keywords: Election prediction, House of Representatives, Artificial Intelligence, Two-tier ML model, LSTM, Ridge Regression

Abstract
We predict the results of United States House of Representatives elections using machine learning techniques. We began by collecting and preprocessing data on the partisan lean of each district, the state of the economy, the national political environment, candidate policy stances, news headlines about each candidate in each race, and past election results. We then selected, designed, and trained models to predict those election results: single-tier models that took either news headline text or numerical data alone as input, and two-tier models that used both news headlines and numerical data. Our best-performing model was a two-tier model with a GRU as the first tier followed by a Ridge regressor as the second tier, achieving a root mean squared error of under 2 percentage points; the vote share predicted by this model was within 2 percentage points of the actual observed vote share.
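The two-tier design described above can be sketched schematically. In this minimal sketch, the first tier's output is stubbed: a per-race "headline score" stands in for what a GRU over tokenized headlines would emit, and all feature names and data are synthetic illustrations, not the paper's actual dataset or hyperparameters. The second tier is a Ridge regressor over the numerical features plus the first-tier score:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_races = 200

# Hypothetical numerical features per race: district partisan lean,
# generic-ballot margin, and an economic indicator (all synthetic).
X_num = rng.normal(size=(n_races, 3))

# Stand-in for the first tier: the score a GRU over tokenized news
# headlines would produce for each race (here just random noise).
headline_score = rng.normal(size=(n_races, 1))

# Second-tier input combines numerical features with the tier-1 output.
X = np.hstack([X_num, headline_score])

# Synthetic target: two-party vote share (%) around 50.
true_coefs = np.array([8.0, 3.0, 1.0, 2.0])
y = 50 + X @ true_coefs + rng.normal(scale=1.0, size=n_races)

# Tier 2: Ridge regression maps combined features to vote share.
second_tier = Ridge(alpha=1.0).fit(X, y)
pred = second_tier.predict(X)

# Evaluate with root mean squared error, the paper's metric.
rmse = np.sqrt(mean_squared_error(y, pred))
```

Ridge's L2 penalty (`alpha`) shrinks coefficients toward zero, which helps when features such as partisan lean and generic-ballot margin are correlated with one another.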
References or Bibliography
Wikimedia Foundation. (2023b, May 19). 2018 United States House of Representatives elections in California. Wikipedia. https://en.wikipedia.org/wiki/2018_United_States_House_of_Representatives_elections_in_California
Wikimedia Foundation. (2023a, April 8). 2018 United States House of Representatives elections in Virginia. Wikipedia. https://en.wikipedia.org/wiki/2018_United_States_House_of_Representatives_elections_in_Virginia
Borisov, V., Leemann, T., Sessler, K., Haug, J., Pawelczyk, M., & Kasneci, G. (2022). Deep neural networks and tabular data: A survey. IEEE Transactions on Neural Networks and Learning Systems, 1–21. https://doi.org/10.1109/tnnls.2022.3229161
Brownlee, J. (2019, August 12). Overfitting and underfitting with machine learning algorithms. MachineLearningMastery.com. https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/
Brownlee, J. (2022, August 15). When to use MLP, CNN, and RNN Neural Networks. MachineLearningMastery.com. https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/
Bradlee, D., Rinn, D., Ramsay, A., & Crowley, T. (2021, November 9). CA 2020 Congressional. Dave’s Redistricting. https://davesredistricting.org/maps#stats::d6ccadfa-243e-4ecc-9bb2-716b3f82afee
The Click Reader. (2021, October 4). Decision tree regression. The Click Reader. https://www.theclickreader.com/decision-tree-regression/
Castillo, D. (2023, April 7). Decision trees in machine learning explained. Seldon. https://www.seldon.io/decision-trees-in-machine-learning
Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022, July 18). Why do tree-based models still outperform deep learning on tabular data? arXiv. https://doi.org/10.48550/arXiv.2207.08815
Federal Reserve Bank of St. Louis. (2023, April 27). Gross domestic product. FRED. https://fred.stlouisfed.org/series/GDP
Isotalo, V., Saari, P., Paasivaara, M., Steineker, A., & Gloor, P. A. (2016). Predicting 2016 US presidential election polls with online and media variables. Designing Networks for Innovation and Improvisation, 45–53. https://doi.org/10.1007/978-3-319-42697-6_5
Joby, A. (2021, July 19). What is K-nearest neighbor? an ML algorithm to classify data. Learn Hub. https://learn.g2.com/k-nearest-neighbor
Jose, R., & Chooralil, V. S. (2016). Prediction of election result by enhanced sentiment analysis on Twitter data using classifier ensemble approach. 2016 International Conference on Data Mining and Advanced Computing (SAPIENCE), 64–67. https://doi.org/10.1109/sapience.2016.7684133
Joseph, F. J. (2019). Twitter based outcome predictions of 2019 Indian general elections using decision tree. 2019 4th International Conference on Information Technology (InCIT), 50–53. https://doi.org/10.1109/incit.2019.8911975
Jones-Rooy, A., Mehta, D., Radcliffe, M., Rakich, N., Shan, D., & Wolfe, J. (2023, May 11). Generic ballot: 2018 polls. FiveThirtyEight. https://projects.fivethirtyeight.com/polls/generic-ballot/2018/
Pedamallu, H. (2020, November 30). RNN vs GRU vs LSTM. Medium. https://medium.com/analytics-vidhya/rnn-vs-gru-vs-lstm-863b0b7b1573
Raj, A. (2021, June 11). A quick and dirty guide to random forest regression. Medium. https://towardsdatascience.com/a-quick-and-dirty-guide-to-random-forest-regression-52ca0af157f8
Ramsay, A. (2020, February 7). Election Composites. Medium. https://medium.com/dra-2020/election-composites-13d05ed07864
Engati. (2023, May 18). Ridge regression. Engati. https://www.engati.com/glossary/ridge-regression
Carnegie Mellon University Statistics & Data Science. (2021, July 8). Supervised Learning. Carnegie Mellon Sports Analytics. https://www.stat.cmu.edu/cmsac/sure/2021/materials/lectures/slides/18-KNN-kernel.html#1
TensorFlow. (2023, March 23). tf.keras.preprocessing.text.Tokenizer: TensorFlow v2.12.0. TensorFlow. https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer
Tsai, M.-H., Wang, Y., Kwak, M., & Rigole, N. (2019). A machine learning based strategy for election result prediction. 2019 International Conference on Computational Science and Computational Intelligence (CSCI), 1408–1410. https://doi.org/10.1109/csci49370.2019.00263
US PCE inflation rate (I:USPCEQIR). YCharts. (2023a, April 27). https://ycharts.com/indicators/us_pce_quarterly_inflation_rate
US unemployment rate (I:USURSQ). YCharts. (2023b, April 7). https://ycharts.com/indicators/us_unemployment_rate_quarterly
Bradlee, D., Rinn, D., Ramsay, A., & Crowley, T. (2022, June 12). VA 2020 Congressional. Dave’s Redistricting. https://davesredistricting.org/maps#stats::28033d62-2027-4661-95d0-557621f823e9
Zach. (2021, August 26). When to use Ridge & Lasso regression. Statology. https://www.statology.org/when-to-use-ridge-lasso-regression/
Zolghadr, M., Niaki, S. A., & Niaki, S. T. (2017). Modeling and forecasting US presidential election using learning algorithms. Journal of Industrial Engineering International, 14(3), 491–500. https://doi.org/10.1007/s40092-017-0238-2
Gunjal, S. (2021). What is root mean square error (RMSE): Data Science and Machine Learning. Kaggle. https://www.kaggle.com/general/215997
pandas. (2023, April 24). pandas.DataFrame. pandas 2.0.1 documentation. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
scikit-learn. (2023, March 9). scikit-learn: Machine learning in Python — scikit-learn 1.2.2 documentation. https://scikit-learn.org/stable/
Aggarwal, R., & Ranganathan, P. (2016). Common pitfalls in statistical analysis: The use of correlation techniques. Perspectives in Clinical Research, 7(4), 187–190. https://doi.org/10.4103/2229-3485.192046
Copyright (c) 2023 Sanatan Mishra; John Lee
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.