Predicting Running Injuries with Classification Machine Learning Models

Authors

E. Vuong, J. Vincent

DOI:

https://doi.org/10.47611/jsrhs.v12i1.4046

Keywords:

F-beta score, Random Forest Classifier, Logistic Regression, classification, running injury, Hyperparameter Tuning, imbalanced dataset

Abstract

Can running injuries be predicted using only a dataset and machine learning models? This paper explores the question using classification models, including Logistic Regression and the Random Forest Classifier. The dataset provided ten features for predicting running injuries. Because the dataset is imbalanced, the models were slightly modified: Weighted Logistic Regression and Random Forest Classifiers trained with over-sampling and down-sampling were used to mitigate the imbalance. The results suggest that Weighted Logistic Regression was the best model and that the F-beta score was the most appropriate evaluation metric.
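
The abstract singles out the F-beta score as the metric to consider on an imbalanced dataset. As a minimal sketch of how that metric is computed (not the authors' code), the Python function below evaluates F-beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall) from confusion-matrix counts. The counts and the choice of beta = 2 are hypothetical values picked for illustration; the abstract does not report which beta was used.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true positives, false positives, and false negatives."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Precision: share of predicted injuries that were real injuries.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real injuries that the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical confusion-matrix counts (assumption, not data from the paper):
# 100 samples, of which only 5 are injuries.
tp, fp, fn, tn = 4, 10, 1, 85

accuracy = (tp + tn) / (tp + fp + fn + tn)
f1 = f_beta(tp, fp, fn, beta=1.0)  # weighs precision and recall equally
f2 = f_beta(tp, fp, fn, beta=2.0)  # beta > 1 weighs recall more heavily

print(f"accuracy={accuracy:.3f}  F1={f1:.3f}  F2={f2:.3f}")
```

With these counts, a model that always predicted "no injury" would reach 95% accuracy while catching no injuries at all, and its F-beta score would be 0. This is one reason accuracy alone can mislead on imbalanced data.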

References or Bibliography

Lovdal, S., den Hartigh, R., & Azzopardi, G. (2021). Injury Prediction in Competitive Runners with Machine Learning. International Journal of Sports Physiology and Performance, 16(10), 1522–1531. https://doi.org/10.1123/ijspp.2020-0518

Chmait, N., & Westerbeek, H. (2021). Artificial Intelligence and Machine Learning in Sport Research: An Introduction for Non-data Scientists. Frontiers in Sports and Active Living, 3. https://www.frontiersin.org/articles/10.3389/fspor.2021.682287

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010

F-beta score. (n.d.). Hasty.Ai. Retrieved September 29, 2022, from https://hasty.ai/docs/mp-wiki/metrics/f-beta-score

Iyer, S. R., & Sharda, R. (2009). Prediction of athletes performance using neural networks: An application in cricket team selection. Expert Systems with Applications, 36(3, Part 1), 5510–5522. https://doi.org/10.1016/j.eswa.2008.06.088

Maalouf, M., & Siddiqi, M. (2014). Weighted logistic regression for large-scale imbalanced and rare events data. Knowledge-Based Systems, 59, 142–148. https://doi.org/10.1016/j.knosys.2014.01.012

Lovdal, S., den Hartigh, R., & Azzopardi, G. (2021). Replication Data for: Injury Prediction In Competitive Runners With Machine Learning. DataverseNL. https://doi.org/10.34894/UWU9PV

Published

02-28-2023

How to Cite

Vuong, E., & Vincent, J. (2023). Predicting Running Injuries with Classification Machine Learning Models. Journal of Student Research, 12(1). https://doi.org/10.47611/jsrhs.v12i1.4046

Section

HS Research Projects