Evaluating Baseball Statistics by Predicting Playoff Teams
DOI:
https://doi.org/10.47611/jsrhs.v12i4.5763
Keywords:
baseball analytics, xgboost, logistic regression, sabermetrics, moneyball
Abstract
In this paper, we explore how different baseball statistics correlate with making the playoffs. We use logistic regression and XGBoost to evaluate whether a baseball statistic correlates strongly with whether a team makes the playoffs. We set up three models: the Moneyball Model (which uses moneyball statistics), the All Stats Model (which uses moneyball statistics and additional common statistics), and the XGBoost Model (which uses the same dataset as the All Stats Model but a different model structure). We compare these models by evaluating their accuracy, variable coefficients, and confusion matrices. From these tests, we find that the Moneyball Model achieves accuracy similar to that of the All Stats Model, indicating that moneyball statistics remain a relevant and accurate way to predict whether a team makes the playoffs. The variable coefficients test highlights that moneyball statistics have the highest importance in the model's ability to predict whether a team makes the playoffs. While the tests provide a foundation for evaluating moneyball and common baseball statistics, there remain future opportunities to use different models and a larger dataset.
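As a rough illustration of the evaluation described in the abstract (fit a classifier, then inspect its accuracy, coefficients, and confusion matrix), the sketch below may help. It is not the authors' code: it swaps in a hand-rolled logistic regression trained by gradient descent, using only the Python standard library, in place of a library logistic regression or XGBoost. The data are synthetic, and the feature names, thresholds, and helper functions are all assumptions made for the example.

```python
import math
import random

# Toy data only: 200 synthetic "team seasons" with two standardized features.
# The feature names are hypothetical stand-ins, not the paper's variables.
FEATURES = ["on_base_pct_z", "slugging_pct_z"]
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [rng.gauss(0, 1), rng.gauss(0, 1)]
    score = 1.5 * row[0] + 0.5 * row[1] + rng.gauss(0, 0.5)
    X.append(row)
    y.append(1 if score > 0.8 else 0)  # 1 = made the playoffs (synthetic label)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predict 1 when the estimated probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def confusion_matrix(y_true, y_pred):
    """Return counts as (true_neg, false_pos, false_neg, true_pos)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


w, b = fit_logistic(X_train, y_train)
y_pred = predict(X_test, w, b)
cm = confusion_matrix(y_test, y_pred)
acc = (cm[0] + cm[3]) / len(y_test)

print("accuracy:", round(acc, 3))
print("coefficients:", dict(zip(FEATURES, (round(wj, 3) for wj in w))))
print("confusion matrix (tn, fp, fn, tp):", cm)
```

Because both synthetic features are on the same standardized scale, the fitted coefficients can be compared directly as a crude measure of importance. With real statistics on different scales, the inputs would need similar standardization before such a comparison is meaningful.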
Copyright (c) 2023 Rohan Nakra; Ryan Kimes
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.