An Intelligent System for Early Prediction of Cardiovascular Disease using Machine Learning
DOI: https://doi.org/10.47611/jsrhs.v11i3.2989
Keywords: Cardiovascular Disease, Machine Learning, Classification Models
Abstract
Cardiovascular disease (CVD) remains the leading cause of death worldwide, responsible for 18.6 million deaths globally in 2019. Because several effective treatment options are widely available, early diagnosis of CVD is critical for timely intervention and for slowing disease progression. CVD is associated with many risk markers that interact non-linearly, which makes accurate diagnosis challenging, especially for non-specialist clinicians and under-resourced facilities in developing countries. In recent years, machine-learning-based techniques have shown considerable promise as diagnostic tools. This research applies several machine learning methods, including random forest, gradient boosting, logistic regression and an artificial neural network, and evaluates their predictive performance. The study also evaluates whether combining multiple UCI datasets improves the models' prediction accuracy. On a merged dataset of over 700 patients from the UCI Machine Learning Repository, the random forest classifier was the most accurate model, achieving an accuracy of 94%, an F1 score of 0.94 and an AUC of 0.98. Ensemble learning methods, combined with data optimization and hyperparameter tuning, achieved higher accuracy than previously published studies on these datasets. Finally, the study proposes how these machine learning workloads could be incorporated into a distributed, cloud-connected healthcare system, making them widely accessible to practicing doctors so that they can assess their patients' CVD risk.
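The abstract does not describe how the UCI datasets were combined. As a minimal sketch, assuming two UCI sources (the processed Cleveland file, whose `num` target runs from 0 for absence to 1-4 for presence, and the Statlog (heart) file, whose class label is 1 for absence and 2 for presence) list the same 13 features in the same order, each label can be mapped to a common binary target (1 = CVD present) before concatenation. The function names, the choice to drop rows with missing (`?`) values rather than impute them, and the removal of exact duplicate records are illustrative assumptions, not the study's documented method.

```python
# Hypothetical sketch: merge two UCI heart-disease sources into one binary-labelled set.
# Assumes both files list the same 13 features in this order, followed by a target column.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def parse_fields(fields):
    """Convert raw string fields to floats; the UCI files mark missing values with '?'."""
    values = [None if f == "?" else float(f) for f in fields]
    if len(values) != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} fields, got {len(values)}")
    if values[-1] is None:
        raise ValueError("row has no target label")
    return values[:-1], values[-1]


def from_cleveland(lines):
    """Comma-separated rows; target 'num' is 0 (absent) or 1-4 (present)."""
    rows = []
    for line in lines:
        features, num = parse_fields(line.strip().split(","))
        rows.append((features, 1 if num > 0 else 0))
    return rows


def from_statlog(lines):
    """Space-separated rows; class label is 1 (absent) or 2 (present)."""
    rows = []
    for line in lines:
        features, cls = parse_fields(line.split())
        rows.append((features, 1 if cls == 2 else 0))
    return rows


def merge(*sources):
    """Concatenate sources, skipping rows with missing values and exact duplicates."""
    seen, merged = set(), []
    for rows in sources:
        for features, label in rows:
            if None in features:
                continue  # assumption: drop incomplete rows instead of imputing
            key = (tuple(features), label)
            if key not in seen:
                seen.add(key)
                merged.append((features, label))
    return merged


# Illustrative rows in each file's format (not a claim about the study's data).
cleveland = ["63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0",
             "53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,0.0,?,0"]
statlog = ["70.0 1.0 4.0 130.0 322.0 0.0 2.0 109.0 0.0 2.4 2.0 3.0 3.0 2"]
dataset = merge(from_cleveland(cleveland), from_statlog(statlog))
print(len(dataset), [label for _, label in dataset])
```

Here the second Cleveland row is dropped because its `thal` value is missing, leaving one negative and one positive case. Imputation (for example, filling with the column median) is a common alternative when discarding rows would lose too much data.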
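The abstract reports accuracy, F1 score and AUC. The sketch below spells out how each is computed for a binary label (1 = CVD present) from a model's predicted probabilities. The 0.5 decision threshold and the toy labels and scores are illustrative assumptions, not values from the study. AUC is computed in its pairwise (Mann-Whitney) form, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve. In practice, scikit-learn's `accuracy_score`, `f1_score` and `roc_auc_score` compute the same metrics.

```python
# Minimal pure-Python definitions of the three reported metrics for binary labels (1 = CVD present).


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """P(random positive scores above random negative); ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: true labels and a model's predicted probabilities of CVD.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(f1(y_true, y_pred), 3))        # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

Note that accuracy and F1 depend on the chosen threshold, whereas AUC uses only the ranking of the scores, which is why a model can report a higher AUC than accuracy, as in the abstract's 0.98 versus 94%.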
Copyright (c) 2022 Aarush Kachhawa; Jeremy Hitt
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.