Prediction of Chronic Graft vs. Host Disease Using Machine Learning
DOI:
https://doi.org/10.47611/jsrhs.v11i3.2910
Keywords:
Machine Learning, GVHD, chronic
Abstract
This paper attempts to predict the onset of chronic Graft vs. Host Disease (GVHD) in children with blood cancers who have received a bone marrow or stem cell transplant, using machine learning models. It compares three models by how accurately each predicts chronic GVHD: Logistic Regression, the J48 decision tree algorithm, and Multilayer Perceptron. The models are built from a dataset containing 36 attributes, excluding chronic GVHD itself. Through data preprocessing and analysis in Weka, these 36 attributes are narrowed down for each model to determine which combination of attributes gives the best predictive accuracy. The study uses 10-fold cross-validation for each model and uses the Receiver Operating Characteristic (ROC) Area as the measure of each model's accuracy. The study found that Multilayer Perceptron was the best predictor of chronic GVHD and Logistic Regression the worst. The J48 algorithm used the fewest attributes to make its prediction.
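To make the evaluation setup concrete, here is a minimal sketch of the two ingredients the abstract names: splitting data into k folds and scoring predictions by ROC area. It is not the study's Weka workflow. The function names `roc_auc` and `k_fold_indices` and the toy labels and scores are assumptions made for illustration, and the split is a plain shuffled one with no stratification by class.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: a simple shuffled split, without class stratification.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Toy data (hypothetical, not from the study): 1 = chronic GVHD, 0 = none.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
# Three of the four positive/negative pairs are ranked correctly.
toy_auc = roc_auc(toy_labels, toy_scores)
```

In a full evaluation, each model would be trained on `train_idx`, scored on `test_idx`, and the ROC area computed over the held-out predictions. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.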
Copyright (c) 2022 Sanay Bordia; Professor Ramezani
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.