Impact of Gender Bias in Training Data for Machine Learning Models predicting Myocardial Infarction
DOI:
https://doi.org/10.47611/jsrhs.v12i3.4532
Keywords:
Machine Learning, Myocardial Infarction, Model Bias, Gender Bias, Evaluation
Abstract
The use of biomarker reference ranges derived from clinical trials to detect and diagnose cardiovascular disease (CVD) and the occurrence of myocardial infarction (MI) is well established. However, the predominance of older male participants in these trials has been shown to contribute to an increased rate of misdiagnosis among women. Applying Machine Learning (ML) to medical diagnosis promises to improve accuracy, but ML can also perpetuate the gender bias present in trial data, leading to worse outcomes for female patients. This research found that models trained only on male patient data were less accurate at predicting MI among female patients than models trained on data sets with greater representation of females. For the model trained only on male patient data, the resulting increase in false negative rate corresponds to 2% of female patients with MI not being correctly diagnosed. Addressing this issue will require (a) the collection of more female patient data to support the construction of training data sets that accurately reflect the patient population, and (b) consistent reporting of gender mix and other demographic information, such as age, whenever ML model performance is reported.
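The comparison described in the abstract rests on the false negative rate computed separately for each gender: among patients who truly had an MI, the fraction the model predicted as not having one. The following is a minimal sketch of that calculation, not the study's actual code. The function name `false_negative_rate_by_group` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def false_negative_rate_by_group(y_true, y_pred, groups):
    """Return {group: FN / (FN + TP)}, counting only patients whose true label is 1 (MI)."""
    false_negatives = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:
                # The model missed a real MI for a patient in this group.
                false_negatives[group] += 1
    # Groups with no true MI cases are omitted, since the rate is undefined for them.
    return {g: false_negatives[g] / positives[g] for g in positives}


# Hypothetical toy labels (1 = MI, 0 = no MI); these are NOT data from the study.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
groups = ["M"] * 5 + ["F"] * 5

rates = false_negative_rate_by_group(y_true, y_pred, groups)
print(rates)
```

Reporting this rate per group alongside the gender mix of the training data is one way to make the kind of disparity described above visible when model performance is published.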
Copyright (c) 2023 Victoria Harding Bradley
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.