Identification of a Panel of Biomarkers for the Early Detection of Ovarian Cancer
DOI: https://doi.org/10.47611/jsrhs.v11i2.2628
Keywords: Ovarian Cancer, RandomForest, mRMR, Heroku, ML Classifiers
Abstract
According to the CDC, ovarian cancer is the second most prevalent gynecologic cancer in the United States and the fifth leading cause of cancer-related mortality in women. The only reliable screening method is TVS (trans-vaginal sonography), which is both invasive and costly. The goal of this project was to use the mRMR (Maximum Relevance Minimum Redundancy) feature selection algorithm to select a panel of biomarkers from an ovarian cancer dataset, and to build a non-invasive, inexpensive software tool that could help validate the panel and assist with the early detection of ovarian cancer at a reasonable level of sensitivity.
This project uses an ovarian cancer dataset with 49 features. The mRMR filter method of feature selection [9, 10, 12] eliminates redundant features while keeping the relevant features that impact the target class. The project accomplished its final goal of creating a working web application that asks a clinician for a few basic blood test results and generates a prediction. The machine learning model [7] used by the application is a Random Forest classifier trained on the K best features picked by the mRMR algorithm. It is used to predict the disease and treatment targets, with the aim of helping to reduce mortality from ovarian cancer.
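The greedy selection idea behind mRMR can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a simplified difference-form score (absolute Pearson correlation with the target as relevance, minus mean absolute correlation with already selected features as redundancy), and the feature names and synthetic data are hypothetical stand-ins for the real biomarker columns.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mrmr_select(columns, target, k):
    """Greedily pick k feature names, trading relevance against redundancy.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of class labels (0/1), same length as each column.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    selected = []
    remaining = set(columns)

    def score(name):
        if not selected:
            return relevance[name]  # first pick: relevance only
        redundancy = sum(
            abs(pearson(columns[name], columns[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical synthetic data: two informative markers, one near-duplicate, one noise column.
rng = random.Random(0)
target = [i % 2 for i in range(500)]
marker_a = [t + rng.gauss(0, 0.5) for t in target]
columns = {
    "marker_a": marker_a,
    "marker_a_copy": [a + rng.gauss(0, 0.05) for a in marker_a],  # redundant with marker_a
    "marker_b": [t + rng.gauss(0, 1.0) for t in target],
    "noise": [rng.gauss(0, 1.0) for _ in target],
}

selected = mrmr_select(columns, target, k=2)
print(selected)
```

Ranking by relevance alone would place both near-duplicate columns at the top; the redundancy penalty is what keeps the second of the pair out of the panel.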
This project used the Random Forest classifier, which has been shown to work well with smaller datasets (such as this project’s dataset). The model achieved a sensitivity score of 0.96.
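For readers unfamiliar with the metric, sensitivity is the fraction of true cancer cases that the classifier flags: TP / (TP + FN). The short sketch below shows the arithmetic. The labels are hypothetical and were chosen only so that the result matches the reported value; they are not the study's data.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the given positive class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive cases")
    return tp / (tp + fn)


# Hypothetical example: 25 positive and 25 negative cases.
y_true = [1] * 25 + [0] * 25
# 24 positives detected, 1 missed; 23 negatives correct, 2 false alarms.
y_pred = [1] * 24 + [0] * 1 + [0] * 23 + [1] * 2

score = sensitivity(y_true, y_pred)
print(round(score, 2))
```

False alarms do not affect sensitivity; they are captured by specificity, which is why a screening tool is usually judged on both.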
References or Bibliography
“1.17. Neural Network Models (supervised)." Scikit-learn,
scikit-learn.org/stable/modules/neural_networks_supervised.html. Accessed 17 Jan. 2022.
“12 Types of Neural Networks Activation Functions: How to Choose?" V7 - AI Data Platform
for ML Teams, 17 Jan. 2022, www.v7labs.com/blog/neural-networks-activation-functions.
“Advantages of Tree-Based Modeling." Summit | Quantitative Consulting and Data Analytics,
www.summitllc.us/blog/advantages-of-tree-based-modeling.
“Understanding the AUC-ROC Curve in Machine Learning Classification."
Analytics India Magazine, 7 Oct. 2021,
analyticsindiamag.com/understanding-the-auc-roc-curve-in-machine-learning-classification/#:~:text=ROC%20curve%2C%2
also%20known%20as,sensitivity%20of%20the%20classifier%20model.
“Complete Guide on Model Deployment with Flask and Heroku." Medium, 1
Jan. 2022, towardsdatascience.com/complete-guide-on-model-deployment-with-flask-and-heroku-98c87554a6b9.
“How to Use StandardScaler and MinMaxScaler Transforms in Python." Machine Learning
Mastery, 27 Aug. 2020, machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/.
“Machine Learning: What It is and Why It Matters."
www.sas.com/en_us/insights/analytics/machine-learning.html.
Malik, Farhad. "What Are Hidden Layers?" Medium, 20 May 2019,
medium.com/fintechexplained/what-are-hidden-layers-4f54f7328263.
Catà Villà, M. (2014, June). FEATURE SELECTION METHODS FOR PREDICTING PRECLINICAL STAGE IN ALZHEIMER’S DISEASE. https://imatge.upc.edu/web/sites/default/files/pub/xCata16.pdf
Mazzanti, S. (2022, February 15). “MRMR” explained exactly how you wished someone explained to you. Medium. https://towardsdatascience.com/mrmr-explained-exactly-how-you-wished-someone-explained-to-you-9cf4ed27458b
Song, H., Yang, E., Kim, J., Park, C., Kyung, M., & Kim, Y. (2018). Best serum biomarker combination for ovarian cancer classification. BioMedical Engineering OnLine, 17(S2). https://doi.org/10.1186/s12938-018-0581-6
“Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing
Machine Learning Platform." ArXiv.org E-Print Archive, arxiv.org/pdf/1908.05376.pdf.
Mendeley Data, data.mendeley.com/.
Narkhede, Sarang. "Understanding AUC - ROC Curve." Medium, 15 June 2021,
towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5.
“NN - Multi-layer Perceptron Classifier (MLPClassifier)." Michael Fuchs Python, 3 Feb. 2021,
michael-fuchs-python.netlify.app/2021/02/03/nn-multi-layer-perceptron-classifier-mlpclassifier/.
“Ovarian Cancer - Symptoms and Causes." Mayo Clinic, 25 July 2019,
www.mayoclinic.org/diseases-conditions/ovarian-cancer/symptoms-causes/syc-20375941.
“Types and Stages." Ovarian.org, 18 June 2021,
ovarian.org/about-ovarian-cancer/types-and-stages/?utm_term=&utm_campaign=Dynamic+Search+Ads+2021&utm_source=adwords&utm_medium=ppc&hsa_acc=9835623983&hsa_cam=15289195583&hsa_grp=130033718819&hsa_ad=562233134588&hsa_src=g&hsa_tgt=dsa-437115340933&hsa_kw=&hsa_mt=&hsa_net=adwords&hsa_ver=3&gclid=CjwKCAiAn5uOBhADEiwA_pZwcFUeMcpbvRsb6T4OfXm5Nl7kkO8ESDtUJZPEc4RS8YdzF-hlvvU_oxoCLE8QAvD_BwE.
Copyright (c) 2022 Riya Davar; Madhuri Yalamanchili, MD
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.