A Machine Learning Based Approach for Prediction of Breast Cancer Patient Prognosis through Clinical Analysis

Authors

  • Adithya Nair, Branham High School
  • Sharifa Sahai, Harvard University

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3815

Keywords:

Patient Prognosis, Clinical Attributes, Supervised Learning, Hyperparameter Tuning

Abstract

Prognosis for cancer patients is a crucial aspect of healthcare: researchers provide insights that help doctors evaluate treatment options, which in turn have significant implications for patients' lifestyle choices. By analyzing correlations among the genetic and clinical attributes of breast cancer patients, previous studies have used machine learning algorithms to predict the probability of patient survival over a five-year timeframe. Our project instead predicts a more specific label, overall survival in months, using the extensive dataset of an international breast cancer study and considering only its clinical attributes. Multivariate Regression and Random Forest models were applied to assess the relative importance of each clinical variable. The results show the Random Forest model to be the better fit, accounting for 44% of the variance in the testing dataset. Further analysis with additional datasets could help improve model accuracy.
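To make the "44% of the variance" figure concrete, the following is a minimal sketch of the coefficient of determination (R²), the usual measure behind such a statement. It is not the authors' code: the function name r_squared and the survival values are hypothetical, chosen only so that the result lands near 0.44, and the study's actual models and data are not reproduced here.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is the spread of the observed values around their mean;
    SS_res is the spread left unexplained by the predictions.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_y = fmean(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out survival times in months (illustrative only,
# not taken from the study's dataset).
y_test = [10.0, 20.0, 30.0, 40.0]

# Hypothetical model predictions for the same four patients.
y_pred = [20.0, 10.0, 38.0, 36.0]

# A model that always predicts the mean explains none of the variance.
baseline = [fmean(y_test)] * len(y_test)

print("model R^2:   ", round(r_squared(y_test, y_pred), 2))
print("baseline R^2:", round(r_squared(y_test, baseline), 2))
```

Read this way, an R² of 0.44 on the testing dataset means the model's predictions leave a little over half of the variation in survival months unexplained, which is consistent with the abstract's call for further analysis.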

 

Author Biography

Sharifa Sahai, Harvard University

Advisor

Published

08-31-2022

How to Cite

Nair, A., & Sahai, S. (2022). A Machine Learning Based Approach for Prediction of Breast Cancer Patient Prognosis through Clinical Analysis. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.3815

Section

HS Research Projects