Breast Cancer Cell Data Analysis and Visualization Using R
DOI:
https://doi.org/10.47611/jsrhs.v11i3.3524
Keywords:
Data Analysis, R, Breast Cancer
Abstract
Breast cancer is the most frequently occurring cancer in women. If the cancer is diagnosed and treated at an early stage, the patient has a 5-year survival rate of 99%, but the rate drops sharply to 29% once the cancer reaches a distant stage. It is therefore very important to detect ‘positive cancer cells’ at an early stage, so I analyzed 569 breast cancer cell records provided by the University of Wisconsin using RStudio. With this program, I visualized the relationships among 10 different cell characteristics and used graphs to examine how cell radius relates to positive cancer cells. I also built a predictive model, based on logistic regression and trained on the original 569 records, that can detect positive cancer cells. Finally, I recommended an appropriate age range (35-39) for breast cancer examination based on an analysis of breast cancer statistics. Through this research, I derived an effective breast cancer prediction model and explored big data analysis and data mining methods in RStudio in depth.
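To make the modeling step in the abstract concrete, the sketch below fits a one-feature logistic regression that maps cell radius to the probability of a positive (malignant) cell. It is an illustration only, not the authors' R code: it is written in Python, the radius values and labels are made-up toy numbers rather than the Wisconsin records, and the names `fit_logistic` and `predict_proba` are hypothetical helpers introduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w * z + b) by batch gradient descent,
    where z is x standardized to zero mean and unit variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / std for x in xs]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        errors = [sigmoid(w * z + b) - y for z, y in zip(zs, ys)]
        grad_w = sum(e * z for e, z in zip(errors, zs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, mean, std


def predict_proba(x, model):
    """Predicted probability that a cell with radius x is positive."""
    w, b, mean, std = model
    return sigmoid(w * (x - mean) / std + b)


# ASSUMPTION: toy values for illustration only, NOT taken from the
# Wisconsin data set. 1 = positive (malignant), 0 = negative (benign).
radius = [10.5, 11.2, 12.0, 12.8, 13.5, 14.1, 15.0, 16.3, 17.8, 19.2, 20.5, 22.0]
malignant = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

model = fit_logistic(radius, malignant)
for r in (11.0, 15.0, 21.0):
    print(f"radius={r:5.1f}  P(positive)={predict_proba(r, model):.3f}")
```

In R, a comparable fit is commonly obtained with `glm(..., family = binomial)`; whether the study used exactly that call is not stated in the abstract.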
References or Bibliography
American Cancer Society. (2022, March 1). Survival Rates for Breast Cancer. American Cancer Society. https://www.cancer.org/cancer/breast-cancer/understanding-a-breast-cancer-diagnosis/breast-cancer-survival-rates.html
UCI Machine Learning Repository. (1995, November 1). UCI Machine Learning Repository: Breast Cancer Wisconsin (Diagnostic) Data Set [Dataset]. The University of Wisconsin Clinical Sciences Center and Computer Science Dept. https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)
Jonathan Gelfond, Martin Goros, Brian Hernandez, and Alex Bokov. (2018, July). A System for an Accountable Data Analysis Process in R. Retrieved August 10, 2022, from https://journal.r-project.org/archive/2018/RJ-2018-001/RJ-2018-001.pdf
UCLA Statistical Methods and Data Analytics. (n.d.). Logit Regression | R Data Analysis Examples. Retrieved August 22, 2022, from https://stats.oarc.ucla.edu/r/dae/logit-regression/
MeDiscovery. (2020, August 4). AUC-ROC 커브 [AUC-ROC curve]. BioinformaticsAndMe. https://bioinformaticsandme.tistory.com/328
KOSIS. (1999–2019). KOSIS-24 Cancer Patients Statistics [Dataset]. Korean Statistical Office. https://kosis.kr/statHtml/statHtml.do?orgId=117&tblId=DT_117N_A00023&conn_path=I3
Kwon, J. (2020). 따라하며 배우는 데이터 과학 (Learning Data Science by Following). 제이펍 주식회사.
Copyright (c) 2022 Kyoungeui Hong; Kangbin Yim, Keunhyuk Kim
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.