Breast Cancer Cell Data Analysis and Visualization Using R

Authors

  • Kyoungeui Hong, Northern Valley Regional High School at Demarest
  • Kangbin Yim, Soonchunhyang University
  • Keunhyuk Kim, KakaoPay Corp.

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3524

Keywords:

Data Analysis, R, Breast Cancer

Abstract

Breast cancer is the most frequently occurring cancer in women. When it is diagnosed and treated at an early stage, the five-year survival rate is 99%, but this drops to 29% once the cancer has reached a distant stage. Detecting ‘positive’ (malignant) cancer cells early is therefore critical, so I used RStudio to analyze the 569 breast cancer cell samples in the Breast Cancer Wisconsin (Diagnostic) dataset provided by the University of Wisconsin. In R, I visualized the relationships among 10 cell characteristics and used graphs to examine how cell radius relates to positive cancer cells. I also built a logistic regression model, trained on the original 569 samples, that predicts whether a cell is positive. Finally, by analyzing breast cancer statistics, I recommended an appropriate age (35–39) for breast cancer screening. Through this research, I developed an effective breast cancer prediction model and explored big data analysis and data-mining methods in RStudio.
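The abstract does not show the model itself. The following is a minimal, hedged sketch of what a one-feature logistic regression classifier does: it estimates the probability that a cell is positive from its radius. It is not the authors' R code. The radius values and labels are hypothetical toy numbers, not rows from the Wisconsin dataset, and the helper names (`fit_logistic`, `predict_proba`, `predict_positive`) are made up for illustration. In R, a comparable model would usually be fitted with `glm(..., family = binomial)`.

```python
import math

# Hypothetical toy data (NOT from the Wisconsin dataset): mean cell radius
# and label (1 = positive/malignant, 0 = negative/benign). The two classes
# overlap, so the maximum-likelihood coefficients stay finite.
RADII = [10.0, 11.2, 12.1, 12.8, 13.5, 14.6, 15.1, 16.3, 17.8, 19.5]
LABELS = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + e^-t)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * z), z = standardized x,
    by batch gradient descent on the average log-loss."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / sd for x in xs]  # standardizing keeps the steps stable
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for z, y in zip(zs, ys):
            err = sigmoid(b0 + b1 * z) - y  # gradient of log-loss w.r.t. the logit
            g0 += err
            g1 += err * z
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1, mean, sd


def predict_proba(radius, model):
    """Estimated probability that a cell with this radius is positive."""
    b0, b1, mean, sd = model
    return sigmoid(b0 + b1 * (radius - mean) / sd)


def predict_positive(radius, model, threshold=0.5):
    """Classify as positive when the estimated probability reaches the threshold."""
    return predict_proba(radius, model) >= threshold


if __name__ == "__main__":
    model = fit_logistic(RADII, LABELS)
    for r in (11.0, 14.0, 18.0):
        print(r, round(predict_proba(r, model), 3), predict_positive(r, model))
```

Standardizing the radius before gradient descent is a design choice for this sketch only. It lets a fixed learning rate converge without tuning. R's `glm` instead uses iteratively reweighted least squares and needs no such step. The 0.5 threshold is also an assumption. A screening model might lower it to catch more positive cells at the cost of more false alarms.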

Published

08-31-2022

How to Cite

Hong, K., Yim, K., & Kim, K. (2022). Breast Cancer Cell Data Analysis and Visualization Using R. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.3524

Section

HS Research Projects