An Analysis of the k-Nearest Neighbor Classifier to Predict Benign and Malignant Breast Cancer Tumors
DOI:
https://doi.org/10.47611/jsrhs.v12i4.5577
Keywords:
k-Nearest Neighbor, k-Folds Cross Validation, Breast Cancer, Machine Learning, Accuracy
Abstract
Breast cancer is a leading cause of death among women worldwide, and its high mortality rate has drawn attention to machine learning (ML) algorithms that can effectively detect early signs of benign and malignant tumors. ML classifiers allow mammographic results to be evaluated more efficiently than is possible for radiologists who must manually classify extensive patient data. This study evaluates the effectiveness of the k-Nearest Neighbor (kNN) classifier in characterizing cancer tumor stages based on concavity, texture, area, perimeter, and smoothness. We use scatterplots to differentiate between the benign and malignant classes in the Breast Cancer Wisconsin Dataset (WBCD) from the University of California at Irvine Machine Learning Repository. Using the k-Fold Cross Validation (k-FCV) technique, we determine the optimal value of k for assigning unlabeled data to their respective categories. The analysis finds that the most favorable value of the hyperparameter k is 12, which yielded a highly effective diagnostic outcome across four distinct tests. Because k has no predefined value, guessing it could lead to accuracy errors and misdiagnosis; k-FCV therefore provides a more precise approach to determining the optimal class for unknown tumor attributes. Additionally, we preprocess the dataset and measure how different data splits affect accuracy in order to organize the data effectively and achieve reliable results. Because early detection is essential to preventing breast cancer-related deaths, ML techniques such as kNN can greatly reduce the mortality associated with the disease.
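The selection procedure described in the abstract (score several candidate values of k with k-fold cross validation, then keep the best one) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, the data are synthetic two-feature clusters standing in for the benign and malignant classes rather than the WBCD, and the names `knn_predict`, `cross_val_accuracy`, `make_toy_data`, `n_neighbors`, and `n_folds` are hypothetical. The Euclidean distance, the five folds, and the candidate range 1 to 25 are likewise assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, n_neighbors):
    """Return the majority label among the n_neighbors training points
    closest to `query` (Euclidean distance is an assumption here)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:n_neighbors]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_neighbors, n_folds=5, seed=0):
    """Mean kNN accuracy over n_folds held-out folds of shuffled data."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes to the same fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=60, seed=1):
    """Synthetic stand-in for the two tumor classes: two Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("benign", 0.0), ("malignant", 2.5)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()
candidate_ks = range(1, 26)
cv_scores = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
best_k = max(cv_scores, key=cv_scores.get)  # first k with the top mean accuracy
print("best k:", best_k, "mean CV accuracy:", round(cv_scores[best_k], 3))
```

The value of k chosen on this toy data carries no meaning for the real dataset; the sketch only shows the mechanics of comparing candidate values by their mean held-out accuracy instead of guessing one.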
Copyright (c) 2023 Sahasra Chatakondu; Kevin Zhai
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.