Deep Neural Network Classifier for Alzheimer’s Disease
Omics biomarker prediction for early and quantitative Alzheimer's Disease diagnosis
DOI: https://doi.org/10.47611/jsrhs.v11i3.3553
Keywords: Alzheimer's Disease, Deep Neural Network, Machine Learning, Omics Datasets, Alzheimer's Disease Diagnosis, Gene Expression
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease characterized by dementia and, eventually, a loss of cognitive abilities. Two histopathological features are associated with AD: neurofibrillary tangles and amyloid-beta plaques. Both contribute to neuronal death, neuronal dysfunction, and AD pathogenesis. Current methods to diagnose AD still rely on symptom-based diagnosis through interviews, which can be time-consuming, costly, and inaccurate. Alternatives such as brain imaging are expensive and require an extensive laboratory setup to produce accurate results. Thus, quantitative approaches at the molecular level are needed. Advances in omics datasets and machine learning have opened new avenues for diagnosing AD. This paper proposes using statistical methods, namely principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and the Kolmogorov-Smirnov (KS) test combined with Benjamini-Hochberg correction, for feature selection and dimensionality reduction to isolate significant features associated with AD. Furthermore, we developed machine learning models based on logistic regression, a random forest classifier, and a deep neural network (DNN) classifier to predict AD diagnosis. Eight unique genes, including TGM2, NKIRAS1, SYK, GABARAPL2, ABCC12, NDEL1, and TEP1, were identified as significant biomarkers of AD, confirming previous work that identified their roles in AD. After extensive hyperparameter tuning, the DNN model showed the best prediction performance for AD diagnosis among the three machine learning algorithms. On the preprocessed dataset, the DNN model achieved a 5-fold cross-validation accuracy of 0.823 and an AUC-ROC of 0.940. The code is publicly available at https://www.kaggle.com/neobrando/ml-dnn.
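The dimensionality-reduction step named in the abstract can be illustrated with scikit-learn. This is a minimal sketch, not the authors' Kaggle code. It assumes samples are rows and genes are columns and uses a synthetic matrix in place of the omics data. The helper `reduce_and_embed`, the 50-component cutoff, and the perplexity cap are hypothetical choices. PCA first compresses the expression matrix, and t-SNE then maps the principal components to two dimensions for plotting AD versus control samples.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def reduce_and_embed(X, n_pcs=50, seed=0):
    """Hypothetical helper: PCA to n_pcs components, then a 2D t-SNE embedding.

    X: (n_samples, n_genes) expression matrix (rows are samples).
    """
    X = np.asarray(X, dtype=float)
    # PCA cannot keep more components than min(n_samples, n_genes).
    n_pcs = min(n_pcs, X.shape[0], X.shape[1])
    X_pca = PCA(n_components=n_pcs, random_state=seed).fit_transform(X)
    # t-SNE needs perplexity < n_samples; cap it at (n - 1) / 3 for small cohorts
    # (30 is scikit-learn's default).
    perplexity = min(30.0, (X.shape[0] - 1) / 3)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    X_2d = tsne.fit_transform(X_pca)
    return X_pca, X_2d


# Synthetic stand-in for an expression matrix: 60 samples x 200 "genes".
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)          # hypothetical labels: 0 = control, 1 = AD
X = rng.normal(size=(60, 200))
X[y == 1, :20] += 1.5              # shift 20 genes in the AD group
X_pca, X_2d = reduce_and_embed(X)
print(X_pca.shape, X_2d.shape)     # (60, 50) (60, 2)
```

Running t-SNE on principal components rather than on raw expression values is a common way to reduce noise and runtime. The 2D coordinates are mainly useful for visualization, for example by coloring points by `y`.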
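The feature-selection step (a Kolmogorov-Smirnov test with Benjamini-Hochberg correction) can be sketched gene by gene. This is a minimal illustration that assumes a binary label vector (1 = AD, 0 = control) and a samples-by-genes matrix. The helper names `benjamini_hochberg` and `select_genes_ks`, the FDR level of 0.05, and the synthetic data are hypothetical. SciPy's `ks_2samp` compares each gene's AD and control distributions. The BH step-up procedure is written out in NumPy for transparency; `statsmodels.stats.multitest.fdrcorrection` (with `method="indep"`) provides the same procedure ready-made.

```python
import numpy as np
from scipy.stats import ks_2samp


def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure; returns (reject mask, BH-adjusted p-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Adjusted p-value at rank i is min over k >= i of p_(k) * m / k, capped at 1.
    q_sorted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    q = np.empty(m)
    q[order] = q_sorted
    return q <= alpha, q


def select_genes_ks(X, y, gene_names, alpha=0.05):
    """Hypothetical helper: keep genes whose AD and control distributions differ (KS test, BH FDR)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    ad, ctrl = X[y == 1], X[y == 0]
    # Two-sample KS test on each gene (column) separately.
    pvals = [ks_2samp(ad[:, j], ctrl[:, j]).pvalue for j in range(X.shape[1])]
    reject, q = benjamini_hochberg(pvals, alpha)
    return [g for g, keep in zip(gene_names, reject) if keep], q


# Synthetic example: only the first three of 20 genes truly differ between groups.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 20))
X[y == 1, :3] += 2.0
genes = [f"gene_{j}" for j in range(20)]
selected, q = select_genes_ks(X, y, genes)
print(selected)
```

The KS test compares whole distributions rather than means. It can therefore flag genes whose spread or shape differs between groups, not only genes with shifted averages.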
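The model comparison, with 5-fold cross-validated accuracy and AUC-ROC, can be sketched with scikit-learn. This sketch rests on several assumptions. The data are synthetic (`make_classification`). The hyperparameters are placeholders rather than the tuned values. A small `MLPClassifier` stands in for the paper's TensorFlow DNN. As a result, the printed scores have no bearing on the reported 0.823 accuracy and 0.940 AUC-ROC. The helper `compare_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=0):
    """Hypothetical helper: {model name: (mean accuracy, mean AUC-ROC)} over stratified 5-fold CV."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    models = {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # Assumption: a small scikit-learn MLP stands in for the paper's TensorFlow DNN.
        "mlp": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=seed),
        ),
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
        results[name] = (scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
    return results


# Synthetic placeholder data: 200 samples, 50 features, 10 of them informative.
X, y = make_classification(n_samples=200, n_features=50, n_informative=10, random_state=0)
results = compare_models(X, y)
for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, AUC-ROC={auc:.3f}")
```

Stratified folds keep the AD/control ratio similar in every split. Wrapping the scaler in a pipeline refits it on each training fold, so no information from the held-out fold leaks into preprocessing.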
Copyright (c) 2022 Jason Lin, Dr. Hayan Lee, Dr. Michael Snyder
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.