Detecting and Validating the Emotions in Alzheimer’s Patients by Voice Analysis, Computer Vision and Deep Learning
DOI: https://doi.org/10.47611/jsrhs.v13i1.6303
Keywords: Alzheimer's disease, Mood detection, Deep learning, Convolutional Neural Network (CNN), Neurodegenerative disorder, Cognitive decline
Abstract
Alzheimer’s disease is a degenerative disorder of the brain that affects memory and cognitive function, and it is becoming more prevalent as the population ages. The disease causes a person’s brain cells to die, and over time the brain functions less effectively. As a result, the behavior and personality of Alzheimer’s patients change, and it is observed that patients very often suffer from fluctuating moods. Because of these mood changes, the caretaker or attendant of the patient is often unable to determine what triggered them, and so cannot provide proactive support or timely intervention. This study was therefore undertaken to detect the mood of Alzheimer’s patients using voice analysis, computer vision, and deep learning. The paper comprises two phases. The first phase detects the emotions of patients in real time with the help of computer vision (CV), voice analysis (VA), and a Convolutional Neural Network (CNN). Two CNN models were trained: the first on attributes extracted from images, and the second on features extracted from a voice dataset. The predictions of the two models are then compared to obtain the result of the proposed model. The second phase examines the practicality of the proposed approach by applying it to detect the emotions of four Alzheimer’s patients. Finally, the results are compared and validated. Overall, the proposed model holds promise as a valuable tool for the real-time detection of emotions in Alzheimer’s patients, enabling timely intervention and improved patient outcomes.
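The authors' architecture and code are not reproduced on this page. As a rough illustration only, the sketch below shows one way the two-model scheme described in the abstract could look in Keras: one small CNN over face images, one over MFCC features from speech, with the two softmax outputs combined by averaging (late fusion). Every layer size, input shape, emotion label, and function name here is an illustrative assumption, not the authors' implementation.

```python
# Minimal two-modality sketch (illustrative only; all shapes, labels, and
# layer choices are assumptions, not the paper's published architecture).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def build_cnn(input_shape):
    """Small CNN classifier; the same topology is reused for both modalities."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(len(EMOTIONS), activation="softmax"),
    ])

# One model per modality: 48x48 grayscale face crops (FER-2013 style) and
# MFCC feature maps from short speech clips (an assumed 40 x 174 shape).
face_model = build_cnn((48, 48, 1))
voice_model = build_cnn((40, 174, 1))

def fused_prediction(face_batch, mfcc_batch):
    """Average the two softmax distributions, then take the argmax label."""
    p_face = face_model.predict(face_batch, verbose=0)
    p_voice = voice_model.predict(mfcc_batch, verbose=0)
    p = (p_face + p_voice) / 2.0
    return [EMOTIONS[i] for i in np.argmax(p, axis=1)]
```

Averaging the per-class probabilities is only one plausible way to "compare the predictions" of the two models; a majority vote or a confidence-weighted rule would fit the abstract's description equally well.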
Copyright (c) 2024 Reetu Jain, Ms
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.