Detecting Emotions in Audio Data of Patients with Post Traumatic Stress Disorder using Convolutional Neural Networks

Authors

  • Rohan Gupta, Gunn High School

DOI:

https://doi.org/10.47611/jsrhs.v12i3.4776

Keywords:

Speech Recognition, RESNET, MFCC, Mel Spectrogram

Abstract

As humans, we can effortlessly and accurately identify another person's emotions through the tone, pitch, and pace of their speech, and even the emphasis and stress placed on each word. However, people suffering from Post Traumatic Stress Disorder (PTSD), roughly 24.4 million people in the USA, often repress their emotions, making it difficult for therapists to identify their patients' genuine emotions and treat them appropriately. Drawing on the emerging field of emotion detection in Artificial Intelligence (A.I.), I identified human emotions from speech. Instead of an audio-transcription-based model, I opted for a newer image-based model, RESNET-18, which is widely used and operates on spectrograms that preserve the subtleties in speech critical to distinguishing emotions. To train the model, I used the RAVDESS dataset, which consists of WAV files covering eight different emotions. I achieved an overall accuracy of 82% (25% greater than human detection). Specifically, I achieved 99% for the no-stress class (happiness), 97% for the neutral class (neutral, calm, and surprised), and 85% for the stressed class (fearful, sad, angry, and disgusted). I also found that the model reached 87% accuracy when trained only on male speakers; with continued training, an overall accuracy above 90% appears achievable. In conclusion, it is possible to detect the emotions of PTSD patients from their speech, and continued research can help improve the lives of people who are unable to express their true emotions.
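To make the pipeline in the abstract concrete, the sketch below shows one plausible way to turn a RAVDESS WAV file into a log-Mel spectrogram "image" and classify it with RESNET-18. This is a minimal Python illustration assuming the librosa, torch, and torchvision libraries; the example filename, hyperparameters, normalization, and single training step are illustrative assumptions, not the study's exact setup.

```python
# Minimal sketch: RAVDESS WAV -> log-Mel spectrogram -> RESNET-18 classifier.
# Assumes librosa, torch, and torchvision are installed; paths and
# hyperparameters below are illustrative, not the paper's exact values.
import librosa
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet18

def wav_to_mel_image(path, sr=22050, n_mels=128):
    """Load a WAV file and convert it to a log-scaled Mel spectrogram tensor."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)  # log scale, in dB
    # Normalize to [0, 1] and replicate to 3 channels so RESNET-18 accepts it.
    mel_norm = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    img = np.stack([mel_norm] * 3)                 # shape: (3, n_mels, frames)
    return torch.tensor(img, dtype=torch.float32)

# RAVDESS encodes the emotion as the third field of the filename:
# 01=neutral, 02=calm, 03=happy, 04=sad, 05=angry, 06=fearful,
# 07=disgust, 08=surprised.
NUM_CLASSES = 8
model = resnet18(weights=None)                     # or start from pretrained weights
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# One illustrative training step (a real run would loop over a DataLoader).
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Hypothetical filename; field 3 is "05" (angry) -> zero-indexed label 4.
x = wav_to_mel_image("03-01-05-01-01-01-01.wav").unsqueeze(0)  # add batch dim
y = torch.tensor([4])
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

A full run would wrap the training step in a loop over batched spectrograms; because RESNET-18 ends in adaptive average pooling, clips of different durations (and hence spectrogram widths) can share the same network.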



Published

08-31-2023

How to Cite

Gupta, R. (2023). Detecting Emotions in Audio Data of Patients with Post Traumatic Stress Disorder using Convolutional Neural Networks. Journal of Student Research, 12(3). https://doi.org/10.47611/jsrhs.v12i3.4776

Issue

Vol. 12 No. 3 (2023)

Section

HS Research Articles