Using Deep Learning to Understand and Model How a Virtual Assistant, Like Siri, Knows When to Act
DOI: https://doi.org/10.47611/jsrhs.v12i4.5243
Keywords: machine learning, audio processing, audio signal classification, convolutional neural networks, virtual assistants
Abstract
Virtual assistants are now all around us and have changed the way we interact with technology. To better understand how they work, we visualized and demonstrated one approach that mimics the audio classification techniques of virtual assistants: a deep convolutional neural network (DCNN) trained on mel spectrograms to classify audio clips as wake words or non-wake words. Our hypothesis is that mel spectrograms of wake and non-wake words can be used to classify audio accurately. Of the 85 audio files in our dataset, 58 were used for training and validation and the remaining 27 for testing. On the test set, the model achieved a precision, recall, and accuracy of 1.0, meaning it classified every wake-word and non-wake-word file correctly.
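The abstract does not say how the mel spectrograms were produced. As a rough illustration only, the sketch below computes a log-mel spectrogram in plain Python. It assumes an HTK-style mel scale, a periodic Hann window, and illustrative parameters (`n_fft=256`, `hop=128`, `n_mels=20`); none of these come from the paper, and all function names are hypothetical. A real pipeline would more likely use a library such as librosa or `tf.signal`. The output is a (mel band × frame) array, the kind of image-like input a DCNN classifier would consume.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters mapping the n_fft // 2 + 1 DFT bins onto n_mels bands."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge points spaced evenly on the mel scale from 0 Hz to Nyquist
    mel_max = hz_to_mel(sample_rate / 2.0)
    mel_points = [mel_max * i / (n_mels + 1) for i in range(n_mels + 2)]
    # Convert each edge point to a (fractional) DFT bin position
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_bins):
            if left <= k <= center:
                row.append((k - left) / (center - left))  # rising edge
            elif center < k <= right:
                row.append((right - k) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive DFT power |X[k]|^2 for k = 0 .. n // 2 (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(re * re + im * im)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a log-mel spectrogram as n_mels rows x n_frames columns (in dB)."""
    # Periodic Hann window to reduce spectral leakage at frame edges
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n_fft) for t in range(n_fft)]
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        windowed = [s * w for s, w in zip(signal[start:start + n_fft], window)]
        power = power_spectrum(windowed)
        # Weighted sum of DFT power in each triangular band, then convert to dB
        energies = [sum(f * p for f, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    # Transpose to (mel band, frame), the image-like layout a CNN would take as input
    return [list(band) for band in zip(*frames)]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(800)]  # 0.1 s, 1 kHz
    spec = log_mel_spectrogram(tone, sr)
    print(len(spec), "mel bands x", len(spec[0]), "frames")  # 20 mel bands x 5 frames
```

For a 1 kHz test tone sampled at 8 kHz, the loudest band in each frame is the one whose center frequency lies near 1 kHz. A spoken wake word would instead produce a time-varying pattern across many bands, and that 2-D pattern is what a convolutional classifier learns to recognize.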
Copyright (c) 2023 Shravan Devraj; Ross Greer
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.