Denoising Speech Signals with Hifi-Coulomb-GANs

Authors

  • Anirudh Satheesh, Thomas Jefferson High School for Science and Technology
  • Karthick Muthu-Manivannan

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3501

Keywords:

CoulombGANs, Speech denoising, Generative Adversarial Networks, PostNet

Abstract

Recorded speech signals often contain noise that degrades signal quality and reduces intelligibility. Several studies have used Generative Adversarial Networks (GANs) to remove noise artifacts and improve speech intelligibility. However, GANs can suffer from vanishing or exploding gradients, which reduces their effectiveness at denoising. To mitigate vanishing gradients, we applied the CoulombGAN architecture to speech denoising using a model structure similar to Hifi-GAN, the current state-of-the-art speech denoiser. We call this new model Hifi-CoGAN. We used a WaveNet generator to denoise signals, a PostNet for general cleanup, and a Multi-Resolution Discriminator to evaluate the signal quality relative to the clean signal. Our results show that Hifi-CoGAN outperformed Hifi-GAN on many of the narrowband noise signals (signals with a limited range of frequencies) in terms of the Short-Term Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) metrics. However, the model did not perform as well as Hifi-GAN on wideband noise signals (signals with a wider range of frequencies) such as white noise, so future work is needed to improve the model for these noise signals.
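
To illustrate the CoulombGAN idea mentioned in the abstract, the following is a minimal sketch of the potential field that a Coulomb GAN discriminator is trained to approximate, following the Plummer-kernel formulation in Unterthiner et al. (2018). It is not the authors' Hifi-CoGAN code: the function names, the kernel exponent, the smoothing value, the sign convention (real samples positive, generated samples negative), and the random 8-dimensional toy vectors standing in for audio features are all assumptions made for illustration.

```python
import numpy as np


def plummer_kernel(a, b, dim_exp=3.0, eps=1.0):
    """Plummer kernel between every row of a (n, m) and every row of b (k, m).

    dim_exp and eps are illustrative values, not settings from the paper.
    """
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return 1.0 / np.sqrt(sq_dist + eps ** 2) ** dim_exp


def coulomb_potential(points, real, generated, dim_exp=3.0, eps=1.0):
    """Potential at each point: mean kernel to real minus mean kernel to generated."""
    attract = plummer_kernel(points, real, dim_exp, eps).mean(axis=1)
    repel = plummer_kernel(points, generated, dim_exp, eps).mean(axis=1)
    return attract - repel


# Toy stand-ins for clean and generated feature vectors (hypothetical data).
rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, size=(64, 8))
generated = rng.normal(loc=-2.0, size=(64, 8))

phi_at_real = coulomb_potential(real, real, generated)
phi_at_generated = coulomb_potential(generated, real, generated)
phi_when_matched = coulomb_potential(real, real, real)
```

Under these assumptions the potential is positive near the real samples and negative near the generated ones, and it vanishes everywhere once the two sample sets coincide. Because the kernel decays polynomially rather than saturating, distant generated samples still receive a nonzero learning signal, which is the property the abstract appeals to when it mentions mitigating vanishing gradients.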

References or Bibliography

J. Ortega-Garcia and J. Gonzalez-Rodriguez, "Overview of speech enhancement techniques for automatic speaker recognition," Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996, pp. 929-932 vol.2, doi: 10.1109/ICSLP.1996.607754.

Yang LP, Fu QJ. Spectral subtraction-based speech enhancement for cochlear implant patients in background noise. J Acoust Soc Am. 2005 Mar;117(3 Pt 1):1001-4. doi: 10.1121/1.1852873. PMID: 15806989.

Upadhyay, Navneet & Karmakar, Abhijit. (2015). Speech Enhancement using Spectral Subtraction-type Algorithms: A Comparison and Simulation Study. Procedia Computer Science. 54. 574-584. 10.1016/j.procs.2015.06.066.

T. Biswas, C. Pal, S. B. Mandal and A. Chakrabarti, "Audio de-noising by spectral subtraction technique implemented on reconfigurable hardware," 2014 Seventh International Conference on Contemporary Computing (IC3), 2014, pp. 236-241, doi: 10.1109/IC3.2014.6897179.

Wahlberg, P., & Schreier, P. J. (2010). On wiener filtering of certain locally stationary stochastic processes. Signal Processing, 90(3), 885-890. https://doi.org/10.1016/j.sigpro.2009.09.013

M. Coto-Jimenez, J. Goddard-Close, L. Di Persia and H. Leonardo Rufiner, "Hybrid Speech Enhancement with Wiener filters and Deep LSTM Denoising Autoencoders," 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), 2018, pp. 1-8, doi: 10.1109/IWOBI.2018.8464132.

Huang, Po-Sen & Kim, Minje & Hasegawa-Johnson, Mark & Smaragdis, Paris. (2015). Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation. Audio, Speech, and Language Processing, IEEE/ACM Transactions on. 23. 10.1109/TASLP.2015.2468583.

Maas, A.L., Le, Q.V., O'Neil, T.M., Vinyals, O., Nguyen, P., Ng, A.Y. (2012) Recurrent neural networks for noise reduction in robust ASR. Proc. Interspeech 2012, 22-25, doi: 10.21437/Interspeech.2012-6

Pandey, L., Kumar, A., Namboodiri, V. (2018) Monoaural Audio Source Separation Using Variational Autoencoders. Proc. Interspeech 2018, 3489-3493, DOI: 10.21437/Interspeech.2018-1140.

K. Osako, R. Singh and B. Raj, "Complex recurrent neural networks for denoising speech signals," 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015, pp. 1-5, doi: 10.1109/WASPAA.2015.7336896.

Donahue, C., Li, B., & Prabhavalkar, R. (n.d.). Exploring speech enhancement with generative adversarial networks for robust speech recognition. ICASSP 2018. https://doi.org/10.48550/arXiv.1711.05747

Fu, S., Liao, C., Tsao, Y., & Lin, S. (2019). MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement. ArXiv, abs/1905.04874.

H. Phan et al., "Improving GANs for Speech Enhancement," in IEEE Signal Processing Letters, vol. 27, pp. 1700-1704, 2020, doi: 10.1109/LSP.2020.3025020.

Su, J., Jin, Z., & Finkelstein, A. (2020). HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks. doi:10.48550/ARXIV.2006.05694

Pascual, Santiago & Bonafonte, Antonio & Serrà, Joan. (2017). SEGAN: Speech Enhancement Generative Adversarial Network. 3642-3646. 10.21437/Interspeech.2017-1428.

Wiatrak, M., Albrecht, S. V., & Nystrom, A. (2019). Stabilizing Generative Adversarial Networks: A Survey. doi:10.48550/ARXIV.1910.00927

Goodfellow, Ian & Pouget-Abadie, Jean & Mirza, Mehdi & Xu, Bing & Warde-Farley, David & Ozair, Sherjil & Courville, Aaron & Bengio, Y.. (2014). Generative Adversarial Networks. Advances in Neural Information Processing Systems. 3. 10.1145/3422622.

Unterthiner, T., Nessler, B., Klambauer, G., Heusel, M., Ramsauer, H., & Hochreiter, S. (2018). Coulomb GANs: Provably Optimal Nash Equilibria via Potential Fields. ArXiv, abs/1708.08819.

van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., … Kavukcuoglu, K. (2016). WaveNet: A Generative Model for Raw Audio. doi:10.48550/ARXIV.1609.03499

J. Shen et al., "Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4779-4783, doi: 10.1109/ICASSP.2018.8461368.

Reddy, Chandan & Beyrami, Ebrahim & Pool, Jamie & Cutler, Ross & Srinivasan, Sriram & Gehrke, Johannes. (2019). A Scalable Noisy Speech Dataset and Online Subjective Test Framework. 1816-1820. 10.21437/Interspeech.2019-3087.

Zhao, Junbo, et al. Energy-Based Generative Adversarial Network. arXiv, 6 Mar. 2017. arXiv.org, https://doi.org/10.48550/arXiv.1609.03126.

Berthelot, David, et al. BEGAN: Boundary Equilibrium Generative Adversarial Networks. arXiv, 31 May 2017. arXiv.org, https://doi.org/10.48550/arXiv.1703.10717.

Published

08-31-2022

How to Cite

Satheesh, A., & Muthu-Manivannan, K. (2022). Denoising Speech Signals with Hifi-Coulomb-GANs. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.3501

Section

HS Research Articles