DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Enhancing gappy speech audio signals with generative adversarial networks

Strods, Deniss and Smeaton, Alan F. (ORCID: 0000-0003-1028-8389) (2023) Enhancing gappy speech audio signals with generative adversarial networks. In: 34th Irish Signals and Systems Conference (ISSC) 2023, 13-14 June 2023, Dublin, Ireland. ISBN 979-8-3503-4057-0

Abstract
Gaps, dropouts and short clips of corrupted audio are a common problem and are particularly annoying when they occur in speech. This paper uses machine learning to regenerate gaps of up to 320ms in an audio speech signal. Audio regeneration is translated into image regeneration by transforming the audio into a Mel-spectrogram and using image in-painting to regenerate the gaps. The full Mel-spectrogram is then converted back to audio using the Parallel-WaveGAN vocoder and integrated into the audio stream. Using a sample of 1300 spoken audio clips of between 1 and 10 seconds taken from the publicly available LJSpeech dataset, our results show regeneration of audio gaps in close to real time using GANs on a GPU-equipped system. As expected, the smaller the gap in the audio, the better the quality of the filled gaps. On a gap of 240ms, the average mean opinion score (MOS) for the best-performing models was 3.737 on a scale of 1 (worst) to 5 (best), which is sufficient for a listener to perceive the result as close to uninterrupted human speech.
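To make the gap sizes in the abstract concrete, the following minimal sketch estimates how many Mel-spectrogram frames a gap of a given duration covers and blanks that region so an in-painting model could fill it. It is an illustration under stated assumptions, not the authors' code: the function names `gap_to_frames` and `mask_gap` are hypothetical, the hop length of 256 samples and the 80 Mel bands are assumed values, and only the 22050 Hz sample rate comes from the LJSpeech dataset itself.

```python
import math

import numpy as np


def gap_to_frames(gap_ms: float, sample_rate: int = 22050, hop_length: int = 256) -> int:
    """Number of spectrogram frames needed to cover a gap of gap_ms milliseconds.

    hop_length=256 is an assumed value, not one stated in the abstract.
    """
    gap_samples = gap_ms / 1000.0 * sample_rate
    return math.ceil(gap_samples / hop_length)


def mask_gap(mel: np.ndarray, start_frame: int, n_frames: int, fill_value: float = 0.0):
    """Blank a run of frames in a (n_mels, n_frames) Mel-spectrogram.

    Returns a masked copy and a boolean vector marking the blanked frames,
    which is the region an in-painting model would be asked to regenerate.
    """
    masked = mel.copy()
    mask = np.zeros(mel.shape[1], dtype=bool)
    mask[start_frame:start_frame + n_frames] = True
    masked[:, mask] = fill_value
    return masked, mask


if __name__ == "__main__":
    # Stand-in spectrogram: 80 Mel bands (assumed) by 100 frames of ones.
    mel = np.ones((80, 100))
    n_gap = gap_to_frames(320)  # longest gap considered in the abstract
    masked, mask = mask_gap(mel, start_frame=10, n_frames=n_gap)
    print(n_gap, int(mask.sum()))
```

Under these assumptions, a 320ms gap spans 28 frames and a 240ms gap spans 21 frames, so the region to in-paint is a narrow vertical strip of the spectrogram image.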
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Gappy audio, Mel-spectrograms, image in-painting, GANs
Subjects: Computer Science > Artificial intelligence; Computer Science > Machine learning; Engineering > Signal processing; Engineering > Telecommunication
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Published in: Proceedings of the 34th Irish Signals and Systems Conference (ISSC) 2023. IEEE. ISBN 979-8-3503-4057-0
Publisher: IEEE
Official URL: https://doi.org/10.1109/ISSC59246.2023.10161997
Copyright Information: © 2023 IEEE
Funders: Science Foundation Ireland (SFI) Grant Number SFI/12/RC/2289 P2.
ID Code: 28320
Deposited On: 19 Jun 2023 13:09 by Alan Smeaton. Last modified 04 Mar 2024 16:00
Documents

Full text available as:

PDF: 2023116093 (1).pdf (4MB)
Creative Commons: Attribution-Noncommercial-Share Alike 4.0