Semi-supervised learning with generative models for pathological speech classification
Trinh, NamORCID: 0000-0003-0307-3793
(2021)
Semi-supervised learning with generative models for pathological speech classification.
Master of Science thesis, Dublin City University.
Recent work in pathological speech classification has employed supervised learning algorithms such as neural networks and support vector machines to classify speech as healthy or pathological. A challenge in applying such machine learning techniques to pathological speech classification is the labelled data shortage problem. While labelled data are expensive and scarce, unlabelled data are inexpensive and plentiful. Labelled data acquisition often entails significant human effort and time-consuming experimental design. Further, for medical applications, privacy and ethical issues must be addressed where patient data is collected.
In this thesis, we investigate a semi-supervised learning (SSL) approach that employs a generative model to incorporate both labelled and unlabelled data into the training process. Generative models explored include both a generative adversarial network (GAN) and a variational autoencoder (VAE). To employ a GAN, we modify its traditional discriminator to not only differentiate between real and fake speech samples but to also classify the given sample as healthy or pathological. To employ a VAE, we first pre-train the VAE with unlabelled data and subsequently, incorporate the pre-trained encoder into a classifier to be trained on labelled data.
We test our approach using three commonly used pathological speech datasets: the Spanish Parkinson’s Diseases Dataset (SPDD), the Saarbrucken Voice Database (SVD) and the Arabic Voice Pathology Database (AVPD). We compare the performance of the GAN and VAE-based approaches trained on both labelled and unlabelled data with a traditional supervised approach based on a convolutional neural network (CNN) trained only on labelled data.
We observe that our SSL-based approach leads to an accuracy gain compared to a baseline CNN trained only on labelled pathological speech data. This promising result shows that our approach has the potential to alleviate the labelled data shortage problem in pathological speech classification and other medical applications where labelled data acquisition is challenging.
This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License
Funders:
Science Foundation Ireland under grant No. 17/RC/PHD3488., ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
ID Code:
25955
Deposited On:
27 Oct 2021 16:17 by
Darragh O'brien
. Last Modified 13 Jan 2022 10:36