
Exploring the dimensionality of speech using manifold learning and dimensionality reduction methods

Errity, Andrew (2010) Exploring the dimensionality of speech using manifold learning and dimensionality reduction methods. PhD thesis, Dublin City University.

Full text available as: PDF (9Mb)

Abstract

Many previous investigations have indicated that speech data has inherent low-dimensional structure and that it may be possible to efficiently represent speech using only a small number of parameters. This view is motivated by the fact that articulatory movement is limited by physiological constraints and thus the speech production apparatus has only limited degrees of freedom. Also, the set of sounds used in human spoken communication is only a small subset of all producible sounds. A number of dimensionality reduction methods capable of discovering such underlying structure have previously been applied to speech. However, if speech lies on a manifold nonlinearly embedded in high-dimensional space, as has been proposed in the past, classic linear dimensionality reduction methods would be unable to discover this embedding. In this dissertation, a number of manifold learning methods, also referred to as nonlinear dimensionality reduction methods, are applied to speech to explore the possibility of underlying nonlinear manifold structure. This dissertation describes a number of existing manifold learning methods and details the application of these methods to high-dimensional feature representations of speech data. Representations derived from the conventional magnitude spectrum and the less widely used phase spectrum are investigated. The manifold learning methods used in this study are locally linear embedding, Isomap, and Laplacian eigenmaps. The classic linear method, principal component analysis (PCA), is also applied to facilitate the comparison of linear and nonlinear methods. The resulting low-dimensional representations are analysed through visualisation, phone recognition, and speaker recognition experiments. The recognition experiments are used as a means of evaluating how much meaningful discriminatory information is contained in the low-dimensional representations produced by each method.
These experiments also serve to display the potential value of these methods in speech processing applications. The manifold learning methods are shown to be capable of producing meaningful low-dimensional representations of speech data, suggesting speech has low-dimensional manifold structure. In general, these methods are found to outperform PCA in low dimensions, indicating that speech may lie on a manifold nonlinearly embedded in high-dimensional space. Phone classification experiments show that Isomap can offer improvements over standard features and PCA-transformed features. Investigation of magnitude and phase spectrum representations found both to have similar low-dimensional structure and confirmed that the phase spectrum contains useful information for phone discrimination. Results indicate that combining magnitude and phase spectrum information yields improvements in phone classification tasks. A method to combine magnitude and phase spectrum features for increased phone classification accuracy without large increases in feature dimensionality is also described.

Item Type: Thesis (PhD)
Date of Award: March 2010
Refereed: No
Supervisor(s): McKenna, John
Uncontrolled Keywords: speech processing
Subjects: Computer Science > Computational linguistics; Engineering > Signal processing
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT); Research Initiatives and Centres > Research Institute for Networks and Communications Engineering (RINCE); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Irish Research Council for Science Engineering and Technology
ID Code: 15142
Deposited On: 31 Mar 2010 13:47 by John McKenna. Last Modified: 31 Mar 2010 13:47
