Newsome, Keith J (1996) Investigation into zero-crossing techniques as a viable means of speech recognition. Master of Engineering thesis, Dublin City University.
Abstract
The idea behind this research is to demonstrate how a fundamental characteristic of speech (zero-crossing information) may be exploited in the development of a low cost, highly effective speech recognition system. The system is to be used to recognise a small vocabulary of isolated speech. Although intended to be speaker dependent, the system is also tested for speaker independence.
A brief description of how speech is produced and recognised by a human subject is first presented. Following this, some features of both voiced and unvoiced speech signals and their associated spectra are discussed in relation to zero-crossing information. Phonemes and their segmentation (using zero-crossing data or otherwise) are also examined. A brief discussion of stationarity and its effects on zero-crossings is then given. The choice of pre-processing filters is also mentioned.
Two methods of speech recognition implementing zero-crossing information are then discussed.
The first technique analyses the ‘spacing’ between zero-crossings, producing a signal whose amplitude is proportional to the distance between successive crossings. The ability of this system, termed the Sinusoidal Instantaneous Frequency Extractor (SIFE) [14], to produce effective recognition parameters is examined.
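As a rough illustration of the interval idea described above, the following sketch builds a signal whose value at each sample equals the distance, in samples, between the two zero-crossings that bracket it. It is not the SIFE implementation from the thesis: the function names, the convention of treating zero as non-negative, and the choice to leave samples outside any bracketed interval at 0 are all assumptions made for the example.

```python
def zero_crossing_indices(signal):
    """Return each index i where the sign changes between signal[i-1] and signal[i]."""
    crossings = []
    for i in range(1, len(signal)):
        # Assumption: zero counts as non-negative, so a sign change is counted once.
        if (signal[i - 1] >= 0) != (signal[i] >= 0):
            crossings.append(i)
    return crossings


def crossing_interval_signal(signal):
    """Hold, over each inter-crossing interval, the interval length in samples."""
    crossings = zero_crossing_indices(signal)
    out = [0] * len(signal)  # samples before the first / after the last crossing stay 0
    for start, end in zip(crossings, crossings[1:]):
        for i in range(start, end):
            out[i] = end - start
    return out


if __name__ == "__main__":
    samples = [1, 1, -1, -1, -1, 1, 1, -1]
    print(zero_crossing_indices(samples))
    print(crossing_interval_signal(samples))
```

Long intervals correspond to low instantaneous frequency and short intervals to high instantaneous frequency, which is why such a signal could plausibly serve as a recognition parameter.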
A second analysis technique, called Higher Order Crossing Analysis (HOC) [25], is then introduced. This method extracts higher order zero-crossing information from the signal using various filtering techniques and uses this data to recognise the speech signal.
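The abstract does not say which filters are used. One formulation of higher order crossings that appears in the literature applies a difference filter repeatedly and counts the zero-crossings after each pass. The sketch below follows that formulation purely as an assumption; the function names and the use of simple first differences are illustrative and may not match the filtering used in the thesis.

```python
def count_zero_crossings(x):
    """Count sign changes between consecutive samples (zero treated as non-negative)."""
    return sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))


def difference(x):
    """First-difference filter: y[t] = x[t+1] - x[t]."""
    return [b - a for a, b in zip(x, x[1:])]


def hoc_sequence(x, order):
    """Zero-crossing counts of x, its first difference, its second difference, ...

    Assumption: the 'higher order' filters are repeated first differences.
    """
    counts = []
    current = list(x)
    for _ in range(order):
        counts.append(count_zero_crossings(current))
        current = difference(current)
    return counts


if __name__ == "__main__":
    frame = [1, 2, 1, 2, -1, -2, -1, -2]
    print(hoc_sequence(frame, 3))
```

Each differencing pass emphasises higher frequencies, so the resulting sequence of counts summarises how the oscillation of the signal is distributed across frequency without computing a spectrum.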
Modified versions of both methods were developed, tested and found to be more effective and adaptable than their predecessors.
A new parameter, Columnised Higher Order Crossing (CHOC), was developed and found to be more effective than HOC. Dynamic Time Warping was then implemented to pattern match CHOC templates with CHOC test signals, enabling a percentage success rate for the CHOC system to be measured (~90%).
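The template-matching step can be pictured with a textbook Dynamic Time Warping routine. The sketch below is a generic version, not the thesis's implementation: it assumes each template and test signal is a sequence of equal-length feature vectors, uses the sum of absolute differences as the local cost, and applies no path constraints. The `recognise` helper and the template dictionary layout are hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best alignment between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: city-block distance between the two frames.
            local = sum(abs(p - q) for p, q in zip(a[i - 1], b[j - 1]))
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognise(test, templates):
    """Return the label of the stored template closest to the test sequence."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))


if __name__ == "__main__":
    templates = {"yes": [[0], [1], [2]], "no": [[5], [5], [0]]}
    print(recognise([[0], [1], [1], [2]], templates))
```

Because the warping path may repeat frames, two utterances of the same word spoken at different rates can still align with low cost, which is what makes the approach suitable for isolated-word matching.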
Finally, the two systems are compared and their effectiveness is discussed.
Metadata
Item Type: | Thesis (Master of Engineering) |
---|---|
Date of Award: | 1996 |
Refereed: | No |
Additional Information: | In conjunction with Dublin Institute of Technology |
Supervisor(s): | Scaife, Ronan and Coyle, Eugene |
Uncontrolled Keywords: | Speech processing systems; Computational linguistics; Speech recognition systems; Zero-crossing information |
Subjects: | Engineering > Electronic engineering |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
ID Code: | 19139 |
Deposited On: | 04 Sep 2013 11:03 by Celine Campbell. Last Modified 19 Jul 2018 15:01 |
Documents
Full text available as: PDF (71MB). Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader.