Yu, Dahai, Ghita, Ovidiu, Sutherland, Alistair and Whelan, Paul F. ORCID: 0000-0002-2029-1576 (2012) A new visual speech modelling approach for visual speech recognition. Journal of computing and information technology, 1 (1). pp. 1-11. ISSN 2161-7112
Abstract
In this paper we propose a new learning-based representation that is referred to as Visual Speech Unit (VSU) for visual speech recognition (VSR). The new Visual Speech Unit concept proposes an extension of the standard viseme model that is currently applied for VSR by including in this representation not only the data associated with the visemes, but also the transitory information between consecutive visemes. The developed speech recognition system consists of several computational stages: (a) lips segmentation, (b) construction of the Expectation-Maximization Principal Component Analysis (EM-PCA) manifolds from the input video image, (c) registration between the models of the VSUs and the EM-PCA data constructed from the input image sequence and (d) recognition of the VSUs using a standard Hidden Markov Model (HMM) classification scheme. In this paper we were particularly interested to evaluate the classification accuracy obtained for our new VSU models when compared with that attained for standard (MPEG-4) viseme models. The experimental results indicate that we achieved 90% recognition rate when the system has been applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 52%.
Metadata
Item Type: | Article (Published) |
---|---|
Refereed: | Yes |
Uncontrolled Keywords: | computer vision; Visual Speech Unit; VSU; visual speech recognition; VSR |
Subjects: | UNSPECIFIED |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering |
Publisher: | Academy Publish |
Official URL: | http://www.academypublish.org/paper/a-new-visual-s... |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License |
ID Code: | 18543 |
Deposited On: | 16 Jul 2013 13:03 by Mark Sweeney . Last Modified 11 Jan 2019 13:32 |
Documents
Full text available as:
Preview |
PDF
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
1MB |
Downloads
Downloads
Downloads per month over past year
Archive Staff Only: edit this record