A new visual speech modelling approach for visual speech recognition

Yu, Dahai; Ghita, Ovidiu; Sutherland, Alistair; Whelan, Paul F.

Yu, Dahai, Ghita, Ovidiu, Sutherland, Alistair and Whelan, Paul F. ORCID: 0000-0002-2029-1576 (2012) A new visual speech modelling approach for visual speech recognition. Journal of computing and information technology, 1 (1). pp. 1-11. ISSN 2161-7112


Abstract

In this paper we propose a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The VSU concept extends the standard viseme model currently applied to VSR by including in the representation not only the data associated with the visemes but also the transitory information between consecutive visemes. The developed speech recognition system consists of several computational stages: (a) lip segmentation, (b) construction of the Expectation-Maximization Principal Component Analysis (EM-PCA) manifolds from the input video image, (c) registration between the models of the VSUs and the EM-PCA data constructed from the input image sequence, and (d) recognition of the VSUs using a standard Hidden Markov Model (HMM) classification scheme. We were particularly interested in evaluating the classification accuracy obtained for the new VSU models compared with that attained for standard (MPEG-4) viseme models. The experimental results indicate a 90% recognition rate when the system was applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 52%.
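
As an illustration of stage (b), the following is a minimal sketch of the generic EM iteration for PCA, which estimates a low-dimensional principal subspace without forming the full covariance matrix. It is not taken from the paper: the abstract does not give the authors' EM-PCA configuration, so the function names (em_pca, project), the data layout (one flattened frame per column), the number of iterations and the random initialisation are all assumptions made for this example.

```python
import numpy as np


def em_pca(Y, k, n_iter=50, seed=0):
    """Estimate a k-dimensional principal subspace with the EM iteration for PCA.

    Y is a (d, n) matrix whose columns are observations (for example,
    flattened lip-region frames; this layout is an assumption of the sketch).
    Returns an orthonormal basis Q of shape (d, k) and the data mean (d, 1).
    """
    rng = np.random.default_rng(seed)
    mean = Y.mean(axis=1, keepdims=True)
    Yc = Y - mean  # centre the data
    C = rng.standard_normal((Y.shape[0], k))  # random initial basis
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^{-1} C^T Yc
        X = np.linalg.solve(C.T @ C, C.T @ Yc)
        # M-step: new basis C = Yc X^T (X X^T)^{-1}
        C = np.linalg.solve(X @ X.T, X @ Yc.T).T
    # Orthonormalise the converged basis so projections are well conditioned.
    Q, _ = np.linalg.qr(C)
    return Q, mean


def project(Y, Q, mean):
    """Project observations onto the subspace; each column becomes a k-dim point."""
    return Q.T @ (Y - mean)
```

Under these assumptions, projecting the frames of a sequence with project(Y, Q, mean) yields a trajectory of k-dimensional points over time, which is the kind of low-dimensional data that could then be registered against VSU models and passed to an HMM classifier as in stages (c) and (d).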

Metadata

Item Type:	Article (Published)
Refereed:	Yes
Uncontrolled Keywords:	computer vision; Visual Speech Unit; VSU; visual speech recognition; VSR
Subjects:	UNSPECIFIED
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering
Publisher:	Academy Publish
Official URL:	http://www.academypublish.org/paper/a-new-visual-s...
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code:	18543
Deposited On:	16 Jul 2013 13:03 by Mark Sweeney. Last Modified: 11 Jan 2019 13:32

Documents

Full text available as:

PDF (1MB)


DORAS | DCU Research Repository
