In this paper, we propose a new manifold representation for visual speech recognition. The developed system consists of three main steps:
a. Lip extraction from input video data.
b. Generation of Expectation-Maximization PCA (EMPCA) manifolds for the entire image sequence, followed by manifold interpolation and re-sampling.
c. Classification of the manifolds using an HMM classifier to identify the words described by the lip motions in the input video sequence.
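The manifold step (b) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes lip regions have already been extracted and flattened into column vectors, uses a basic alternating least-squares form of EM-PCA, and resamples the resulting low-dimensional trajectory to a fixed length by linear interpolation so sequences of different durations become comparable before HMM classification. The function names `empca` and `resample_manifold` are hypothetical.

```python
import numpy as np

def empca(X, n_components, n_iter=50):
    """EM-style PCA: model the zero-mean data X (d x n) as C @ W,
    alternating least-squares solves for the components C (d x k)
    and the latent coordinates W (k x n)."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    C = rng.standard_normal((d, n_components))  # random initial basis
    for _ in range(n_iter):
        # E-step: latent coordinates given the current components
        W = np.linalg.solve(C.T @ C, C.T @ X)
        # M-step: components given the current latent coordinates
        C = X @ W.T @ np.linalg.inv(W @ W.T)
    # Orthonormalize the basis for a stable final projection
    C, _ = np.linalg.qr(C)
    W = C.T @ X
    return C, W

def resample_manifold(W, n_points):
    """Linearly interpolate a k-dim trajectory W (k x T) to a fixed
    number of points, one dimension at a time."""
    k, T = W.shape
    old = np.linspace(0.0, 1.0, T)
    new = np.linspace(0.0, 1.0, n_points)
    return np.vstack([np.interp(new, old, W[i]) for i in range(k)])

# Example: a sequence of 30 flattened lip frames of dimension 20,
# projected onto a 3-dimensional manifold and resampled to 16 points.
frames = np.random.default_rng(1).standard_normal((20, 30))
frames -= frames.mean(axis=1, keepdims=True)   # center the data
C, W = empca(frames, n_components=3)
trajectory = resample_manifold(W, n_points=16)  # 3 x 16, ready for an HMM
```

The fixed-length, low-dimensional trajectories produced this way are the observation sequences that step (c) would feed to per-word HMMs, classifying an utterance by the model with the highest likelihood.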