Moriya, Yasufumi and Jones, Gareth J.F. ORCID: 0000-0003-2923-8365 (2019) LSTM language model adaptation with images and titles for multimedia automatic speech recognition. In: IEEE SLT 2018 - Workshop on Spoken Language Technology, 18-21 Dec 2018, Athens, Greece. ISBN 978-1-5386-4334-1
Abstract
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) task. The incorporation of visual features as additional contextual information to improve ASR for this data has recently drawn attention from researchers. Our investigation extends existing ASR methods by using images and video titles to adapt a recurrent neural network (RNN) language model with a long short-term memory (LSTM) network. Our language model is tested on transcription of an existing corpus of instruction videos and on a new corpus consisting of lecture videos. A consistent reduction in perplexity of 5-10 is observed on both datasets. When the non-adapted model is combined with the image adaptation and video title adaptation models for n-best ASR hypothesis re-ranking, the word error rate (WER) is additionally decreased by around 0.5% on both datasets. By analysing the output word probabilities of the model, it is found that both image adaptation and video title adaptation give the model more confidence in the choice of contextually correct informative words.
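The two evaluation steps named in the abstract, perplexity and n-best re-ranking with combined language models, can be illustrated with a minimal sketch. The abstract does not say how the three models are combined, so the weighted sum of sentence-level log probabilities below is an assumption, as are the function names, the model names (`base`, `image`, `title`), the weights, and the toy scores; none of them are taken from the paper.

```python
import math


def perplexity(word_log_probs):
    """Perplexity from per-word natural-log probabilities."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


def rerank(nbest, lm_weights, lm_scale=1.0):
    """Sort n-best hypotheses by acoustic score plus a weighted LM score.

    Assumption: models are combined as a weighted sum of sentence-level
    log probabilities; the paper may use a different combination.
    """
    def total(hyp):
        lm = sum(w * hyp["lm_scores"][name] for name, w in lm_weights.items())
        return hyp["am_score"] + lm_scale * lm

    return sorted(nbest, key=total, reverse=True)


# Hypothetical two-entry n-best list with made-up log scores.
nbest = [
    {"text": "whisk the ex", "am_score": -10.0,
     "lm_scores": {"base": -12.0, "image": -14.0, "title": -13.0}},
    {"text": "whisk the eggs", "am_score": -10.5,
     "lm_scores": {"base": -11.5, "image": -9.0, "title": -9.5}},
]
weights = {"base": 0.5, "image": 0.25, "title": 0.25}  # illustrative only

best = rerank(nbest, weights)[0]
print(best["text"])
```

In this toy case the second hypothesis has the slightly worse acoustic score, but the context-adapted models assign it a higher probability, so it moves to the top of the list.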
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Workshop |
Refereed: | Yes |
Uncontrolled Keywords: | ASR; LSTM; multimodal language; model adaptation |
Subjects: | UNSPECIFIED |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Institutes and Centres > ADAPT |
Published in: | Proceedings of the Spoken Language Technology Workshop (SLT) 2018 IEEE. IEEE. ISBN 978-1-5386-4334-1 |
Publisher: | IEEE |
Official URL: | http://dx.doi.org/10.1109/SLT.2018.8639551 |
Copyright Information: | © 2018 IEEE |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland as part of the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University |
ID Code: | 23389 |
Deposited On: | 30 May 2019 15:31 by Thomas Murtagh. Last Modified 31 Jul 2019 08:44 |