LSTM language model adaptation with images and titles for multimedia automatic speech recognition

Moriya, Yasufumi and Jones, Gareth J.F. ORCID: 0000-0003-2923-8365 (2019) LSTM language model adaptation with images and titles for multimedia automatic speech recognition. In: IEEE SLT 2018 - Workshop on Spoken Language Technology, 18-21 Dec 2018, Athens, Greece. ISBN 978-1-5386-4334-1


Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) task. Incorporating visual features as additional contextual information to improve ASR for such data has recently drawn attention from researchers. Our investigation extends existing ASR methods by using images and video titles to adapt a recurrent neural network (RNN) language model with a long short-term memory (LSTM) network. Our language model is tested on the transcription of an existing corpus of instruction videos and of a new corpus of lecture videos. A consistent reduction in perplexity of 5–10 is observed on both datasets. When the non-adapted model is combined with the image adaptation and video title adaptation models to re-rank n-best ASR hypotheses, the word error rate (WER) is additionally decreased by around 0.5% on both datasets. Analysis of the model's output word probabilities shows that both image adaptation and video title adaptation give the model more confidence in choosing contextually correct informative words.
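The abstract reports results in terms of perplexity and n-best re-ranking with a combination of the non-adapted, image-adapted and title-adapted language models. The following is a minimal Python sketch of both ideas, not the authors' implementation. The `Hypothesis` fields, the model names `base`, `image` and `title`, the weights and the `lm_scale` parameter are all hypothetical. The combination is assumed to be a weighted sum of log-probabilities (log-linear); the paper may use a different scheme, such as linear interpolation of probabilities. All scores in the usage example are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


def perplexity(word_log_probs: List[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per word)."""
    if not word_log_probs:
        raise ValueError("need at least one word")
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


@dataclass
class Hypothesis:
    words: List[str]             # candidate transcription from the n-best list
    acoustic: float              # acoustic-model log score (hypothetical field)
    lm_scores: Dict[str, float]  # total LM log-prob per model name (hypothetical)


def combined_lm_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    # Assumed log-linear combination: weighted sum of each model's log-prob.
    # Only models listed in `weights` contribute.
    return sum(w * hyp.lm_scores[name] for name, w in weights.items())


def rerank(nbest: List[Hypothesis], weights: Dict[str, float],
           lm_scale: float = 1.0) -> List[Hypothesis]:
    """Return hypotheses sorted best-first by acoustic + scaled LM score."""
    return sorted(
        nbest,
        key=lambda h: h.acoustic + lm_scale * combined_lm_score(h, weights),
        reverse=True,
    )


# Usage with invented scores for a cooking-style instruction video.
nbest = [
    Hypothesis(["add", "the", "flower"], acoustic=-10.0,
               lm_scores={"base": -6.0, "image": -7.0, "title": -6.5}),
    Hypothesis(["add", "the", "flour"], acoustic=-10.5,
               lm_scores={"base": -6.5, "image": -4.0, "title": -5.0}),
]
weights = {"base": 0.5, "image": 0.25, "title": 0.25}

print(" ".join(rerank(nbest, {"base": 1.0})[0].words))  # base model only
print(" ".join(rerank(nbest, weights)[0].words))        # combined models
```

In this toy example the base model alone prefers "flower" (total score -16.0 against -17.0). Giving weight to the hypothetical image- and title-adapted scores flips the choice to "flour" (-16.0 against -16.375). This illustrates how extra context can change the top-ranked hypothesis.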

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Workshop
Refereed:	Yes
Uncontrolled Keywords:	ASR; LSTM; multimodal language; model adaptation
Subjects:	UNSPECIFIED
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Institutes and Centres > ADAPT
Published in:	Proceedings of the Spoken Language Technology Workshop (SLT) 2018 IEEE. IEEE. ISBN 978-1-5386-4334-1
Publisher:	IEEE
Official URL:	http://dx.doi.org/10.1109/SLT.2018.8639551
Copyright Information:	© 2018 IEEE
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland as part of the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University
ID Code:	23389
Deposited On:	30 May 2019 15:31 by Thomas Murtagh. Last Modified 31 Jul 2019 08:44

Full text available as: PDF (471 kB)


DORAS | DCU Research Repository
