Incorporating visual information into neural machine translation

Calixto, Iacer (2017) Incorporating visual information into neural machine translation. PhD thesis, Dublin City University.

Abstract

In this work, we study different ways to enrich Machine Translation (MT) models using information obtained from images. Specifically, we propose different models to incorporate images into MT by transferring learning from pre-trained convolutional neural networks (CNNs) trained for classifying images. We use these pre-trained CNNs for image feature extraction, and use two different types of visual features: global visual features, which encode an entire image into a single real-valued feature vector; and local visual features, which encode different areas of an image into separate real-valued vectors, therefore also encoding spatial information. We first study how to train embeddings that are both multilingual and multi-modal, using global visual features and multilingual sentences for training. Second, we propose different models to incorporate global visual features into state-of-the-art Neural Machine Translation (NMT): (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to initialise the decoder hidden state. Finally, we put forward one model to incorporate local visual features into NMT: an NMT model with an independent visual attention mechanism integrated into the same decoder Recurrent Neural Network (RNN) as the source-language attention mechanism. We evaluate our models on Multi30k, a publicly available, general-domain data set, and also on a proprietary data set of product listings and images built by eBay Inc., which was made available for the purpose of this research. We report state-of-the-art results on the publicly available Multi30k data set. Our best models also significantly improve on comparable phrase-based Statistical MT (PBSMT) models trained on the same data set, according to widely adopted MT metrics.
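The two kinds of visual features described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the dimensions, the weight names, the tanh projection, and the additive form of the attention are all assumptions chosen for illustration, and random arrays stand in for real CNN features. The first function projects one global image vector to an initial RNN hidden state; the second computes attention weights over local (per-region) image vectors given a decoder state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
FEAT_DIM, HID_DIM, ATT_DIM, NUM_REGIONS = 8, 4, 5, 6


def init_hidden_from_image(global_feat, W, b):
    """Project one global image vector to an initial RNN hidden state."""
    return np.tanh(W @ global_feat + b)


def visual_attention(local_feats, hidden, W_f, W_h, v):
    """Additive attention over image regions, conditioned on a decoder state."""
    # (NUM_REGIONS, ATT_DIM) + (ATT_DIM,) broadcasts over regions.
    energies = np.tanh(local_feats @ W_f.T + hidden @ W_h.T)
    scores = energies @ v                      # one score per region
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    context = weights @ local_feats            # weighted sum of region vectors
    return weights, context


# Random stand-ins for CNN outputs (assumed shapes, not real features).
global_feat = rng.normal(size=FEAT_DIM)
local_feats = rng.normal(size=(NUM_REGIONS, FEAT_DIM))

# Randomly initialised parameters; in a real model these would be learned.
W = 0.1 * rng.normal(size=(HID_DIM, FEAT_DIM))
b = np.zeros(HID_DIM)
W_f = 0.1 * rng.normal(size=(ATT_DIM, FEAT_DIM))
W_h = 0.1 * rng.normal(size=(ATT_DIM, HID_DIM))
v = rng.normal(size=ATT_DIM)

h0 = init_hidden_from_image(global_feat, W, b)
weights, context = visual_attention(local_feats, h0, W_f, W_h, v)
```

Here `h0` plays the role of an image-derived initial hidden state, while `context` is the image summary a decoder step could consume alongside its source-language context vector.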

Item Type:	Thesis (PhD)
Date of Award:	November 2017
Refereed:	No
Supervisor(s):	Liu, Qun and Campbell, Nick
Subjects:	Computer Science > Machine translating; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Artificial intelligence
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code:	21942
Deposited On:	10 Nov 2017 12:38 by Qun Liu. Last modified 25 Oct 2018 09:21

Full text available as:	PDF (PhD thesis), 4MB

DORAS | DCU Research Repository
