We report experiments with multi-modal neural machine translation models that incorporate global visual features in different parts of the encoder and decoder, using the VGG19 network to extract features for all images. In our experiments, we explore different strategies for including global image features and how ensembling different models at inference time impacts translations. Our submissions ranked 3rd best for translating from English into French, always improving considerably over a neural machine translation baseline across all language pairs evaluated, e.g. an increase of 7.0–9.2 METEOR points.
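The abstract mentions global visual features extracted with VGG19 and fed into the encoder and decoder. The paper's exact pipeline is not given here; the following is a minimal illustrative sketch, assuming a PyTorch/torchvision setup, of how a 4096-dimensional global feature (the penultimate fully connected layer of VGG19) could be extracted and projected to an assumed NMT hidden size. The projection layer and the hidden size of 512 are assumptions for illustration, not the authors' configuration.

```python
# Illustrative sketch only, not the authors' implementation.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Load a pre-trained VGG19 and drop the final classification layer,
# keeping the 4096-dimensional penultimate (fc7) activations as the
# global image feature.
vgg19 = models.vgg19(pretrained=True)
vgg19.classifier = nn.Sequential(*list(vgg19.classifier.children())[:-1])
vgg19.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def global_image_feature(path):
    """Return a (4096,) global feature vector for one image."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = vgg19(img)  # shape: (1, 4096)
    return feat.squeeze(0)

# Hypothetical projection of the image feature into the NMT hidden size,
# e.g. to initialise an encoder or decoder hidden state.
proj = nn.Linear(4096, 512)  # 512 is an assumed hidden size
# h0 = torch.tanh(proj(global_image_feature("example.jpg")))
```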
Funders: Science Foundation Ireland in the ADAPT Centre for Digital Content Technology (www.adaptcentre.ie) at Dublin City University, funded under the SFI Research Centres Programme (Grant 13/RC/2106), the European Regional Development Fund, and the European Union Horizon 2020 research and innovation programme under grant agreement 645452 (QT21).