Investigating backtranslation in neural machine translation

Poncelas, Alberto (ORCID: 0000-0002-5089-1687), Shterionov, Dimitar (ORCID: 0000-0001-6300-797X), Way, Andy (ORCID: 0000-0001-5736-5930), Maillette de Buy Wenniger, Gideon and Passban, Peyman (2018) Investigating backtranslation in neural machine translation. In: 21st Annual Conference of the European Association for Machine Translation, 28-30 May 2018, Alicante, Spain.

Abstract
A prerequisite for training corpus-based machine translation (MT) systems – either Statistical MT (SMT) or Neural MT (NMT) – is the availability of high-quality parallel data. This is arguably more important today than ever before, as NMT has been shown in many studies to outperform SMT, but mostly when large parallel corpora are available; in cases where data is limited, SMT can still outperform NMT. Recently, researchers have shown that back-translating monolingual data can be used to create synthetic parallel corpora, which in turn can be used in combination with authentic parallel data to train a high-quality NMT system. Given that large collections of new parallel text become available only quite rarely, backtranslation has become the norm when building state-of-the-art NMT systems, especially in resource-poor scenarios. However, we assert that there are many unknown factors regarding the actual effects of back-translated data on the translation capabilities of an NMT model. Accordingly, in this work we investigate how using back-translated data as a training corpus – both as a standalone dataset and combined with human-generated parallel data – affects the performance of an NMT model. We use incrementally larger amounts of back-translated data to train a range of NMT systems for German-to-English, and analyse the resulting translation performance.
Metadata
Item Type: Conference or Workshop Item (Lecture)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Machine Translation; Statistical Machine Translation; Neural Machine Translation
Subjects: UNSPECIFIED
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in: Proceedings of the 21st Annual Conference of the European Association for Machine Translation.
Official URL: http://dx.doi.org/10.18653/v1/W18-64015
Copyright Information: ©2018 the Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 22881
Deposited On: 19 Dec 2018 12:43 by Gideon Maillette De buy. Last Modified: 22 Jan 2021 14:15