Combining SMT and NMT back-translated data for efficient NMT

Poncelas, Alberto ORCID: 0000-0002-5089-1687, Popović, Maja ORCID: 0000-0001-8234-8745, Shterionov, Dimitar ORCID: 0000-0001-6300-797X, Maillette de Buy Wenniger, Gideon and Way, Andy ORCID: 0000-0001-5736-5930 (2019) Combining SMT and NMT back-translated data for efficient NMT. In: Recent Advances in Natural Language Processing (RANLP 2019), 2-4 Sept 2019, Varna, Bulgaria.


Abstract

Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have recently become popular. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. NMT models are typically used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data generated by different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models, as well as combinations of both. The results reveal that the models achieve their best performance when the training set is augmented with back-translated data created by merging different MT approaches.
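The back-translation procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the names `back_translate` and `augment` and the toy translators are hypothetical, each translator is assumed to map a target-language sentence to a source-language one, and merging by simple concatenation is only one possible way of combining SMT and NMT back-translated data; the exact combination strategies studied in the paper may differ.

```python
from typing import Callable, Iterable, List, Tuple

# A parallel sentence pair: (source sentence, target sentence).
Pair = Tuple[str, str]
# Hypothetical translator interface: target-language sentence -> source-language sentence.
Translator = Callable[[str], str]


def back_translate(monolingual_target: Iterable[str], translate: Translator) -> List[Pair]:
    """Pair each authentic target sentence with its machine-translated (synthetic) source."""
    return [(translate(tgt), tgt) for tgt in monolingual_target]


def augment(authentic: List[Pair], monolingual_target: List[str],
            translators: List[Translator]) -> List[Pair]:
    """Append back-translated data from every translator to the authentic parallel data.

    Assumption: combination is plain concatenation. With several translators
    (e.g. an SMT and an NMT model), each target sentence appears once per
    translator, paired with a different synthetic source each time.
    """
    augmented = list(authentic)
    for translate in translators:
        augmented.extend(back_translate(monolingual_target, translate))
    return augmented


if __name__ == "__main__":
    # Toy stand-ins for real MT systems (hypothetical; a real setup would call
    # trained target-to-source SMT and NMT models here).
    def toy_smt(sentence: str) -> str:
        return "smt:" + sentence

    def toy_nmt(sentence: str) -> str:
        return "nmt:" + sentence

    authentic = [("hallo Welt", "hello world")]
    monolingual = ["good morning", "thank you"]
    data = augment(authentic, monolingual, [toy_smt, toy_nmt])
    for src, tgt in data:
        print(f"{src}\t{tgt}")
```

Keeping the authentic target side untouched and placing the synthetic text on the source side reflects the usual rationale for back-translation: the model still learns to produce fluent, human-written target sentences, while any noise from the MT system stays on the input side.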

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Computational linguistics; Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Published in:	Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019). INCOMA Ltd.
Publisher:	INCOMA Ltd
Official URL:	http://dx.doi.org/10.26615/978-954-452-056-4_107
Copyright Information:	© 2019 the Authors
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	SFI Research Centres Programme (Grant 13/RC/2106), European Regional Development Fund, European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 713567.
ID Code:	24272
Deposited On:	11 Mar 2020 10:14 by Alberto Poncelas. Last Modified 22 Jan 2021 14:21

Documents

Full text available as:

PDF (RANLP2019_Backtranslation_paper.pdf), 413kB
