
DORAS | DCU Research Repository


Combining SMT and NMT back-translated data for efficient NMT

Poncelas, Alberto (ORCID: 0000-0002-5089-1687), Popović, Maja (ORCID: 0000-0001-8234-8745), Shterionov, Dimitar (ORCID: 0000-0001-6300-797X), Maillette de Buy Wenniger, Gideon and Way, Andy (ORCID: 0000-0001-5736-5930) (2019) Combining SMT and NMT back-translated data for efficient NMT. In: Recent Advances in Natural Language Processing (RANLP 2019), 2-4 Sept 2019, Varna, Bulgaria.

Abstract
Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have recently become popular. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data produced by different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models, as well as combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
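The augmentation scheme described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual pipeline: the `smt` and `nmt` callables below are hypothetical stand-ins for trained target-to-source MT models, and the toy sentences are invented for demonstration.

```python
def back_translate(monolingual_target, translate):
    """Pair each target-language sentence with a synthetic source sentence
    produced by a target-to-source MT model."""
    return [(translate(t), t) for t in monolingual_target]

def augment(parallel_corpus, monolingual_target, mt_models):
    """Extend authentic parallel data with back-translated pairs generated
    by one or more MT models (e.g. SMT, NMT, or both combined)."""
    augmented = list(parallel_corpus)
    for translate in mt_models:
        augmented.extend(back_translate(monolingual_target, translate))
    return augmented

# Toy stand-ins for SMT and NMT back-translation models (mock translators).
smt = lambda s: "[smt] " + s
nmt = lambda s: "[nmt] " + s

parallel = [("ein Haus", "a house")]          # authentic parallel pair
mono = ["a garden", "a tree"]                 # monolingual target-language data

training_set = augment(parallel, mono, [smt, nmt])
# 1 authentic pair + 2 sentences x 2 models = 5 training pairs
print(len(training_set))
```

Merging the synthetic pairs from both model families into one training set corresponds to the combination setting the paper reports as most effective.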
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Computational linguistics
Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019). INCOMA Ltd.
Publisher: INCOMA Ltd
Official URL: http://dx.doi.org/10.26615/978-954-452-056-4_107
Copyright Information: © 2019 the Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Funders: SFI Research Centres Programme (Grant 13/RC/2106), European Regional Development Fund, European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 713567.
ID Code: 24272
Deposited On: 11 Mar 2020 10:14 by Alberto Poncelas. Last Modified: 22 Jan 2021 14:21
Documents

Full text available as:

PDF (RANLP2019_Backtranslation_paper.pdf), 413kB