Poncelas, Alberto ORCID: 0000-0002-5089-1687, Popović, Maja ORCID: 0000-0001-8234-8745, Shterionov, Dimitar ORCID: 0000-0001-6300-797X, Maillette de Buy Wenniger, Gideon and Way, Andy ORCID: 0000-0001-5736-5930 (2019) Combining SMT and NMT back-translated data for efficient NMT. In: Recent Advances in Natural Language Processing (RANLP 2019), 2-4 Sept 2019, Varna, Bulgaria.
Abstract
Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model.
Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data generated by different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models, and by combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
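The data-augmentation idea described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `toy_smt` and `toy_nmt` are placeholder functions standing in for real target-to-source SMT and NMT systems, and merging by simple concatenation is an assumption, since this record does not describe how the paper actually combines the two data sets.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]   # maps a target-language sentence to the source language
Pair = Tuple[str, str]              # (source sentence, target sentence)


def back_translate(monolingual_target: List[str],
                   translate_to_source: Translator) -> List[Pair]:
    """Pair each authentic target sentence with a synthetic source sentence."""
    return [(translate_to_source(t), t) for t in monolingual_target]


def build_training_set(authentic: List[Pair],
                       monolingual_target: List[str],
                       back_translators: List[Translator]) -> List[Pair]:
    """Extend the authentic parallel data with one synthetic copy per MT system.

    Assumption: the synthetic sets are merged by plain concatenation.
    """
    augmented = list(authentic)
    for translate in back_translators:
        augmented.extend(back_translate(monolingual_target, translate))
    return augmented


# Hypothetical stand-ins for real SMT and NMT back-translation models.
def toy_smt(sentence: str) -> str:
    return "smt:" + sentence


def toy_nmt(sentence: str) -> str:
    return "nmt:" + sentence


authentic = [("hallo welt", "hello world")]
mono = ["good morning", "thank you"]
combined = build_training_set(authentic, mono, [toy_smt, toy_nmt])
```

In this sketch the target side of every synthetic pair is authentic text and only the source side is machine-generated, which is the defining property of back-translation. Passing a single translator in `back_translators` would correspond to augmenting with one MT approach only.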
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Published in: | Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019). INCOMA Ltd. |
Publisher: | INCOMA Ltd |
Official URL: | http://dx.doi.org/10.26615/978-954-452-056-4_107 |
Copyright Information: | © 2019 the Authors |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | SFI Research Centres Programme (Grant 13/RC/2106), European Regional Development Fund, European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 713567. |
ID Code: | 24272 |
Deposited On: | 11 Mar 2020 10:14 by Alberto Poncelas. Last Modified 22 Jan 2021 14:21 |
Documents
Full text available as: PDF (413kB)