Neural machine translation between similar south-Slavic languages

Popović, Maja ORCID: 0000-0001-8234-8745 and Poncelas, Alberto ORCID: 0000-0002-5089-1687 (2020) Neural machine translation between similar south-Slavic languages. In: 2020 Fifth Conference on Machine Translation (WMT20), 19-20 Nov 2020, Dominican Republic (Online).

Abstract

This paper describes the ADAPT-DCU machine translation systems built for the WMT 2020 shared task on Similar Language Translation. We explored several NMT set-ups for the Croatian–Slovenian and Serbian–Slovenian language pairs in both translation directions. Our experiments focus on different amounts and types of training data: we first apply basic filtering on the OpenSubtitles training corpora, then we perform additional cleaning of remaining misaligned segments based on character n-gram matching. Finally, we make use of additional monolingual data by creating synthetic parallel data through back-translation. Automatic evaluation shows that multilingual systems with joint Serbian and Croatian data outperform bilingual ones, and that character-based cleaning improves scores while using less data. The results also confirm once more that adding back-translated data further improves performance, especially when the synthetic data is similar to the domain of the development and test sets. This, however, might come at the price of prolonged training time, especially for multitarget systems.
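
The abstract does not give the details of the character n-gram cleaning step, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that correctly aligned segments in closely related languages share many character n-grams, and it scores each pair with a Dice coefficient over character trigrams. The function names, the trigram order and the threshold of 0.2 are all hypothetical choices made for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Multiset of character n-grams of the lowercased, whitespace-normalised text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_overlap(src, tgt, n=3):
    """Dice coefficient between the character n-gram multisets of two segments."""
    a, b = char_ngrams(src, n), char_ngrams(tgt, n)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        # Both segments are shorter than n characters: nothing to compare.
        return 0.0
    shared = sum((a & b).values())  # multiset intersection
    return 2.0 * shared / total


def filter_pairs(pairs, threshold=0.2, n=3):
    """Keep only the segment pairs whose overlap reaches the (assumed) threshold."""
    return [(s, t) for s, t in pairs if ngram_overlap(s, t, n) >= threshold]


if __name__ == "__main__":
    corpus = [
        ("dobar dan", "dober dan"),    # plausible Croatian-Slovenian pair
        ("dobar dan", "xyz qrs tuv"),  # deliberately misaligned segment
    ]
    for src, tgt in filter_pairs(corpus):
        print(src, "|||", tgt)
```

In practice the n-gram order and the threshold would have to be tuned on held-out data; a threshold that is too strict discards valid but freely translated subtitle segments.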

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Computational linguistics; Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Published in:	Fifth Conference on Machine Translation (at EMNLP-2020). Association for Computational Linguistics (ACL).
Publisher:	Association for Computational Linguistics (ACL)
Official URL:	https://www.aclweb.org/anthology/2020.wmt-1.51
Copyright Information:	© 2020 The Authors. CC-BY-4.0
Funders:	Science Foundation Ireland through the SFI Research Centres Programme 13/RC/2106, European Regional Development Fund (ERDF)
ID Code:	25080
Deposited On:	18 Nov 2020 17:01 by Alberto Poncelas. Last Modified 25 Jun 2021 13:02

Documents

Full text available as: PDF (197kB)
