Haque, Rejwanul ORCID: 0000-0003-1680-0099, Moslem, Yasmin ORCID: 0000-0003-4595-6877 and Way, Andy ORCID: 0000-0001-5736-5930 (2020) Terminology-aware sentence mining for NMT domain adaptation: ADAPT’s submission to the Adap-MT 2020 English-to-Hindi AI translation shared task. In: Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020), 18-21 Dec 2020, Patna, India (Online).
Abstract
This paper describes the ADAPT Centre’s submission to the Adap-MT 2020 AI Translation Shared Task for English-to-Hindi. The neural machine translation (NMT) systems that we built to translate AI domain texts are state-of-the-art Transformer models. In order to improve the translation quality of our NMT systems, we made use of both in-domain and out-of-domain data for training and employed different fine-tuning techniques for adapting our NMT systems to this task, e.g. mixed fine-tuning and on-the-fly self-training. For this, we mined parallel sentence pairs and monolingual sentences from large out-of-domain data, and the mining process was facilitated through automatic extraction of terminology from the in-domain data. This paper outlines the experiments we carried out for this task and reports the performance of our NMT systems on the evaluation test set.
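The abstract does not say how terms were extracted or how sentences were matched, so the following is only a minimal sketch of the general idea of terminology-driven sentence mining, not the authors' method. It assumes a simple frequency-based n-gram extractor in place of a real terminology extraction tool. The names `STOPWORDS`, `tokenize`, `extract_terms` and `mine_pairs`, and the `max_n` and `min_count` parameters, are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny English stopword list, for illustration only.
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "and", "to", "in", "is", "are", "for", "on", "with", "by", "from"}
)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric (optionally hyphenated) tokens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def ngrams(tokens, max_n):
    """Yield every n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tokens[i:i + n]


def extract_terms(in_domain_sentences, max_n=2, min_count=2):
    """Hypothetical term extractor: keep n-grams that occur at least
    min_count times and neither start nor end with a stopword."""
    counts = Counter()
    for sentence in in_domain_sentences:
        for gram in ngrams(tokenize(sentence), max_n):
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return {term for term, count in counts.items() if count >= min_count}


def mine_pairs(parallel_pairs, terms, max_n=2):
    """Keep (source, target) pairs whose source side contains at least one term."""
    mined = []
    for source, target in parallel_pairs:
        source_grams = {" ".join(g) for g in ngrams(tokenize(source), max_n)}
        if source_grams & terms:
            mined.append((source, target))
    return mined
```

Under these assumptions, `extract_terms` would be run once over the in-domain English text, and `mine_pairs` would then filter a large out-of-domain English-Hindi corpus. The same matching step could be applied to monolingual sentences by passing pairs with an empty target side.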
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Additional Information: | Part of ICON 2020: 17th International Conference on Natural Language Processing |
Subjects: | Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Published in: | Proceedings of Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020). NLP Association of India (NLPAI). |
Publisher: | NLP Association of India (NLPAI) |
Copyright Information: | © 2020 The Authors |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), co-funded under the European Regional Development Fund; Science Foundation Ireland (SFI) under Grant Numbers 13/RC/2077 and 18/CRT/6224 |
ID Code: | 25446 |
Deposited On: | 28 Jan 2021 14:11 by Thomas Murtagh. Last Modified 14 Feb 2022 15:49 |
Documents
Full text available as:
PDF
Creative Commons: Attribution-Share Alike 4.0, 115kB