Haque, Rejwanul (ORCID: 0000-0003-1680-0099), Moslem, Yasmin (ORCID: 0000-0003-4595-6877) and Way, Andy (ORCID: 0000-0001-5736-5930) (2020)
Terminology-aware sentence mining for NMT domain adaptation: ADAPT’s submission to the Adap-MT 2020 English-to-Hindi AI translation shared task.
In: Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020), 18-21 Dec 2020, Patna, India (Online).
Abstract
This paper describes the ADAPT Centre’s submission to the Adap-MT 2020 AI Translation Shared Task for English-to-Hindi. The neural machine translation (NMT) systems that we built to translate AI domain texts are state-of-the-art Transformer models. In order to improve the translation quality of our NMT systems, we made use of both in-domain and out-of-domain data for training and employed different fine-tuning techniques for adapting our NMT systems to this task, e.g. mixed fine-tuning and on-the-fly self-training. For this, we mined parallel sentence pairs and monolingual sentences from large out-of-domain data, and the mining process was facilitated through automatic extraction of terminology from the in-domain data. This paper outlines the experiments we carried out for this task and reports the performance of our NMT systems on the evaluation test set.
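The mining step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the record does not say how terminology was extracted or how sentences were matched, so the relative-frequency heuristic, the single-word terms, and the names `tokenize`, `extract_terms`, and `mine_pairs` are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_terms(in_domain, background, min_count=2, ratio=2.0):
    """Return words that are relatively more frequent in-domain than in background text.

    Hypothetical heuristic: a real terminology extractor would also handle
    multi-word terms and use a stronger statistical measure.
    """
    in_counts = Counter(tok for s in in_domain for tok in tokenize(s))
    bg_counts = Counter(tok for s in background for tok in tokenize(s))
    in_total = sum(in_counts.values()) or 1
    bg_total = sum(bg_counts.values()) or 1
    terms = set()
    for tok, count in in_counts.items():
        if count < min_count:
            continue
        in_rel = count / in_total
        bg_rel = (bg_counts[tok] + 1) / (bg_total + 1)  # add-one smoothing
        if in_rel / bg_rel >= ratio:
            terms.add(tok)
    return terms


def mine_pairs(parallel_pairs, terms):
    """Keep (source, target) pairs whose source side contains at least one term."""
    return [(src, tgt) for src, tgt in parallel_pairs if terms & set(tokenize(src))]


if __name__ == "__main__":
    in_domain = [
        "neural network training",
        "the neural network learns",
        "a network of neurons",
    ]
    background = [
        "the cat sat on the mat",
        "a dog and a cat",
        "the weather is nice today",
    ]
    terms = extract_terms(in_domain, background)
    # Target sides are placeholders, not real translations.
    pairs = [
        ("A neural model translates text.", "<target sentence 1>"),
        ("The cat sat.", "<target sentence 2>"),
    ]
    print(sorted(terms))
    print(mine_pairs(pairs, terms))
```

The same filter could be applied to monolingual sentences by passing single strings instead of pairs; the mined data would then feed the fine-tuning stages the abstract names.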
Metadata
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Event Type: | Conference |
| Refereed: | Yes |
| Additional Information: | Part of ICON 2020: 17th International Conference on Natural Language Processing |
| Subjects: | Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
| Published in: | Proceedings of Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020). NLP Association of India (NLPAI). |
| Publisher: | NLP Association of India (NLPAI) |
| Copyright Information: | © 2020 The Authors |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
| Funders: | Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), co-funded under the European Regional Development Fund; Science Foundation Ireland (SFI) under Grant Numbers 13/RC/2077 and 18/CRT/6224 |
| ID Code: | 25446 |
| Deposited On: | 28 Jan 2021 14:11. Last Modified 14 Feb 2022 15:49 |
Documents
Full text available as:
PDF (115kB), Creative Commons: Attribution-Share Alike 4.0