Terminology-aware sentence mining for NMT domain adaptation: ADAPT’s submission to the Adap-MT 2020 English-to-Hindi AI translation shared task

Haque, Rejwanul; Moslem, Yasmin; Way, Andy

Haque, Rejwanul ORCID: 0000-0003-1680-0099, Moslem, Yasmin ORCID: 0000-0003-4595-6877 and Way, Andy ORCID: 0000-0001-5736-5930 (2020) Terminology-aware sentence mining for NMT domain adaptation: ADAPT’s submission to the Adap-MT 2020 English-to-Hindi AI translation shared task. In: Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020), 18-21 Dec 2020, Patna, India (Online).

Abstract

This paper describes the ADAPT Centre’s submission to the Adap-MT 2020 AI Translation Shared Task for English-to-Hindi. The neural machine translation (NMT) systems that we built to translate AI domain texts are state-of-the-art Transformer models. In order to improve the translation quality of our NMT systems, we made use of both in-domain and out-of-domain data for training and employed different fine-tuning techniques for adapting our NMT systems to this task, e.g. mixed fine-tuning and on-the-fly self-training. For this, we mined parallel sentence pairs and monolingual sentences from large out-of-domain data, and the mining process was facilitated through automatic extraction of terminology from the in-domain data. This paper outlines the experiments we carried out for this task and reports the performance of our NMT systems on the evaluation test set.
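
As a rough illustration of the terminology-driven mining idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a simple frequency-based n-gram extractor as a stand-in for automatic terminology extraction; the function names (`extract_terms`, `mine_pairs`), the stopword list, and the thresholds are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (English side only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n):
    """Yield all n-grams of length 1..max_n as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tokens[i:i + n]


def extract_terms(in_domain_sentences, max_n=2, min_count=2):
    """Assumed stand-in for term extraction: keep n-grams that recur in the
    in-domain data and neither start nor end with a stopword."""
    counts = Counter()
    for sentence in in_domain_sentences:
        for gram in ngrams(tokenize(sentence), max_n):
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return {term for term, count in counts.items() if count >= min_count}


def mine_pairs(parallel_pairs, terms, max_n=2):
    """Keep out-of-domain (source, target) pairs whose source side
    contains at least one extracted term."""
    mined = []
    for src, tgt in parallel_pairs:
        grams = {" ".join(g) for g in ngrams(tokenize(src), max_n)}
        if grams & terms:
            mined.append((src, tgt))
    return mined
```

Under these assumptions, the mined pairs would be combined with the in-domain data for fine-tuning; the same filter could be applied to monolingual sentences by passing single strings instead of pairs.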

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Additional Information:	Part of ICON 2020: 17th International Conference on Natural Language Processing
Subjects:	Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Proceedings of Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020). NLP Association of India (NLPAI).
Publisher:	NLP Association of India (NLPAI)
Copyright Information:	© 2020 The Authors
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), co-funded under the European Regional Development Fund; Science Foundation Ireland (SFI) under Grant Numbers 13/RC/2077 and 18/CRT/6224
ID Code:	25446
Deposited On:	28 Jan 2021 14:11. Last Modified 14 Feb 2022 15:49

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Share Alike 4.0
115kB
