Machine translation in the Covid domain: an English-Irish case study for LoResMT 2021

Lankford, Séamus; Afli, Haithem; Way, Andy

Lankford, Séamus ORCID: 0000-0003-1693-9533, Afli, Haithem ORCID: 0000-0002-7449-4707 and Way, Andy ORCID: 0000-0001-5736-5930 (2021) Machine translation in the Covid domain: an English-Irish case study for LoResMT 2021. In: 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), 16 August 2021, Orlando, FL, USA.

Abstract

Translation models for the specific domain of Covid data, translating from English to Irish, were developed for the LoResMT 2021 shared task. Domain adaptation techniques, using a Covid-adapted generic 55k corpus from the Directorate General of Translation, were applied. Fine-tuning, mixed fine-tuning and combined dataset approaches were compared with models trained on an extended in-domain dataset. As part of this study, an English-Irish dataset of Covid-related data, from the Health and Education domains, was developed. The highest-performing model used a Transformer architecture trained with an extended in-domain Covid dataset. In the context of this study, we have demonstrated that extending an 8k in-domain baseline dataset by just 5k lines improved the BLEU score by 27 points.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Workshop
Refereed:	Yes
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021). Association for Computational Linguistics (ACL).
Publisher:	Association for Computational Linguistics (ACL)
Official URL:	https://aclanthology.org/2021.mtsummit-loresmt.15
Copyright Information:	© 2021 Association for Computational Linguistics (ACL)
Funders:	Science Foundation Ireland (SFI) Research Centres Programme (Grant 13/RC/2016), European Regional Development Fund, Munster Technological University
ID Code:	28342
Deposited On:	18 May 2023 14:21 by Seamus Lankford. Last Modified 18 May 2023 14:21

Documents

Full text available as:

PDF: machine_translation_in_the_Covid_domain.pdf
Creative Commons: Attribution 4.0
682kB
