DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Machine translation in the Covid domain: an English-Irish case study for LoResMT 2021

Lankford, Séamus (ORCID: 0000-0003-1693-9533), Afli, Haithem (ORCID: 0000-0002-7449-4707) and Way, Andy (ORCID: 0000-0001-5736-5930) (2021) Machine translation in the Covid domain: an English-Irish case study for LoResMT 2021. In: 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), 16 August 2021, Orlando, FL, USA.

Abstract
Translation models for the specific domain of translating Covid data from English to Irish were developed for the LoResMT 2021 shared task. Domain adaptation techniques, using a Covid-adapted generic 55k corpus from the Directorate General of Translation, were applied. Fine-tuning, mixed fine-tuning and combined dataset approaches were compared with models trained on an extended in-domain dataset. As part of this study, an English-Irish dataset of Covid-related data, from the Health and Education domains, was developed. The highest-performing model used a Transformer architecture trained with an extended in-domain Covid dataset. In the context of this study, we have demonstrated that extending an 8k in-domain baseline dataset by just 5k lines improved the BLEU score by 27 points.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021). Association for Computational Linguistics (ACL).
Publisher: Association for Computational Linguistics (ACL)
Official URL: https://aclanthology.org/2021.mtsummit-loresmt.15
Copyright Information: © 2021 Association for Computational Linguistics (ACL)
Funders: Science Foundation Ireland (SFI) Research Centres Programme (Grant 13/RC/2016), European Regional Development Fund, Munster Technological University
ID Code: 28342
Deposited On: 18 May 2023 14:21 by Seamus Lankford. Last Modified 18 May 2023 14:21
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution 4.0
682kB