DORAS
DCU Online Research Access Service
Machine translation in the Covid domain: an English-Irish case study for LoResMT 2021

Lankford, Séamus (ORCID: 0000-0003-1693-9533), Afli, Haithem (ORCID: 0000-0002-7449-4707) and Way, Andy (ORCID: 0000-0001-5736-5930) (2021) Machine translation in the Covid domain: an English-Irish case study for LoResMT 2021. In: 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), 16 August 2021, Orlando, FL, USA.

Abstract
Translation models for the specific domain of translating Covid data from English to Irish were developed for the LoResMT 2021 shared task. Domain adaptation techniques were applied using a Covid-adapted generic 55k corpus from the Directorate General of Translation. Fine-tuning, mixed fine-tuning and combined dataset approaches were compared with models trained on an extended in-domain dataset. As part of this study, an English-Irish dataset of Covid-related data from the Health and Education domains was developed. The highest-performing model used a Transformer architecture trained with an extended in-domain Covid dataset. In the context of this study, we have demonstrated that extending an 8k in-domain baseline dataset by just 5k lines improved the BLEU score by 27 points.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Initiatives and Centres > ADAPT
Published in: Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021). Association for Computational Linguistics (ACL).
Publisher: Association for Computational Linguistics (ACL)
Official URL: https://aclanthology.org/2021.mtsummit-loresmt.15
Copyright Information: © 2021 Association for Computational Linguistics (ACL)
Funders: Science Foundation Ireland (SFI) Research Centres Programme (Grant 13/RC/2016), European Regional Development Fund, Munster Technological University
ID Code: 28342
Deposited On: 18 May 2023 14:21 by Seamus Lankford. Last Modified: 18 May 2023 14:21
Documents

Full text available as:

machine_translation_in_the_Covid_domain.pdf (PDF, 682kB)
Licence: Creative Commons Attribution 4.0