DORAS | DCU Research Repository

Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages

Bago, Petra, Castilho, Sheila (ORCID: 0000-0002-8416-6555), Celeste, Edoardo (ORCID: 0000-0003-1984-4142), Dunne, Jane, Gaspari, Federico (ORCID: 0000-0003-3808-8418), Gíslason, Niels, Kåsen, Andre, Klubička, Filip, Kristmannsson, Gauti, McHugh, Helen, Moran, Roisin, Ní Loinsigh, Orla, Olsen, Jon, Parra Escartín, Carla (ORCID: 0000-0002-8412-1525), Ramesh, Akshai, Resende, Natália (ORCID: 0000-0002-5248-2457), Sheridan, Páraic and Way, Andy (ORCID: 0000-0001-5736-5930) (2022) Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages. Revista de Llengua i Dret (Journal of Language and Law), 78, pp. 9-34. ISSN 2013-1453

Abstract
This article reports some of the main achievements of the European Union-funded PRINCIPLE project in collecting high-quality language resources (LRs) in the legal domain for four under-resourced European languages: Croatian, Irish, Norwegian, and Icelandic. After illustrating the significance of this work for developing translation technologies in the context of the European Union and the European Economic Area, the article outlines the main steps of data collection, curation, and sharing of the LRs gathered with the support of public and private data contributors. This is followed by a description of the development pipeline and key features of the state-of-the-art, bespoke neural machine translation (MT) engines for the legal domain that were built using this data. The MT systems were evaluated with a combination of automatic and human methods to validate the quality of the LRs collected in the project, and the high-quality LRs were subsequently shared with the wider community via the ELRC-SHARE repository. The main challenges encountered in this work are discussed, emphasising the importance and the key benefits of sharing high-quality digital LRs.
Metadata
Item Type: Article (Published)
Refereed: Yes
Subjects: Computer Science > Machine translating; Social Sciences > Law
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; DCU Faculties and Schools > Faculty of Science and Health > School of Nursing, Psychotherapy & Community Health; DCU Faculties and Schools > Faculty of Humanities and Social Science > School of Law and Government; Research Institutes and Centres > ADAPT
Publisher: Escola d'Administració Pública de Catalunya
Official URL: https://dx.doi.org/10.2436-rld.i78.2022.3741
Copyright Information: © 2022 Escola d'Administració Pública de Catalunya
ID Code: 27967
Deposited On: 10 Mar 2023 12:09 by Edoardo Celeste. Last Modified 12 Jan 2024 12:08
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
681kB