DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues

Castilho, Sheila (ORCID: 0000-0002-8416-6555), Cavalheiro Camargo, João Lucas (ORCID: 0000-0003-3746-1225), Menezes, Miguel and Way, Andy (ORCID: 0000-0001-5736-5930) (2021) DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues. In: Sixth Conference on Machine Translation (WMT21), 10-11 Nov 2021, Punta Cana, Dominican Republic (Online). ISBN 978-1-954085-94-7

Abstract
Recently, the Machine Translation (MT) community has become more interested in document-level evaluation, especially in light of reactions to claims of "human parity", since examining quality at the level of the document rather than the sentence allows for the assessment of suprasentential context and provides a more reliable evaluation. This paper presents a document-level corpus annotated in English with context-aware issues that arise when translating from English into Brazilian Portuguese, namely ellipsis, gender, lexical ambiguity, number, reference, and terminology, covering six different domains. The corpus can be used as a challenge test set for evaluation and as a training/testing corpus for MT, as well as for deep linguistic analysis of context issues. To the best of our knowledge, this is the first corpus of its kind.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Additional Information: pp 556-577
Uncontrolled Keywords: machine translation evaluation; document-level MT; corpus, annotation
Subjects: Computer Science > Computational linguistics; Computer Science > Machine translating; Humanities > Language; Humanities > Translating and interpreting
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in: Proceedings of the Sixth Conference on Machine Translation. Association for Computational Linguistics (ACL). ISBN 978-1-954085-94-7
Publisher: Association for Computational Linguistics (ACL)
Official URL: https://aclanthology.org/2021.wmt-1.63
Copyright Information: © 2021 The Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Irish Research Council (GOIPD/2020/69), Science Foundation Ireland through the SFI Research Centres Programme (Grant 13/RC/2106_P2)
ID Code: 26256
Deposited On: 14 Sep 2021 12:44 by Dr Sheila Castilho M de Sousa. Last Modified 27 Apr 2022 11:01
Documents

Full text available as:

PDF (EMNLP_WMT2021_DELA Corpus.pdf), 367kB