DiHuTra: a parallel corpus to analyse differences between human translations

Lapshinova-Koltunski, Ekaterina; Popović, Maja; Koponen, Maarit

Lapshinova-Koltunski, Ekaterina ORCID: 0000-0002-5618-8087, Popović, Maja ORCID: 0000-0001-8234-8745 and Koponen, Maarit ORCID: 0000-0002-6123-5386 (2022) DiHuTra: a parallel corpus to analyse differences between human translations. In: 13th Language Resources and Evaluation Conference, 20-25 June 2022, Marseille, France.

Abstract

This paper describes a new corpus of human translations which contains both professional and student translations. The data consists of English sources (texts from news and reviews) and their translations into Russian and Croatian, as well as a subcorpus containing translations of the review texts into Finnish. All target languages are mid-resourced and less or mid-investigated. The corpus will be valuable for studying variation in translation as it allows a direct comparison between human translations of the same source texts. The corpus will also be a valuable resource for evaluating machine translation systems. We believe that this resource will facilitate understanding and improvement of the quality issues in both human and machine translation. In the paper, we describe how the data was collected, provide information on translator groups and summarise the differences between the human translations at hand based on our preliminary results with shallow features.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	translation; human translation; parallel corpus; multilingual corpus; multilinguality; Russian; Croatian; Finnish; translation variation; news translation; review translation
Subjects:	Computer Science > Machine translating; Humanities > Translating and interpreting
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Proceedings of the Thirteenth Language Resources and Evaluation Conference. European Language Resources Association.
Publisher:	European Language Resources Association
Official URL:	https://aclanthology.org/2022.lrec-1.186
Copyright Information:	© European Language Resources Association (ELRA)
ID Code:	28366
Deposited On:	25 May 2023 11:51 by Maja Popovic. Last Modified 25 May 2023 11:51

Documents

Full text available as:

PDF, 404kB
Creative Commons: Attribution-Noncommercial 4.0

DORAS | DCU Research Repository
