Agree to disagree: analysis of Inter-annotator disagreements in human evaluation of machine translation output
Popović, Maja (ORCID: 0000-0001-8234-8745)
(2021)
Agree to disagree: analysis of Inter-annotator disagreements in human evaluation of machine translation output.
In: 25th Conference on Computational Natural Language Learning, 10-11 Nov 2021, Punta Cana, Dominican Republic & Online.
This work describes an analysis of inter-annotator disagreements in human evaluation of machine translation output. The errors in the analysed texts were marked by multiple annotators under the guidance of different quality criteria: adequacy, comprehension, and an unspecified generic mixture of adequacy and fluency. Our results show that different criteria result in different disagreements, and indicate that a clear definition of the quality criterion can improve inter-annotator agreement. Furthermore, our results show that for certain linguistic phenomena which are not limited to one or two words (such as word ambiguity or gender) but span several words or even entire phrases (such as negation or relative clauses), disagreements do not necessarily represent "errors" or "noise" but are rather inherent to the evaluation process. These disagreements are caused by differences in error perception and/or the fact that there is no single correct translation of a text, so that multiple solutions are possible. On the other hand, for some other phenomena (such as omission or verb forms), agreement can be easily improved by providing more precise and detailed instructions to the evaluators.
Science Foundation Ireland through the SFI Research Centres Programme Grant 13/RC/2106, European Regional Development Fund (ERDF), European Association for Machine Translation (EAMT) under its programme “2019 Sponsorship of Activities”.
ID Code: 28357
Deposited On: 23 May 2023 12:37 by Maja Popovic. Last Modified 23 May 2023 12:37