
DORAS | DCU Research Repository


Achieving accurate conclusions in evaluation of automatic machine translation metrics

Graham, Yvette and Liu, Qun (ORCID: 0000-0002-7000-1792) (2016) Achieving accurate conclusions in evaluation of automatic machine translation metrics. In: 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 12-17 June 2016, San Diego, CA, USA. ISBN 978-1-941643-91-4

Abstract
Automatic Machine Translation metrics, such as BLEU, are widely used in empirical evaluation as a substitute for human assessment. The performance of a given metric is then measured by the strength of its correlation with human judgment. When a newly proposed metric achieves a stronger correlation than that of a baseline, it is important to take into account the uncertainty inherent in correlation point estimates before concluding that metric performance has improved. Confidence intervals for correlations with human judgment are rarely reported in metric evaluations, however, and when they have been reported, the most suitable methods have unfortunately not been applied. For example, incorrect assumptions about correlation sampling distributions made in past evaluations risk over-estimation of significant differences in metric performance. In this paper, we analyse each of the issues that can lead to inaccurate conclusions before detailing a method that overcomes these challenges. Additionally, we propose a new method of translation sampling that, in contrast, achieves genuinely high conclusivity in evaluation of the relative performance of metrics.
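The evaluation setup the abstract describes, scoring a metric by the strength of its correlation with human judgment and attaching a confidence interval to that correlation rather than relying on a point estimate alone, can be sketched as follows. This is an illustrative example only, not the method proposed in the paper: the per-system scores are invented, and the interval shown is the standard Fisher z-transformation interval for a Pearson correlation.

```python
# Minimal sketch (not the paper's own code): correlate an automatic metric's
# system-level scores with human assessment scores and attach a 95% confidence
# interval to the correlation. All numbers below are invented for illustration.
import numpy as np
from scipy import stats

# Hypothetical per-system scores: human assessment vs. an automatic metric
human = np.array([0.62, 0.58, 0.71, 0.49, 0.66, 0.55, 0.60, 0.68])
metric = np.array([29.1, 27.4, 33.0, 24.8, 31.2, 26.0, 28.3, 32.1])

r, _ = stats.pearsonr(human, metric)   # correlation point estimate
n = len(human)

# Fisher z-transformation: arctanh(r) is approximately normal with standard
# error 1/sqrt(n - 3); compute a 95% interval on the z scale and map it back
# to the correlation scale.
z = np.arctanh(r)
se = 1.0 / np.sqrt(n - 3)
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)

print(f"r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With only a handful of systems the resulting interval is wide, which illustrates the paper's point that a bare correlation point estimate can overstate how confidently one metric can be judged better than another.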
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in: Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL). Association for Computational Linguistics. ISBN 978-1-941643-91-4
Publisher: Association for Computational Linguistics
Official URL: http://dx.doi.org/10.18653/v1/N16-1001
Copyright Information: © 2016 The Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Funders: European Union Horizon 2020 research and innovation programme under grant agreement 645452 (QT21); SFI Research Centres Programme (Grant 13/RC/2106) co-funded under the European Regional Development Fund.
ID Code: 23194
Deposited On: 17 Apr 2019 10:28 by Thomas Murtagh. Last Modified: 22 Jul 2019 15:00
Documents

Full text available as:

PDF (Achieving_Accurate_Conclusions_in_Evaluation_of_Automatic_Machine_Translation_Metrics[1].pdf, 393kB) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader