Graham, Yvette and Liu, Qun ORCID: 0000-0002-7000-1792 (2016) Achieving accurate conclusions in evaluation of automatic machine translation metrics. In: 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 12-15 June 2016, San Diego, CA, USA.
Abstract
Automatic Machine Translation metrics, such as BLEU, are widely used in empirical evaluation as a substitute for human assessment. Subsequently, the performance of a given metric is measured by its strength of correlation with human judgment. When a newly proposed metric achieves a stronger correlation over that of a baseline, it is important to take into account the uncertainty inherent in correlation point estimates prior to concluding improvements in metric performance. Confidence intervals for correlations with human judgment are rarely reported in metric evaluations, however, and when they have been reported, the most suitable methods have unfortunately not been applied. For example, incorrect assumptions about correlation sampling distributions made in past evaluations risk over-estimation of significant differences in metric performance. In this paper, we provide analysis of each of the issues that may lead to inaccuracies before providing detail of a method that overcomes previous challenges. Additionally, we propose a new method of translation sampling that in contrast achieves genuine high conclusivity in evaluation of the relative performance of metrics.
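To illustrate why a correlation point estimate carries uncertainty, the following is a minimal sketch of a textbook Fisher z-transformation confidence interval for a single Pearson correlation. It is not taken from the paper and is not the method the paper proposes; the function names (`pearson_r`, `fisher_ci`) and the example numbers are hypothetical. The interval assumes roughly bivariate-normal scores and covers one metric in isolation, so it does not address comparing two metrics scored against the same human judgments.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation r from n pairs.

    Uses the Fisher z-transformation; requires n >= 4 and |r| < 1.
    """
    if n < 4:
        raise ValueError("need at least 4 observations")
    z = math.atanh(r)                    # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Hypothetical case: a metric correlates at r = 0.9 with human scores
# over only 10 systems. The resulting interval is wide.
low, high = fisher_ci(0.9, 10)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

With so few systems, even a high correlation is compatible with a broad range of true values, which is why a small gain over a baseline should not be read as an improvement without further testing.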
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Published in: | Proceedings of NAACL-HLT 2016. Association for Computational Linguistics. |
Publisher: | Association for Computational Linguistics |
Official URL: | http://dx.doi.org/10.18653/v1/N16-1001 |
Copyright Information: | © 2016 Association for Computational Linguistics |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | European Union Horizon 2020 research and innovation programme under grant agreement 645452 (QT21), ADAPT Centre for Digital Content Technology at Dublin City University funded under the SFI Research Centres Programme (Grant 13/RC/2106) co-funded under the European Regional Development Fund. |
ID Code: | 23208 |
Deposited On: | 25 Apr 2019 13:57 by Thomas Murtagh. Last Modified 06 Mar 2020 09:47 |