Is all that glitters in MT quality estimation really gold standard?

Graham, Yvette; Baldwin, Timothy; Dowling, Meghan; Eskevich, Maria; Lynn, Teresa; Tounsi, Lamia

Graham, Yvette, Baldwin, Timothy, Dowling, Meghan ORCID: 0000-0003-1637-4923, Eskevich, Maria, Lynn, Teresa and Tounsi, Lamia (2016) Is all that glitters in MT quality estimation really gold standard? In: 26th International Conference on Computational Linguistics, 11-17 Dec 2016, Osaka, Japan. ISBN 978-4-87974-702-0

Abstract

Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment. Human-targeted translation edit rate (HTER) is by far the most widely employed human-targeted metric in machine translation, commonly employed, for example, as a gold standard in evaluation of quality estimation. Original experiments justifying the design of HTER, as opposed to other possible formulations, were limited to a small sample of translations and a single language pair, however, and this motivates our re-evaluation of a range of human-targeted metrics on a substantially larger scale. Results show significantly stronger correlation with human judgment for HBLEU over HTER for two of the nine language pairs we include and no significant difference between correlations achieved by HTER and HBLEU for the remaining language pairs. Finally, we evaluate a range of quality estimation systems employing HTER and direct assessment (DA) of translation adequacy as gold labels, resulting in a divergence in system rankings, and propose employment of DA for future quality estimation evaluations.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Matsumoto, Yuji and Prasad, Rashmi (eds.) Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. Association for Computational Linguistics (ACL). ISBN 978-4-87974-702-0
Publisher:	Association for Computational Linguistics (ACL)
Official URL:	https://www.aclweb.org/anthology/C16-1
Copyright Information:	© 2016 ACL. Creative Commons 4.0
Funders:	European Union Horizon 2020 research and innovation programme under grant agreement 645452 (QT21); ADAPT Centre for Digital Content Technology (www.adaptcentre.ie) at Dublin City University, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.
ID Code:	23608
Deposited On:	31 Jul 2019 11:31. Last Modified: 12 Aug 2020 17:26

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
230kB
