Disentangling the properties of human evaluation methods: a classification system to support comparability, meta-evaluation and reproducibility testing
Belz, Anya (ORCID: 0000-0002-0552-8096), Mille, Simon (ORCID: 0000-0002-8852-2764) and Howcroft, David M. (ORCID: 0000-0002-0810-9065)
(2020)
Disentangling the properties of human evaluation methods: a classification system to support comparability, meta-evaluation and reproducibility testing.
In: 13th International Natural Language Generation Conference 2020 (INLG'20), 15-18 Dec 2020, Dublin, Ireland.
Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs. This has serious implications for reproducibility testing and meta-evaluation, in particular given that human evaluation is considered the gold standard against which the trustworthiness of automatic metrics is gauged. Using examples from NLG, we propose a classification system for evaluations based on disentangling (i) what is being evaluated (which aspect of quality), and (ii) how it is evaluated, in terms of specific (a) evaluation modes and (b) experimental designs. We show that this approach provides a basis for determining comparability, and hence for comparison of evaluations across papers, meta-evaluation experiments, and reproducibility testing.
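To make the idea concrete, the sketch below shows one possible way such a classification could be recorded for a single human evaluation, separating (i) the quality criterion from (ii)(a) the evaluation mode and (ii)(b) the experimental design. This is a minimal illustration only; the class, field names and example values are assumptions for exposition and are not taken from the paper.

    from dataclasses import dataclass

    @dataclass
    class EvaluationRecord:
        quality_criterion: str        # (i) what is evaluated, e.g. "fluency"
        evaluation_mode: dict         # (ii)(a) e.g. absolute vs. relative judgements
        experimental_design: dict     # (ii)(b) e.g. instrument, scale, rater population

    # Hypothetical example of classifying one evaluation experiment.
    example = EvaluationRecord(
        quality_criterion="fluency",
        evaluation_mode={"absolute_or_relative": "absolute",
                         "output_only_or_with_input": "output_only"},
        experimental_design={"instrument": "7-point rating scale",
                             "raters": "crowdworkers"},
    )

    # Under this view, two evaluations are candidates for direct comparison
    # (and hence for reproducibility testing) only if their records agree
    # on the relevant fields.
    print(example)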
Davis, Brian, Graham, Yvette and Kelleher, John D. (eds.)
Proceedings of the 13th International Conference on Natural Language Generation.
Association for Computational Linguistics (ACL).