
DORAS | DCU Research Repository


Missing information, unresponsive authors, experimental flaws: the impossibility of assessing the reproducibility of previous human evaluations in NLP

Belz, Anya (ORCID: 0000-0002-0552-8096), Thomson, Craig, Reiter, Ehud (ORCID: 0000-0002-7548-9504), Abercrombie, Gavin (ORCID: 0000-0002-6546-3562), Alonso-Moral, Jose M. (ORCID: 0000-0003-3673-421X), Arvan, Mohammad, Cheung, Jackie, Cieliebak, Mark (ORCID: 0009-0007-3059-8516), Clark, Elizabeth, van Deemter, Kees, Kelleher, John D. (ORCID: 0000-0001-6462-3248) and Klubička, Filip (ORCID: 0000-0001-9712-6141) (2023) Missing information, unresponsive authors, experimental flaws: the impossibility of assessing the reproducibility of previous human evaluations in NLP. In: Fourth Workshop on Insights from Negative Results in NLP, 2 May 2023, Dubrovnik, Croatia. ISBN 978-1-959429-49-4

Abstract
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding, that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Publisher: Association for Computational Linguistics (ACL)
Official URL: https://aclanthology.org/2023.insights-1.1
Copyright Information: © 2023 ACL
ID Code: 28664
Deposited On: 04 Jul 2023 12:25 by Anya Belz. Last Modified: 06 Jul 2023 10:06
Documents

Full text available as:

PDF (2023.insights-1.1.pdf) - Creative Commons: Attribution 4.0 - 188kB
