
DORAS | DCU Research Repository


Missing information, unresponsive authors, experimental flaws: the impossibility of assessing the reproducibility of previous human evaluations in NLP

Belz, Anya (ORCID: 0000-0002-0552-8096), Thomson, Craig, Reiter, Ehud (ORCID: 0000-0002-7548-9504), Abercrombie, Gavin (ORCID: 0000-0002-6546-3562), Alonso-Moral, Jose M. (ORCID: 0000-0003-3673-421X), Arvan, Mohammad, Cheung, Jackie, Cieliebak, Mark (ORCID: 0009-0007-3059-8516), Clark, Elizabeth, van Deemter, Kees, Kelleher, John D. (ORCID: 0000-0001-6462-3248) and Klubička, Filip (ORCID: 0000-0001-9712-6141) (2023) Missing information, unresponsive authors, experimental flaws: the impossibility of assessing the reproducibility of previous human evaluations in NLP. In: Fourth Workshop on Insights from Negative Results in NLP, 2 May 2023, Dubrovnik, Croatia. ISBN 978-1-959429-49-4

Abstract
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding, that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Publisher: Association for Computational Linguistics (ACL)
Official URL: https://aclanthology.org/2023.insights-1.1
Copyright Information: © 2023 ACL
ID Code: 28664
Deposited On: 04 Jul 2023 12:25 by Anya Belz. Last Modified: 06 Jul 2023 10:06
Documents

Full text available as:

PDF (2023.insights-1.1.pdf) - Creative Commons: Attribution 4.0 - 188kB
