In this paper we describe our reproduction study of the human evaluation of text simplicity reported by Nisioi et al. (2017). The work was carried out as part of the ReproGen Shared Task 2022 on Reproducibility of Evaluations in NLG. Our aim was to repeat the evaluation of simplicity for nine automatic text simplification systems with a different set of evaluators. We describe our experimental design together with the known aspects of the original experimental design and present the results from both studies. Pearson correlation between the original and reproduction scores is moderate to high (0.776). Inter-annotator agreement in the reproduction study is lower (0.40) than in the original study (0.66). We discuss challenges arising from the unavailability of certain aspects of the original set-up, and make several suggestions as to how reproduction of similar evaluations can be made easier in future.
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges. Association for Computational Linguistics (ACL).
This work was supported by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant 13/RC/2106. Faculty of Engineering and Computing, Dublin City University.