Rehbein, Ines and van Genabith, Josef
(2007)
Evaluating evaluation measures.
In: NODALIDA 2007 - 16th Nordic Conference on Computational Linguistics, 25-26 May 2007, Tartu, Estonia.
This paper presents a thorough examination of the validity of three evaluation measures for parser output. We assess the performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate parsing results using the PARSEVAL metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the TüBa-D/Z annotation scheme is more adequate than the TIGER scheme for PCFG parsing and show that PARSEVAL should not be used to compare parser performance for parsers trained on treebanks with different annotation schemes. An analysis of specific error types indicates that the dependency-based evaluation is most appropriate to reflect parse quality.
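
For readers unfamiliar with PARSEVAL, the following is a minimal sketch of labelled bracket precision, recall and F1, not the evaluation setup used in the paper. The nested-tuple tree format, the function names `brackets` and `parseval`, and the English toy trees are all assumptions made for illustration. Standard PARSEVAL tools apply further conventions (for example in how punctuation is handled) that are omitted here.

```python
from collections import Counter


def brackets(tree, start=0):
    """Collect labelled spans (label, start, end) from a nested-tuple tree.

    A tree is (label, child, child, ...); a child is either a word (str)
    or another tree. Returns the multiset of spans and the end position.
    """
    label, children = tree[0], tree[1:]
    spans = Counter()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a word occupies exactly one position
        else:
            child_spans, pos = brackets(child, pos)
            spans += child_spans
    spans[(label, start, pos)] += 1
    return spans, pos


def parseval(gold, parsed):
    """Return labelled bracket (precision, recall, F1) for one sentence."""
    gold_spans, _ = brackets(gold)
    parsed_spans, _ = brackets(parsed)
    # Multiset intersection: each gold bracket can be matched at most once.
    matched = sum((gold_spans & parsed_spans).values())
    precision = matched / sum(parsed_spans.values())
    recall = matched / sum(gold_spans.values())
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical toy example: the parser output is flatter than the gold tree.
gold = ("S", ("NP", "the", "dog"), ("VP", "saw", ("NP", "a", "cat")))
parsed = ("S", ("NP", "the", "dog"), ("VP", "saw", "a", "cat"))

precision, recall, f1 = parseval(gold, parsed)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because the score is a ratio over bracket counts, the number of brackets an annotation scheme produces per sentence changes how much a single error costs, which is one intuition for why scores from treebanks with different annotation schemes are hard to compare directly.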