TePaCoC - a testsuite for testing parser performance on
complex German grammatical constructions
Kübler, Sandra and Rehbein, Ines
(2009)
In: TLT 7 - 7th International Workshop on Treebanks and Linguistic Theories, 23-24 January 2009, Groningen, The Netherlands.
Traditionally, parsers are evaluated against gold standard test data. This can cause
problems if there is a mismatch between the data structures and representations
used by the parser and the gold standard. A case in point is German,
for which two treebanks (TiGer and TüBa-D/Z) are available for inducing parsers
(e.g. PCFG parsers), but with markedly different annotation schemes. The differences between
the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser
evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper
takes a different approach to parser evaluation: instead of providing evaluation
data in a single annotation scheme, TEPACOC provides comparable sentences and
their annotations for 5 selected key grammatical phenomena (20 sentences per
phenomenon) from both the TiGer and TüBa-D/Z treebanks. The result is a comparable
testsuite of 2 × 100 sentences, which allows us to evaluate TiGer-trained
parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against
the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them
against a single (and potentially biased) gold standard. To overcome the problem of
inconsistency in human evaluation and to bridge the gap between the two different
annotation schemes, we provide an extensive error classification, which enables us
to compare parser output across the two different treebanks.
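
The layout of the testsuite described above (two treebanks, 5 phenomena, 20 sentences per phenomenon) and the per-phenomenon tallying of error classes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the phenomenon labels, the error class names, and the functions `build_suite` and `errors_by_phenomenon` are hypothetical placeholders, not the phenomena, error classification, or tooling of TEPACOC itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TestItem:
    treebank: str     # "TiGer" or "TüBa-D/Z"
    phenomenon: str   # placeholder label, not an actual TEPACOC phenomenon name
    sentence_id: int  # index of the sentence within its phenomenon


def build_suite(phenomena, treebanks=("TiGer", "TüBa-D/Z"), per_phenomenon=20):
    """Enumerate one test item per treebank, phenomenon and sentence."""
    return [
        TestItem(tb, ph, i)
        for tb in treebanks
        for ph in phenomena
        for i in range(per_phenomenon)
    ]


def errors_by_phenomenon(suite, treebank, judgements):
    """Tally error classes per phenomenon for one treebank's part of the suite.

    `judgements` maps a TestItem to a (hypothetical) error class label;
    items that are missing from the mapping count as correctly parsed.
    """
    counts = {}
    for item in suite:
        if item.treebank != treebank:
            continue  # a parser is only scored on its own treebank's sentences
        label = judgements.get(item)
        if label is not None:
            counts.setdefault(item.phenomenon, Counter())[label] += 1
    return counts


# Placeholder phenomenon labels and error classes, for illustration only.
phenomena = ["P1", "P2", "P3", "P4", "P5"]
suite = build_suite(phenomena)
judgements = {
    TestItem("TiGer", "P1", 0): "error_class_A",
    TestItem("TiGer", "P1", 3): "error_class_A",
    TestItem("TüBa-D/Z", "P1", 0): "error_class_B",
}
tiger_errors = errors_by_phenomenon(suite, "TiGer", judgements)
```

Because each parser is scored only on the part of the suite that shares its annotation scheme, the tallies for the two treebanks can be compared phenomenon by phenomenon without mapping one scheme onto the other.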
In the remainder of the paper we present the testsuite and describe the
grammatical phenomena covered in the data. We discuss the different annotation
strategies used in the two treebanks to encode these phenomena and present our
error classification of potential parser errors.