Burke, Michael, Cahill, Aoife ORCID: 0000-0002-3519-7726, O'Donovan, Ruth, van Genabith, Josef and Way, Andy ORCID: 0000-0001-5736-5930 (2004) Evaluation of an automatic f-structure annotation algorithm against the PARC 700 dependency bank. In: LFG'04 - Proceedings of the 9th International Conference on Lexical-Functional Grammar, Christchurch, New Zealand, 10-12 July 2004. ISBN 1098-6782
Abstract
An automatic method for annotating the Penn-II Treebank (Marcus et al., 1994) with high-level Lexical Functional Grammar (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structure representations is described in (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O'Donovan et al., 2004). The annotation algorithm and the automatically generated f-structures are the basis for the automatic acquisition of wide-coverage, robust, probabilistic approximations of LFG grammars (Cahill et al., 2002; Cahill et al., 2004a) and for the induction of LFG semantic forms (O'Donovan et al., 2004). The quality of the annotation algorithm and the f-structures it generates is therefore extremely important. To date, annotation quality has been measured in terms of precision and recall against the DCU 105. The annotation algorithm currently achieves an f-score of 96.57% for complete f-structures and 94.3% for preds-only f-structures. There are a number of problems with evaluating against a gold standard of this size, most notably overfitting: there is a risk of assuming that the gold standard is a complete and balanced representation of the linguistic phenomena in a language, and of basing design decisions on that assumption. It is therefore preferable to evaluate against a more extensive, external standard. Although the DCU 105 is publicly available, a larger, well-established external standard can provide a more widely recognised benchmark against which the quality of the f-structure annotation algorithm can be evaluated. For these reasons, we present an evaluation of the f-structure annotation algorithm of (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O'Donovan et al., 2004) against the PARC 700 Dependency Bank (King et al., 2003). Evaluation against an external gold standard is a non-trivial task, as linguistic analyses may differ systematically between the gold standard and the output to be evaluated with regard to feature geometry and nomenclature. We present conversion software that automatically accounts for many (but not all) of these systematic differences. Currently, against the PARC 700 we achieve an f-score of 87.31% for the f-structures generated from the original Penn-II trees, and an f-score of 81.79% for the f-structures generated from parse trees produced by Charniak's (2000) parser in our pipeline parsing architecture.
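The precision, recall and f-score figures quoted above are computed over dependency-style descriptions of f-structures. As a minimal illustrative sketch (not the paper's actual evaluation software), f-structures can be flattened into sets of (relation, head, dependent) triples and scored by set intersection; the triples below are invented examples:

```python
def triple_fscore(gold, test):
    """Score a set of test dependency triples against a gold set.

    Each triple is a (relation, head, dependent) tuple. Returns
    (precision, recall, f-score) computed over exact triple matches.
    """
    gold, test = set(gold), set(test)
    matched = len(gold & test)                      # triples found in both sets
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f

# Hypothetical example: the test analysis gets two of three triples right.
gold = {("subj", "see", "john"), ("obj", "see", "mary"), ("tense", "see", "past")}
test = {("subj", "see", "john"), ("obj", "see", "mary"), ("tense", "see", "pres")}
p, r, f = triple_fscore(gold, test)   # p = r = f = 2/3
```

A conversion step of the kind described in the abstract would, before scoring, rewrite relation names and restructure triples so that both sides use the same feature geometry; the scoring itself is unaffected by that step.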
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: lexical functional grammar
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: CSLI Publications
Official URL: http://csli-publications.stanford.edu/LFG/9/lfg04-...
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
ID Code: 15301
Deposited On: 15 Mar 2010 10:39 by DORAS Administrator. Last Modified: 25 Jan 2019 11:44
Documents
Full text available as: PDF (107kB). Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader.