Evaluation of an automatic f-structure annotation algorithm against the PARC 700 dependency bank

Burke, Michael, Cahill, Aoife ORCID: 0000-0002-3519-7726, O'Donovan, Ruth, van Genabith, Josef and Way, Andy ORCID: 0000-0001-5736-5930 (2004) Evaluation of an automatic f-structure annotation algorithm against the PARC 700 dependency bank. In: LFG'04 - Proceedings of the 9th International Conference on Lexical-Functional Grammar, Christchurch, New Zealand, 10-12 July 2004. ISBN 1098-6782

Abstract
Metadata
Downloads
Documents

[+][-]

Abstract

An automatic method for annotating the Penn-II Treebank (Marcus et al., 1994) with high-level Lexical Functional Grammar (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structure representations is described in (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O’Donovan et al., 2004). The annotation algorithm and the automatically-generated f-structures are the basis for the automatic acquisition of wide-coverage and robust probabilistic approximations of LFG grammars (Cahill et al., 2002; Cahill et al., 2004a) and for the induction of LFG semantic forms (O’Donovan et al., 2004). The quality of the annotation algorithm and the f-structures it generates is, therefore, extremely important. To date, annotation quality has been measured in terms of precision and recall against the DCU 105. The annotation algorithm currently achieves an f-score of 96.57% for complete f-structures and 94.3% for preds-only f-structures. There are a number of problems with evaluating against a gold standard of this size, most notably that of overfitting. There is a risk of assuming that the gold standard is a complete and balanced representation of the linguistic phenomena in a language and basing design decisions on this. It is, therefore, preferable to evaluate against a more extensive, external standard. Although the DCU 105 is publicly available, 1 a larger well-established external standard can provide a more widely-recognised benchmark against which the quality of the f-structure annotation algorithm can be evaluated. For these reasons, we present an evaluation of the f-structure annotation algorithm of (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O’Donovan et al., 2004) against the PARC 700 Dependency Bank (King et al., 2003). Evaluation against an external gold standard is a non-trivial task as linguistic analyses may differ systematically between the gold standard and the output to be evaluated as regards feature geometry and nomenclature. We present conversion software to automatically account for many (but not all) of the systematic differences. Currently, we achieve an f-score of 87.31% for the f-structures generated from the original Penn-II trees and an f-score of 81.79% for f-structures from parse trees produced by Charniak’s (2000) parser in our pipeline parsing architecture against the PARC 700.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	lexical functional grammar;
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	Research Initiatives and Centres > National Centre for Language Technology (NCLT) DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher:	CSLI Publications
Official URL:	http://csli-publications.stanford.edu/LFG/9/lfg04-...
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
ID Code:	15301
Deposited On:	15 Mar 2010 10:39 by DORAS Administrator . Last Modified 25 Jan 2019 11:44

Documents

Full text available as:

Preview

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
107kB

Downloads

Downloads per month over past year

Archive Staff Only: edit this record