Traditionally, rich, constraint-based grammatical resources have been hand-coded. Scaling wide-coverage, deep, constraint-based grammars such as Lexical-Functional Grammars from fragments to naturally occurring unrestricted text is knowledge-intensive, time-consuming and (often prohibitively) expensive.
Based on earlier work by McCarthy (2003), this thesis presents the development and evaluation of an automatic LFG f-structure annotation algorithm, the core component of a larger project on rapid, wide-coverage, deep, constraint-based, multilingual grammar acquisition. That project addresses the knowledge acquisition bottleneck familiar from traditional rule-based approaches to NLP and AI. The algorithm annotates the Penn-II Treebank with LFG f-structure information. Grammars and lexical resources are then extracted from the f-structure-annotated treebank. Extensive evaluation of the annotation algorithm against independently constructed gold standards (the PARC 700 Dependency Bank and PropBank) shows the quality of the f-structures acquired.
The methodology developed in this thesis has been deployed for multilingual, rapid grammar development: grammars and lexical resources for Mandarin Chinese were acquired from the Penn Chinese Treebank (CTB) using a generic version of the annotation algorithm, seeded with linguistic generalisations for Mandarin Chinese.