Automatic annotation of the Penn-treebank with LFG f-structure information
Cahill, Aoife (ORCID: 0000-0002-3519-7726), McCarthy, Mairéad, van Genabith, Josef (ORCID: 0000-0003-1322-7944) and Way, Andy (ORCID: 0000-0001-5736-5930)
(2002)
Automatic annotation of the Penn-treebank with LFG f-structure information.
In: LREC 2002 Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data, 1 June 2002, Las Palmas, Canary Islands.
Lexical-Functional Grammar f-structures are abstract syntactic representations approximating basic predicate-argument structure. Treebanks annotated with f-structure information are required as training resources for stochastic versions of unification- and constraint-based grammars and for the automatic extraction of such resources. A number of papers (Frank, 2000; Sadler, van Genabith and Way, 2000) have developed methods for automatically annotating treebank resources with f-structure information. However, to date, these methods have only been applied to treebank fragments of the order of a few hundred trees. In this paper we present a new method that scales and has been applied to a complete treebank: the WSJ section of Penn-II (Marcus et al., 1994), with more than 1,000,000 words in about 50,000 sentences.