Large-scale induction and evaluation of lexical resources from the Penn-II treebank
O'Donovan, Ruth and Burke, Michael and Cahill, Aoife and van Genabith, Josef and Way, Andy (2004) Large-scale induction and evaluation of lexical resources from the Penn-II treebank. In: ACL 2004 - 42nd Annual Meeting of the Association for Computational Linguistics, 21-26 July 2004, Barcelona, Spain.
In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames, as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach does not predefine frames, associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. We extract 3586 verb lemmas and 14348 semantic form types (an average of 4 per lemma) with 577 frame types. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource.
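To make the idea of lemma-conditional frame probabilities concrete, here is a minimal Python sketch. It assumes that each verb occurrence has already been annotated with an f-structure, represented here as a plain dict, and that probabilities are estimated by relative frequency. The abstract does not name the estimator, so treat this as an assumption. The dict layout, the simplified inventory of governable functions, and the names `semantic_form` and `frame_probabilities` are hypothetical and are not the authors' implementation. The `with_details` flag mirrors the option of keeping or dropping preposition and particle information, and each frame records active versus passive voice.

```python
from collections import Counter, defaultdict

# ASSUMPTION: a simplified, hypothetical inventory of governable
# grammatical functions treated as subcategorised arguments.
GOVERNABLE = ("SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP", "PART")


def semantic_form(fstruct, with_details=True):
    """Return (lemma, frame) for a toy f-structure dict (hypothetical format).

    The frame is the tuple of governable functions present, optionally
    refined with the preposition of an oblique (OBL:to) or the form of a
    particle (PART:up), paired with the voice of the clause.
    """
    args = []
    for gf in GOVERNABLE:
        if gf not in fstruct:
            continue
        if gf == "OBL" and with_details:
            args.append("OBL:" + fstruct["OBL"]["PFORM"])
        elif gf == "PART" and with_details:
            args.append("PART:" + fstruct["PART"])
        else:
            args.append(gf)
    voice = "passive" if fstruct.get("PASSIVE") else "active"
    return fstruct["PRED"], (tuple(args), voice)


def frame_probabilities(fstructs, with_details=True):
    """Relative-frequency estimate of P(frame | lemma) over all occurrences."""
    counts = defaultdict(Counter)
    for fs in fstructs:
        lemma, frame = semantic_form(fs, with_details)
        counts[lemma][frame] += 1
    return {
        lemma: {frame: n / sum(c.values()) for frame, n in c.items()}
        for lemma, c in counts.items()
    }


# Toy occurrences of "give" (invented for illustration, not treebank data).
data = [
    {"PRED": "give", "SUBJ": {}, "OBJ": {}, "OBJ2": {}},
    {"PRED": "give", "SUBJ": {}, "OBJ": {}, "OBL": {"PFORM": "to"}},
    {"PRED": "give", "SUBJ": {}, "OBJ": {}, "OBL": {"PFORM": "to"}},
    {"PRED": "give", "SUBJ": {}, "OBL": {"PFORM": "to"}, "PASSIVE": True},
]
probs = frame_probabilities(data)
print(probs["give"][(("SUBJ", "OBJ", "OBL:to"), "active")])  # 0.5
```

Because frames are built from whatever functions occur in the data rather than looked up in a fixed list, no frame inventory has to be predefined. Passing `with_details=False` collapses `OBL:to` to plain `OBL`, giving the coarser, preposition-free frames.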