Large-scale induction and evaluation of lexical resources from the Penn-II and Penn-III treebanks

O'Donovan, Ruth; Burke, Michael; Cahill, Aoife; van Genabith, Josef; Way, Andy

O'Donovan, Ruth, Burke, Michael, Cahill, Aoife ORCID: 0000-0002-3519-7726, van Genabith, Josef and Way, Andy ORCID: 0000-0001-5736-5930 (2005) Large-scale induction and evaluation of lexical resources from the Penn-II and Penn-III treebanks. Computational Linguistics, 31 (3). pp. 328-365. ISSN 1530-9312

Abstract

We present a methodology for extracting subcategorization frames based on an automatic lexical-functional grammar (LFG) f-structure annotation algorithm for the Penn-II and Penn-III Treebanks. We extract syntactic-function-based subcategorization frames (LFG semantic forms) and traditional CFG category-based subcategorization frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. In contrast to many other approaches, ours does not predefine the subcategorization frame types extracted, learning them instead from the source data. Including particles and prepositions, we extract 21,005 lemma frame types for 4,362 verb lemmas, with a total of 577 frame types and an average of 4.8 frame types per verb. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource. To our knowledge, this is the largest and most complete evaluation of subcategorization frames acquired automatically for English.
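The idea of associating probabilities with frames conditional on the lemma can be illustrated with a minimal sketch. It assumes a simple relative-frequency estimate, which the abstract itself does not confirm; the function name, the frame notation, and the toy observations are all hypothetical and are not taken from the article. The final line only re-derives the reported average of 4.8 frame types per verb from the two totals given above.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency (an assumed estimator)."""
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1

    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        probs[lemma] = {frame: n / total for frame, n in frame_counts.items()}
    return probs


# Hypothetical (lemma, frame) observations; the frame notation is illustrative only.
observations = [
    ("give", "[subj,obj,obj2]"),
    ("give", "[subj,obj,obl:to]"),
    ("give", "[subj,obj,obl:to]"),
    ("give", "[subj,obj]"),
    ("sleep", "[subj]"),
]
probs = frame_probabilities(observations)

# Average frame types per verb, from the totals reported in the abstract.
average_frames_per_verb = round(21005 / 4362, 1)
```

In this sketch the probabilities for each lemma sum to one, so frames seen only rarely with a given verb receive a correspondingly small share of that verb's probability mass.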

Metadata

Item Type:	Article (Published)
Refereed:	Yes
Uncontrolled Keywords:	Treebanks; Penn-II; Penn-III; Lexical-functional grammar
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	UNSPECIFIED
Publisher:	Massachusetts Institute of Technology Press
Official URL:	http://www.mitpressjournals.org/doi/pdf/10.1162/08...
Copyright Information:	© 2005 Massachusetts Institute of Technology Press.
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code:	16178
Deposited On:	16 May 2011 13:39 by Shane Harper. Last Modified 25 Jan 2019 12:02

Documents

Full text available as: PDF (Large-Scale_Induction_and_Evaluation_of_lexical_resources_from_the_Penn-II_and_Penn-III_Treebanks.pdf), 653kB
