DORAS | DCU Research Repository

Large-scale induction and evaluation of lexical resources from the Penn-II treebank

O'Donovan, Ruth, Burke, Michael, Cahill, Aoife (ORCID: 0000-0002-3519-7726), van Genabith, Josef and Way, Andy (ORCID: 0000-0001-5736-5930) (2004) Large-scale induction and evaluation of lexical resources from the Penn-II treebank. In: ACL 2004 - 42nd Annual Meeting of the Association for Computational Linguistics, 21-26 July 2004, Barcelona, Spain.

Abstract
In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach does not predefine frames, associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. We extract 3586 verb lemmas, 14348 semantic form types (an average of 4 per lemma) with 577 frame types. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource.
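The abstract mentions associating probabilities with frames conditional on the lemma. As a minimal sketch of that idea, assuming a plain relative-frequency estimate (the abstract does not state the estimator), the conditional probability of a frame given a lemma can be computed from (lemma, frame) observations. The function name, the frame string notation, and the toy observations below are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency.

    `observations` is an iterable of (lemma, frame) pairs, one per
    extracted verb occurrence. This is an assumed estimator for
    illustration, not necessarily the one used in the paper.
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1

    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        # Normalise each lemma's frame counts so they sum to 1.
        probs[lemma] = {frame: n / total for frame, n in frame_counts.items()}
    return probs


# Hypothetical toy observations (not data from the paper).
observations = [
    ("give", "subj,obj,obj2"),
    ("give", "subj,obj,obl:to"),
    ("give", "subj,obj,obl:to"),
    ("give", "subj,obj"),
    ("sleep", "subj"),
]

probs = frame_probabilities(observations)
for lemma, frames in probs.items():
    for frame, p in frames.items():
        print(f"{lemma}([{frame}]) {p:.2f}")
```

Because frames are counted as they are observed, no frame inventory has to be defined in advance, which matches the abstract's statement that the approach does not predefine frames.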
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: lexical functional grammar
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology/P/P04/
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, EI SC/2001/186, Irish Research Council for Science Engineering and Technology
ID Code: 15309
Deposited On: 15 Mar 2010 14:47 by DORAS Administrator. Last Modified: 25 Jan 2019 11:42
Documents

Full text available as:

odonovan_et_al_04.pdf (PDF, 62kB)