Parsing with PCFGs and automatic f-structure annotation

Cahill, Aoife and McCarthy, Mairead and van Genabith, Josef and Way, Andy (2002) Parsing with PCFGs and automatic f-structure annotation. In: LFG02 - 7th International Lexical Functional Grammar Conference, 3-5 July, 2002, Athens, Greece. ISSN 1098-6782

Abstract

The development of large-coverage, rich unification- (constraint-) based grammar resources is very time-consuming and expensive, and requires a great deal of linguistic expertise. In this paper we report initial results on a new methodology that attempts to partially automate the development of substantial parts of such resources. The method is based on a treebank (in our case Penn-II) and an automatic f-structure annotation algorithm that annotates treebank trees with proto-f-structure information. On this basis we present two parsing architectures. In the pipeline architecture, we first extract a PCFG from the treebank following the method of Charniak (1996), use the PCFG to parse new text, automatically annotate the resulting trees with our f-structure annotation algorithm, and generate proto-f-structures. In the integrated architecture, by contrast, we first automatically annotate the treebank trees with f-structure information and then extract an annotated PCFG (A-PCFG) from the treebank; we then use the A-PCFG to parse new text and generate proto-f-structures. Currently our best parsers achieve more than 81% f-score on the 2400 trees in section 23 of the Penn-II treebank, and more than 60% f-score against gold-standard proto-f-structures for 105 randomly selected trees from section 23.
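The two architectures differ mainly in whether f-structure annotation is applied before or after PCFG extraction. The minimal Python sketch below illustrates this with treebank-grammar extraction by relative frequency, P(A -> b) = count(A -> b) / count(A). It assumes a toy two-tree treebank standing in for Penn-II and a hypothetical (parent, child) equation table standing in for the authors' annotation algorithm, which is far richer; parsing itself is not shown. Equations use the ASCII LFG convention of ^ for the up arrow and ! for the down arrow.

```python
from collections import Counter
from typing import Dict, Iterator, List, Optional, Tuple, Union

# A tree is (label, children); a preterminal's "children" is the word string.
Tree = Tuple[str, Union[str, List["Tree"]]]
Rule = Tuple[str, Tuple[str, ...]]  # (lhs, rhs)

# Toy treebank standing in for Penn-II (illustrative only).
TOY_TREEBANK: List[Tree] = [
    ("S", [("NP", [("N", "John")]), ("VP", [("V", "sleeps")])]),
    ("S", [("NP", [("N", "Mary")]),
           ("VP", [("V", "sees"), ("NP", [("N", "John")])])]),
]

# Hypothetical (parent, child) -> equation table; NOT the paper's algorithm.
TOY_EQUATIONS: Dict[Tuple[str, str], str] = {
    ("S", "NP"): "(^SUBJ)=!",
    ("S", "VP"): "^=!",
    ("VP", "V"): "^=!",
    ("VP", "NP"): "(^OBJ)=!",
    ("NP", "N"): "^=!",
}


def rules(tree: Tree) -> Iterator[Rule]:
    """Yield every local tree (CFG rule) in `tree`, top-down."""
    label, kids = tree
    if isinstance(kids, str):
        yield (label, (kids,))  # lexical rule, e.g. N -> John
        return
    yield (label, tuple(kid[0] for kid in kids))
    for kid in kids:
        yield from rules(kid)


def extract_pcfg(treebank: List[Tree]) -> Dict[Rule, float]:
    """Relative-frequency estimate: P(A -> b) = count(A -> b) / count(A)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts: Counter = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def annotate(tree: Tree, parent: Optional[str] = None) -> Tree:
    """Append a toy f-structure equation to each node label."""
    label, kids = tree
    eq = TOY_EQUATIONS.get((parent, label)) if parent else None
    new_label = f"{label}[{eq}]" if eq else label
    if isinstance(kids, str):
        return (new_label, kids)
    # Daughters are looked up by the parent's *unannotated* category.
    return (new_label, [annotate(kid, label) for kid in kids])


if __name__ == "__main__":
    # Pipeline: plain PCFG; annotation would be applied to parser output.
    pcfg = extract_pcfg(TOY_TREEBANK)
    # Integrated: annotate the treebank first, then extract an A-PCFG.
    apcfg = extract_pcfg([annotate(t) for t in TOY_TREEBANK])
    for name, grammar in (("PCFG", pcfg), ("A-PCFG", apcfg)):
        print(name)
        for (lhs, rhs), p in sorted(grammar.items()):
            print(f"  {lhs} -> {' '.join(rhs)}  {p:.3f}")
```

In this toy A-PCFG, subject and object NPs become distinct categories (NP[(^SUBJ)=!] and NP[(^OBJ)=!]), each with its own rule probabilities, whereas the plain PCFG conflates both as NP. In the pipeline setting, the same kind of annotation would instead be applied to the trees the plain PCFG produces.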

Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Uncontrolled Keywords:probabilistic context-free grammar
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:Research Initiatives and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in:Proceedings of the LFG 02 Conference. CSLI Publications. ISSN 1098-6782
Publisher:CSLI Publications
Official URL:http://csli-publications.stanford.edu/LFG/7/lfg02-toc.html
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:Enterprise Ireland, EI SC/2001/186
ID Code:15346
Deposited On:09 Apr 2010 11:32 by DORAS Administrator. Last Modified 28 Apr 2010 12:08