DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Parsing with PCFGs and automatic f-structure annotation

Cahill, Aoife (ORCID: 0000-0002-3519-7726), McCarthy, Mairéad, van Genabith, Josef (ORCID: 0000-0003-1322-7944) and Way, Andy (ORCID: 0000-0001-5736-5930) (2002) Parsing with PCFGs and automatic f-structure annotation. In: LFG02 - 7th International Lexical Functional Grammar Conference, 3-5 July, 2002, Athens, Greece. ISBN 1098-6782

Abstract
The development of large-coverage, rich unification- (constraint-) based grammar resources is very time-consuming and expensive, and requires considerable linguistic expertise. In this paper we report initial results on a new methodology that attempts to partially automate the development of substantial parts of such resources. The method is based on a treebank resource (in our case Penn-II) and an automatic f-structure annotation algorithm that annotates treebank trees with proto-f-structure information. Based on these, we present two parsing architectures. In our pipeline architecture we first extract a PCFG from the treebank following the method of Charniak (1996), use the PCFG to parse new text, automatically annotate the resulting trees with our f-structure annotation algorithm, and generate proto-f-structures. By contrast, in the integrated architecture we first automatically annotate the treebank trees with f-structure information and then extract an annotated PCFG (A-PCFG) from the treebank. We then use the A-PCFG to parse new text and generate proto-f-structures. Currently our best parsers achieve more than 81% f-score on the 2400 trees in section 23 of the Penn-II treebank and more than 60% f-score on gold-standard proto-f-structures for 105 randomly selected trees from section 23.
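Both architectures in the abstract depend on reading a PCFG off treebank trees. The following is a minimal sketch of relative-frequency grammar extraction, assuming trees are given as bracketed strings. The toy treebank, the function names (`parse_tree`, `rules`, `extract_pcfg`) and the bracketed annotation labels such as `NP[subj]` are illustrative assumptions, not the paper's data, code or annotation notation.

```python
from collections import Counter

def parse_tree(s):
    """Parse a bracketed tree such as '(S (NP John) (VP (V sleeps)))'
    into nested (label, children) tuples; leaves are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()

def rules(tree):
    """Yield every local tree as a (lhs, rhs) rule, including lexical rules."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def extract_pcfg(bracketed_trees):
    """Estimate P(lhs -> rhs) as count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for s in bracketed_trees:
        for lhs, rhs in rules(parse_tree(s)):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# Toy treebank (invented for illustration; not Penn-II data).
TOY_TREEBANK = [
    "(S (NP John) (VP (V sleeps)))",
    "(S (NP Mary) (VP (V sees) (NP John)))",
]
pcfg = extract_pcfg(TOY_TREEBANK)

# Hypothetical annotated tree: if annotations are folded into the category
# labels before extraction, the same procedure yields an annotated grammar.
TOY_ANNOTATED = ["(S (NP[subj] John) (VP[head] (V[head] sleeps)))"]
a_pcfg = extract_pcfg(TOY_ANNOTATED)
```

In this sketch the only difference between the two grammars is whether the category labels carry annotations at extraction time. In the pipeline architecture the annotation step would instead run on the trees a parser produces with the plain grammar.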
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: probabilistic context-free grammar
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of the LFG 02 Conference. CSLI Publications. ISBN 1098-6782
Publisher: CSLI Publications
Official URL: http://csli-publications.stanford.edu/LFG/7/lfg02-...
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, EI SC/2001/186
ID Code: 15346
Deposited On: 09 Apr 2010 10:32 by DORAS Administrator. Last Modified: 21 Jan 2022 16:36
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
335kB