C-structures and f-structures for the British national corpus

Wagner, Joachim; Seddah, Djamé; Foster, Jennifer; van Genabith, Josef

Wagner, Joachim ORCID: 0000-0002-8290-3849, Seddah, Djamé, Foster, Jennifer ORCID: 0000-0002-7789-4853 and van Genabith, Josef (2007) C-structures and f-structures for the British national corpus. In: Lexical Functional Grammar 2007, 28-30 July 2007, California, USA.

Abstract

We describe how the British National Corpus (BNC), a one-hundred-million-word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures using a treebank-based parsing architecture. The architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase-structure trees, and an annotation algorithm that automatically maps these trees to LFG f-structures. We describe the pre-processing steps taken to accommodate the differences between the Penn Treebank and the BNC, and discuss some of the issues encountered in applying the parsing architecture at such a large scale. We also describe the process of annotating a gold standard set of 1,000 parse trees. We evaluate the c-structures produced by the statistical parser against the c-structure gold standard, and the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%.
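The f-scores quoted above are the harmonic mean of precision and recall over items matched between parser output and gold standard. The following is a hedged illustration only. It assumes c-structures are scored as labelled brackets (label, start, end) and f-structures as dependency triples (relation, head, dependent). The item formats, the toy data and the function name `prf` are hypothetical and are not taken from the paper's evaluation software.

```python
from collections import Counter

def prf(gold, test):
    """Return (precision, recall, f-score) for two collections of items.

    Hypothetical helper: items are matched as multisets, so a repeated
    item only counts as correct as often as it occurs in both inputs.
    """
    gold_counts, test_counts = Counter(gold), Counter(test)
    matched = sum((gold_counts & test_counts).values())
    n_test = sum(test_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_test if n_test else 0.0
    recall = matched / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical c-structure items: (label, start, end) labelled brackets.
gold_brackets = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test_brackets = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]

# Hypothetical f-structure items: (relation, head, dependent) triples.
gold_triples = [("subj", "sleep", "dog"), ("det", "dog", "the"),
                ("tense", "sleep", "past")]
test_triples = [("subj", "sleep", "dog"), ("det", "dog", "the")]

print(prf(gold_brackets, test_brackets))  # (0.75, 0.75, 0.75)
print(prf(gold_triples, test_triples))
```

Using multisets rather than plain sets prevents a repeated item, such as two identical unary brackets, from being counted as correct more often than it appears in the gold standard.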

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	Lexical Functional Grammar
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in:	Proceedings of the LFG07 Conference. CSLI Publications.
Publisher:	CSLI Publications
Official URL:	http://csli-publications.stanford.edu/LFG/12/lfg07...
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Irish Research Council for Science, Engineering and Technology (IRCSET SC/02/298, IRCSET P/04/232); Science Foundation Ireland (SFI 04/IN/I527)
ID Code:	15205
Deposited On:	17 Feb 2010 14:46 by DORAS Administrator. Last Modified 10 Oct 2018 15:16

Documents

Full text available as:

PDF (115kB)

DORAS | DCU Research Repository