Adapting WSJ-trained parsers to the British national corpus using in-domain self-training

Foster, Jennifer and Wagner, Joachim and Seddah, Djamé and van Genabith, Josef (2007) Adapting WSJ-trained parsers to the British national corpus using in-domain self-training. In: IWPT 2007 - 10th International Conference on Parsing Technologies, 23-24 June 2007, Prague, Czech Republic.


We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson’s reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Uncontrolled Keywords: parsers
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computational Linguistics
Official URL:
Copyright Information: © 2007 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Irish Research Council for Science Engineering and Technology, IRCSET SC/02/298, IRCSET P/04/232, Science Foundation Ireland, SFI 04/IN.3/I527
ID Code: 15209
Deposited On: 17 Feb 2010 16:06 by DORAS Administrator. Last Modified 27 Apr 2010 14:16
