Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training
Foster, Jennifer (ORCID: 0000-0002-7789-4853), Wagner, Joachim (ORCID: 0000-0002-8290-3849), Seddah, Djamé and van Genabith, Josef
(2007)
Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training.
In: IWPT 2007 - 10th International Conference on Parsing Technologies, 23-24 June 2007, Prague, Czech Republic.
We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson’s reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.
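
The self-training procedure summarised in the abstract can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' pipeline: `self_train` and `train_majority` are hypothetical names, and the toy "parser" is a lookup table that stands in for Charniak and Johnson's reranking parser. The sketch assumes the automatically labelled data is simply concatenated with the original training data; the abstract does not say how the two sources are weighted.

```python
from collections import Counter
from typing import Callable, List, Tuple

# (input, label) pairs stand in for (sentence, parse tree) pairs.
Example = Tuple[str, str]
Model = Callable[[str], str]
Trainer = Callable[[List[Example]], Model]


def self_train(train: Trainer,
               labelled: List[Example],
               unlabelled: List[str]) -> Model:
    """One round of self-training (hypothetical helper).

    1. Train a base model on the hand-labelled source-domain data.
    2. Label the in-domain unlabelled inputs with that base model.
    3. Retrain on the union of the original and automatic labels.
    """
    base_model = train(labelled)
    auto_labelled = [(x, base_model(x)) for x in unlabelled]
    return train(labelled + auto_labelled)


def train_majority(examples: List[Example]) -> Model:
    """Toy trainer (assumption for illustration only): memorise seen
    inputs and fall back to the most frequent label for unseen ones."""
    counts = Counter(label for _, label in examples)
    default = counts.most_common(1)[0][0]
    table = dict(examples)
    return lambda x: table.get(x, default)


# Toy data: "source-domain" labelled items and "in-domain" unlabelled items.
source_labelled = [("a", "N"), ("b", "N"), ("c", "V")]
in_domain_unlabelled = ["d"]

model = self_train(train_majority, source_labelled, in_domain_unlabelled)
```

In the setting of the paper, `labelled` would correspond to the WSJ training trees, `unlabelled` to the BNC sentences, and `train` to retraining the parser; evaluating the retrained model would then be done separately on WSJ Section 23 and on the BNC gold standard set.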