Adapting a WSJ-trained parser to grammatically noisy text
Foster, Jennifer (ORCID: 0000-0002-7789-4853), Wagner, Joachim (ORCID: 0000-0002-8290-3849) and van Genabith, Josef
(2008)
Adapting a WSJ-trained parser to grammatically noisy text.
In: ACL-08:HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 15-20 June 2008, Columbus, USA.
We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn treebank sentences so that they contain one or more syntactic errors. We evaluate
an existing Penn-treebank-trained parser on the ungrammatical treebank to see how it reacts to noise in the form of grammatical errors. We re-train this parser on the training section of the ungrammatical treebank, leading
to significantly improved performance on the ungrammatical test sets. We show how a classifier can be used to prevent performance degradation on the original grammatical data.
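The abstract does not say exactly how the errors are inserted. As a rough illustration only, the sketch below corrupts a tokenised sentence with one error drawn from three assumed error types: a missing word, a duplicated extra word and a subject-verb agreement swap. The names `insert_error`, `make_noisy_corpus`, `ERROR_TYPES` and `AGREEMENT_SWAPS` are hypothetical. The sketch only perturbs the word string. Building the matching gold-standard tree for the ungrammatical sentence, which an ungrammatical treebank also needs, is not shown.

```python
import random
from typing import List, Tuple

# Illustrative error types; assumed for this sketch, not the paper's exact inventory.
ERROR_TYPES = ("missing_word", "extra_word", "agreement")

# Hypothetical table of verb forms swapped to create subject-verb agreement errors.
AGREEMENT_SWAPS = {
    "is": "are", "are": "is",
    "was": "were", "were": "was",
    "has": "have", "have": "has",
}


def insert_error(tokens: List[str], error_type: str, rng: random.Random) -> List[str]:
    """Return a copy of `tokens` containing one error of `error_type`.

    Raises ValueError if the error cannot be applied to this sentence.
    """
    toks = list(tokens)  # never mutate the original (grammatical) sentence
    if error_type == "missing_word":
        if len(toks) < 2:
            raise ValueError("sentence too short to delete a word")
        del toks[rng.randrange(len(toks))]  # drop one random token
    elif error_type == "extra_word":
        i = rng.randrange(len(toks))  # raises ValueError on an empty sentence
        toks.insert(i, toks[i])  # duplicate a token in place ("the the dog")
    elif error_type == "agreement":
        sites = [i for i, t in enumerate(toks) if t.lower() in AGREEMENT_SWAPS]
        if not sites:
            raise ValueError("no verb form to swap")
        i = rng.choice(sites)
        toks[i] = AGREEMENT_SWAPS[toks[i].lower()]  # e.g. "are" -> "is"
    else:
        raise ValueError(f"unknown error type: {error_type}")
    return toks


def make_noisy_corpus(sentences: List[List[str]], seed: int = 0) -> List[Tuple[List[str], str]]:
    """Return (noisy_tokens, error_type) for each sentence that could be corrupted.

    Sentences where the randomly chosen error cannot be applied are skipped.
    """
    rng = random.Random(seed)
    noisy = []
    for tokens in sentences:
        error_type = rng.choice(ERROR_TYPES)
        try:
            noisy.append((insert_error(tokens, error_type, rng), error_type))
        except ValueError:
            continue
    return noisy
```

Returning a copy keeps the grammatical original available, so each noisy sentence can be paired with the tree it was derived from.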
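The classifier is described here only at a high level. One plausible reading, sketched under that assumption, is a router. A grammaticality classifier scores each sentence. Sentences judged ungrammatical go to the retrained parser, and the rest go to the original WSJ-trained parser. Both parsers and the classifier are stand-in callables. `RoutingParser`, `toy_classifier` and the 0.5 threshold are all hypothetical, and a real classifier would be trained rather than hand-written.

```python
from dataclasses import dataclass
from typing import Callable, List

# A parser maps tokens to a bracketed tree string; a classifier maps tokens to
# an estimated probability that the sentence is ungrammatical (both assumed).
Parser = Callable[[List[str]], str]
Classifier = Callable[[List[str]], float]


@dataclass
class RoutingParser:
    grammatical_parser: Parser  # stand-in for the original WSJ-trained parser
    noisy_parser: Parser        # stand-in for the parser retrained on noisy data
    classifier: Classifier
    threshold: float = 0.5      # hypothetical decision threshold

    def parse(self, tokens: List[str]) -> str:
        # Send suspected-ungrammatical input to the noise-trained parser;
        # everything else keeps the original parser's behaviour.
        if self.classifier(tokens) >= self.threshold:
            return self.noisy_parser(tokens)
        return self.grammatical_parser(tokens)


def toy_classifier(tokens: List[str]) -> float:
    """Purely illustrative: flag a sentence if any token is immediately repeated."""
    repeated = any(a.lower() == b.lower() for a, b in zip(tokens, tokens[1:]))
    return 1.0 if repeated else 0.0


router = RoutingParser(
    grammatical_parser=lambda toks: "(WSJ " + " ".join(toks) + ")",
    noisy_parser=lambda toks: "(NOISY " + " ".join(toks) + ")",
    classifier=toy_classifier,
)
```

Because the original parser is left untouched, sentences the classifier accepts as grammatical are parsed exactly as before. How much of the gain on noisy text is kept then depends on the classifier's accuracy.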