Treebanks gone bad: generating a treebank of ungrammatical English
Foster, Jennifer (ORCID: 0000-0002-7789-4853) (2007) Treebanks gone bad: generating a treebank of ungrammatical English. In: AND 2007 Workshop on Analytics for Noisy Unstructured Data at IJCAI 2007 - 20th International Joint Conference on Artificial Intelligence, 8 January 2007, Hyderabad, India.
This paper describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people can), and can be used to induce a grammar capable of analysing such sentences. This paper also demonstrates the first of these uses.
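
The two steps of the procedure, introducing an error into a sentence and minimally adjusting its analysis, can be illustrated with a small Python sketch. It is an illustration only, not the paper's implementation: the nested-list tree format, the function names `leaves` and `delete_word`, the choice of a deleted word as the error, and the rule of pruning constituents left empty are all assumptions made for this example.

```python
# Hypothetical illustration, not the paper's code.
# A tree is a nested list [label, child, ...]; a preterminal is [tag, word].

def leaves(tree):
    """Return the words of a tree, left to right."""
    if isinstance(tree[1], str):  # preterminal: [tag, word]
        return [tree[1]]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def delete_word(tree, index):
    """Return a copy of `tree` with the leaf at position `index` removed.

    Constituents left with no children are pruned; everything else in the
    analysis is copied unchanged (an assumed notion of "minimal").
    Returns None if the deleted word was the only word in the tree.
    """
    counter = [0]  # position of the next leaf to be visited

    def rebuild(node):
        if isinstance(node[1], str):  # preterminal: [tag, word]
            position = counter[0]
            counter[0] += 1
            return None if position == index else [node[0], node[1]]
        children = [rebuild(child) for child in node[1:]]
        children = [c for c in children if c is not None]
        return [node[0]] + children if children else None

    return rebuild(tree)


# A well-formed sentence with its gold analysis (invented example).
gold = ["S",
        ["NP", ["DT", "the"], ["NN", "dog"]],
        ["VP", ["VBD", "chased"], ["NP", ["DT", "a"], ["NN", "cat"]]]]

# Introduce a missing-word error by dropping the determiner "a".
noisy = delete_word(gold, 3)
print(" ".join(leaves(noisy)))  # the dog chased cat
print(noisy)
```

The resulting tree still attaches "cat" as the object noun phrase of "chased", so a parser's output on the ill-formed sentence could be scored against it in the same way as against the original analysis.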