Treebanks gone bad: generating a treebank of ungrammatical English

Foster, Jennifer (2007) Treebanks gone bad: generating a treebank of ungrammatical English. In: AND 2007 Workshop on Analytics for Noisy Unstructured Data at IJCAI 2007 - 20th International Joint Conference on Artificial Intelligence, 8 January 2007, Hyderabad, India.

This paper describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people can), and can be used to induce a grammar capable of analysing such sentences. This paper also demonstrates the first of these uses.
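
The procedure in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not list the error types used, so word deletion is assumed here purely as an example, and the tuple-based tree format and the names `leaves` and `delete_leaf` are hypothetical. The sketch removes one word from a well-formed tree and prunes only the constituents left empty, so the rest of the analysis is carried over unchanged.

```python
# Illustrative sketch only. Trees are nested tuples: (label, child, ...);
# a preterminal is (tag, word). This format is an assumption of the example.

def leaves(tree):
    """Return the words of a tree, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    return [w for child in children for w in leaves(child)]


def delete_leaf(tree, index):
    """Return a copy of tree without the index-th word (0-based).

    Constituents left empty by the deletion are pruned; everything else
    is kept as it was, so the analysis changes as little as possible.
    Returns None if the whole tree becomes empty.
    """
    def walk(node, start):
        label, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            # Preterminal: drop it if it holds the target word.
            return (None if start == index else node), start + 1
        kept = []
        pos = start
        for child in children:
            new_child, pos = walk(child, pos)
            if new_child is not None:
                kept.append(new_child)
        return ((label, *kept) if kept else None), pos

    result, _ = walk(tree, 0)
    return result


# A well-formed sentence with its analysis (invented for this example).
gold = ("S",
        ("NP", ("DT", "the"), ("NN", "parser")),
        ("VP", ("VBZ", "ignores"),
               ("NP", ("DT", "the"), ("NN", "error"))))

# Introduce a missing-word error by dropping the second determiner.
noisy = delete_leaf(gold, 3)
print(" ".join(leaves(noisy)))  # the parser ignores error
```

The resulting pair (ill-formed sentence, adjusted tree) is the kind of item such a treebank would contain; a parser's output on the ill-formed sentence could then be scored against the adjusted tree.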

Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Uncontrolled Keywords: ungrammatical sentences
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Official URL:
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Irish Research Council for Science Engineering and Technology
ID Code: 15208
Deposited On: 17 Feb 2010 15:51 by DORAS Administrator. Last Modified: 27 Apr 2010 14:13
