Foster, Jennifer ORCID: 0000-0002-7789-4853 (2007) Treebanks gone bad: generating a treebank of ungrammatical English. In: AND 2007 Workshop on Analytics for Noisy Unstructured Data at IJCAI 2007 - 20th International Joint Conference on Artificial Intelligence, 8 January 2007, Hyderabad, India.
Abstract
This paper describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people can), and can be used to induce a grammar capable of analysing such sentences. This paper also demonstrates the first of these uses.
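As a rough illustration of the procedure the abstract describes, the sketch below introduces one kind of error (a missing word) into a parsed sentence and adjusts the tree as little as possible. It is not the paper's implementation. The nested-list tree representation, the choice of a missing-word error, and the function names `leaves` and `delete_word` are all assumptions made for this example.

```python
# Hypothetical sketch, not the paper's code.
# A tree is a nested list: [label, child, ...]; a preterminal is [pos_tag, word].

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def leaves(node):
    """Return the words of a tree, left to right."""
    if is_preterminal(node):
        return [node[1]]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def _delete(node, index):
    # Returns (copy of node without the target word, or None if nothing is left,
    #          number of words in the original node).
    if is_preterminal(node):
        return (None if index == 0 else list(node)), 1
    kept, seen = [], 0
    for child in node[1:]:
        new_child, n = _delete(child, index - seen)
        seen += n
        if new_child is not None:
            kept.append(new_child)
    if not kept:
        return None, seen  # constituent lost its only word: prune it
    return [node[0]] + kept, seen

def delete_word(tree, index):
    """Return a new tree without its index-th word (0-based).

    Only constituents left with no words are pruned; all other structure
    is kept, which is the 'minimal transformation' assumed here.
    """
    new_tree, _ = _delete(tree, index)
    return new_tree

tree = ["S",
        ["NP", ["DT", "the"], ["NN", "dog"]],
        ["VP", ["VBD", "chased"],
               ["NP", ["DT", "a"], ["NN", "cat"]]]]

bad = delete_word(tree, 3)       # drop the determiner "a"
print(" ".join(leaves(bad)))     # the dog chased cat
```

The input tree is left unchanged, so the well-formed and ill-formed versions can be kept side by side, for example to compare a parser's output on both.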
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Workshop |
Refereed: | Yes |
Uncontrolled Keywords: | ungrammatical sentences |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Official URL: | http://research.ihost.com/and2007/ |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Irish Research Council for Science Engineering and Technology |
ID Code: | 15208 |
Deposited On: | 17 Feb 2010 15:51 by DORAS Administrator. Last Modified 10 Oct 2018 15:08 |
Documents
Full text available as:
PDF (110kB)