
Adapting a WSJ-trained parser to grammatically noisy text

Foster, Jennifer and Wagner, Joachim and van Genabith, Josef (2008) Adapting a WSJ-trained parser to grammatically noisy text. In: ACL-08:HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 15-20 June 2008, Columbus, USA.


Abstract

We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn Treebank sentences so that they contain one or more syntactic errors. We evaluate an existing Penn-Treebank-trained parser on the ungrammatical treebank to see how it reacts to noise in the form of grammatical errors. We re-train this parser on the training section of the ungrammatical treebank, leading to significantly improved performance on the ungrammatical test sets. We show how a classifier can be used to prevent performance degradation on the original grammatical data.

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: parser
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT)
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology/P/P08/
Copyright Information: © 2008 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Irish Research Council for Science Engineering and Technology, IRCSET P/04/232
ID Code: 15192
Deposited On: 16 Feb 2010 14:25 by DORAS Administrator. Last Modified 27 Apr 2010 12:18
