Wagner, Joachim ORCID: 0000-0002-8290-3849, Foster, Jennifer ORCID: 0000-0002-7789-4853 and van Genabith, Josef (2007) A comparative evaluation of deep and shallow approaches to the automatic detection of common grammatical errors. In: EMNLP-CoNLL 2007 - Joint Meeting of the Conference on Empirical Methods in Natural Language Processing and the Conference on Computational Natural Language Learning, 28-30 June 2007, Prague, Czech Republic.
Abstract
This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed. The deep processing approach uses the XLE LFG parser and English grammar: two versions are presented, one which uses the XLE directly to perform the classification, and another which uses a decision tree trained on features consisting of the XLE’s output statistics. The shallow processing approach predicts grammaticality based on n-gram frequency statistics: we present two versions, one which uses frequency thresholds and one which uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence. We find that the use of a decision tree improves on the basic approach only for the deep parser-based approach. We also show that combining both the shallow and deep decision tree features is effective. Our evaluation is carried out using a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting grammatical errors into well-formed BNC sentences.
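As a rough illustration of the threshold-based shallow approach mentioned in the abstract, the sketch below flags a sentence when its rarest n-gram falls below a frequency threshold. It is not the authors' implementation: the whitespace tokenisation, the bigram order, the threshold value, the toy reference corpus and all function names are assumptions made for the example, since this record does not specify them.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_counts(corpus, n):
    """Count n-gram frequencies over a reference corpus of sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def rarest_frequency(sentence, counts, n):
    """Frequency of the least frequent n-gram in the sentence, or None if too short."""
    grams = ngrams(sentence.lower().split(), n)
    if not grams:
        return None
    return min(counts[g] for g in grams)


def is_flagged(sentence, counts, n=2, threshold=1):
    """Flag the sentence as possibly ill-formed if its rarest n-gram is below threshold.

    The threshold value is a hypothetical choice for this toy example.
    """
    freq = rarest_frequency(sentence, counts, n)
    return freq is not None and freq < threshold


# Toy reference corpus (an assumption; the paper draws on BNC data).
toy_corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a mat",
]
counts = build_counts(toy_corpus, n=2)

print(is_flagged("the cat sat on the mat", counts))   # every bigram was seen
print(is_flagged("the cat sat sat on mat", counts))   # contains unseen bigrams
```

The decision-tree variant described in the abstract would instead pass such rarest-n-gram frequencies to a learned classifier as features rather than comparing them to a fixed threshold.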
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | error detection |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Publisher: | Association for Computational Linguistics |
Official URL: | http://www.aclweb.org/anthology/D/D07/ |
Copyright Information: | © 2007 Association for Computational Linguistics |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Irish Research Council for Science Engineering and Technology, IRCSET SC/02/298, IRCSET P/04/232 |
ID Code: | 15214 |
Deposited On: | 17 Feb 2010 16:55 by DORAS Administrator. Last Modified 10 Oct 2018 15:17 |
Documents
Full text available as: PDF (111kB)