Moslem, Yasmin ORCID: 0000-0003-4595-6877, Haque, Rejwanul ORCID: 0000-0003-1680-0099 and Way, Andy ORCID: 0000-0001-5736-5930 (2020) Arabisc: context-sensitive neural spelling checker. In: AACL Workshop of Natural Language Processing Techniques for Educational Application (NLP-TEA), Asia-Pacific Chapter of the Association for Computational Linguistics (AACL), 4 Dec 2020, Suzhou, China (Online).
Abstract
Traditional statistical approaches to spelling correction usually consist of two consecutive processes, error detection and error correction, and they are generally computationally intensive. Current state-of-the-art neural spelling correction models usually attempt to correct spelling errors directly over an entire sentence, which leaves little control over the process and makes them prone to overcorrection. In recent years, recurrent neural networks (RNNs), in particular those with long short-term memory (LSTM) hidden units, have proven increasingly popular and powerful models for many natural language processing (NLP) problems. Accordingly, we use a bidirectional LSTM language model (LM) for our context-sensitive spelling detection and correction model, which is shown to offer considerable control over the correction process. While the use of LMs for spelling checking and correction is not new to this line of NLP research, our proposed approach makes better use of the rich neighbouring context, not only before the word to be corrected but also after it, via a dual-input deep LSTM network. Although in theory our proposed approach can be applied to any language, we carried out our experiments on Arabic, which we believe adds value given that fewer linguistic resources are readily available for Arabic than for many other languages. Our experimental results demonstrate that the proposed methods are effective in both improving the quality of correction suggestions and minimising overcorrection.
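To make the "dual-input" idea concrete, the following is a minimal sketch of how each word in a sentence might be paired with its preceding and following context before being fed to a two-input model. The abstract does not describe the data preparation, so everything here is an assumption made for illustration: the function name `dual_context_examples`, the `<pad>` token, the fixed window size, and the choice to reverse the right-hand context are all hypothetical and may differ from the authors' implementation (see the linked code repository for that).

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token, not taken from the paper


def dual_context_examples(
    tokens: List[str], window: int = 3
) -> List[Tuple[List[str], List[str], str]]:
    """Pair every token with its left and right context windows.

    Returns (left_context, right_context, target) triples. Both contexts
    are padded on the left to exactly `window` tokens, and the right
    context is reversed so that the word nearest the target comes last
    in both inputs (an illustrative design choice, not the paper's).
    """
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        # Pad the preceding context up to the fixed window length.
        left = [PAD] * (window - len(left)) + left
        # Reverse the following context, then pad it the same way.
        right = [PAD] * (window - len(right)) + right[::-1]
        examples.append((left, right, target))
    return examples


if __name__ == "__main__":
    for left, right, target in dual_context_examples(["a", "b", "c", "d"], window=2):
        print(left, right, target)
```

In a setup like this, a model would score candidate words for the target position given both inputs, and a word in the text could be flagged when it is not among the highly ranked candidates. That usage is likewise an assumption about how such examples could be consumed, not a description of the published system.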
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Workshop |
Refereed: | Yes |
Additional Information: | Code: https://github.com/ymoslem/Arabisc/blob/main/README.md |
Uncontrolled Keywords: | Spelling Checking; Spelling Correction |
Subjects: | Computer Science > Computational linguistics; Computer Science > Computer engineering; Computer Science > Machine learning |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Official URL: | https://www.aclweb.org/anthology/2020.nlptea-1.2/ |
Copyright Information: | © 2020 The Authors. CC-BY-4.0 |
Funders: | Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), European Regional Development Fund, Research grants from SFI and Microsoft under Grant Numbers 13/RC/2077 and 18/CRT/6224 |
ID Code: | 25403 |
Deposited On: | 28 Jan 2021 14:11 by Thomas Murtagh. Last Modified 28 Jan 2021 14:11 |