DORAS | DCU Research Repository

Arabisc: context-sensitive neural spelling checker

Moslem, Yasmin (ORCID: 0000-0003-4595-6877), Haque, Rejwanul (ORCID: 0000-0003-1680-0099) and Way, Andy (ORCID: 0000-0001-5736-5930) (2020) Arabisc: context-sensitive neural spelling checker. In: AACL Workshop of Natural Language Processing Techniques for Educational Application (NLP-TEA), Asia-Pacific Chapter of the Association for Computational Linguistics (AACL), 4 Dec 2020, Suzhou, China (Online).

Abstract
Traditional statistical approaches to spelling correction usually consist of two consecutive processes, error detection and error correction, and they are generally computationally intensive. Current state-of-the-art neural spelling correction models usually attempt to correct spelling errors directly over an entire sentence; as a consequence, they offer little control over the process and are prone to overcorrection. In recent years, recurrent neural networks (RNNs), in particular those with long short-term memory (LSTM) hidden units, have proven increasingly popular and powerful models for many natural language processing (NLP) problems. Accordingly, we made use of a bidirectional LSTM language model (LM) for our context-sensitive spelling detection and correction model, which is shown to offer considerable control over the correction process. While the use of LMs for spelling checking and correction is not new to this line of NLP research, our proposed approach makes better use of the rich neighbouring context, not only from before the word to be corrected but also after it, via a dual-input deep LSTM network. Although in theory our proposed approach can be applied to any language, we carried out our experiments on Arabic, which we believe adds value given that limited linguistic resources are readily available in Arabic in comparison to many other languages. Our experimental results demonstrate that the proposed methods are effective in both improving the quality of correction suggestions and minimising overcorrection.
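The idea of using context from both sides of a word can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes a dual-input deep LSTM network, whereas the code below swaps in a simple count-based scorer so that it runs without any dependencies. The names `split_context`, `ToyContextScorer` and `suggest`, the window size, the English toy corpus and the candidate list are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def split_context(tokens, index, window=3):
    """Return the (left, right) context windows around tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left, right


class ToyContextScorer:
    """Count-based stand-in for a neural LM (an assumption, not the paper's model).

    A candidate is scored by how often it was seen in the training
    sentences next to the same immediate left and right neighbours.
    """

    def __init__(self, corpus):
        self.left_counts = Counter()   # (left neighbour, word) pairs
        self.right_counts = Counter()  # (word, right neighbour) pairs
        for sentence in corpus:
            tokens = sentence.split()
            for i, word in enumerate(tokens):
                left, right = split_context(tokens, i, window=1)
                if left:
                    self.left_counts[(left[-1], word)] += 1
                if right:
                    self.right_counts[(word, right[0])] += 1

    def score(self, left, candidate, right):
        total = 0
        if left:
            total += self.left_counts[(left[-1], candidate)]
        if right:
            total += self.right_counts[(candidate, right[0])]
        return total


def suggest(tokens, index, candidates, scorer):
    """Pick the best-scoring candidate for tokens[index].

    The original word is kept unless a candidate scores strictly
    higher, which is one simple way to limit overcorrection.
    """
    left, right = split_context(tokens, index)
    best = tokens[index]
    best_score = scorer.score(left, best, right)
    for candidate in candidates:
        candidate_score = scorer.score(left, candidate, right)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a hat sat on the shelf",
]
scorer = ToyContextScorer(corpus)
misspelled = "the cst sat on the mat".split()
print(suggest(misspelled, 1, ["cat", "hat", "cost"], scorer))
```

In this toy setting, "hat" is supported only by the right-hand neighbour, while "cat" is supported by both sides, so combining the two contexts separates candidates that a left-to-right model alone would score equally. Keeping the original word on ties reflects, in a very simplified way, the concern with overcorrection raised in the abstract.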
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Additional Information: Code: https://github.com/ymoslem/Arabisc/blob/main/README.md
Uncontrolled Keywords: Spelling Checking; Spelling Correction
Subjects: Computer Science > Computational linguistics; Computer Science > Computer engineering; Computer Science > Machine learning
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Official URL: https://www.aclweb.org/anthology/2020.nlptea-1.2/
Copyright Information: © 2020 The Authors. CC-BY-4.0
Funders: Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), European Regional Development Fund, Research grants from SFI and Microsoft under Grant Numbers 13/RC/2077 and 18/CRT/6224
ID Code: 25403
Deposited On: 28 Jan 2021 14:11 by Thomas Murtagh. Last Modified: 28 Jan 2021 14:11
Documents

Full text available as:

Context_sensitive_Neural_Spelling_Checker_AACL_IJCNLP.pdf (PDF, 234kB)