DORAS | DCU Research Repository

Boosting neural POS tagger for Farsi using morphological information

Passban, Peyman, Liu, Qun (ORCID: 0000-0002-7000-1792) and Way, Andy (ORCID: 0000-0001-5736-5930) (2016) Boosting neural POS tagger for Farsi using morphological information. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 16 (1). ISSN 2375-4699

Abstract
Farsi (Persian) is a low-resource language that suffers from the data sparsity problem and a lack of efficient processing tools. Due to their broad application in natural language processing tasks, part-of-speech (POS) taggers are one of those important tools that should be considered in this respect. Despite recent work on Farsi tagging, there is still room for improvement. The best reported accuracy so far is 96%, which in special cases can rise to 96.9%. The main problem with existing taggers is their inefficiency in coping with out-of-vocabulary (OOV) words. Addressing both problems of accuracy and OOV words, we developed a neural network-based POS tagger (NPT) that performs efficiently on Farsi. Despite using less data, NPT provides better results in comparison to state-of-the-art systems. Our proposed tagger performs with an accuracy of 97.4%, with performance highly influenced by morphological features. We carry out a shallow morphological analysis and show considerable improvement over the baseline configuration.
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: POS tagging; Farsi; morphological analysis
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Publisher: Association for Computing Machinery
Official URL: http://dx.doi.org/10.1145/2934676
Copyright Information: © 2016 ACM
Funders: Science Foundation Ireland through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre (http://www.adaptcentre.ie) at Dublin City University
ID Code: 23261
Deposited On: 09 May 2019 08:31 by Thomas Murtagh. Last Modified 09 May 2019 08:31
Documents

Full text available as:

PDF (429kB)