
DORAS | DCU Research Repository


An analyser and generator for Irish inflectional morphology using finite-state transducers

Uí Dhonnchadha, Elaine (2002) An analyser and generator for Irish inflectional morphology using finite-state transducers. Master of Science thesis, Dublin City University.

Abstract
Computational morphology is an important step in natural language processing. Finite-state techniques have been applied successfully in computational phonology and morphology to many of the world’s major languages. Celtic languages, such as Modern Irish, present unique and challenging morphological features that to date have not been addressed using finite-state technology. This thesis presents a finite-state morphology of Irish developed using Xerox Finite-State Tools. To the best of our knowledge, no such resource previously existed. The computational model, implemented as a finite-state transducer, encodes the inflectional morphology of nouns, adjectives, and verbs. Other parts of speech are also included in the interests of language coverage. The implementation is a strictly lexicalised design: the morphotactics of stems and affixes are encoded in the lexicon using replace rule triggers. Word mutations are then implemented as a series of replace rules written as regular expressions. Both components are compiled into finite-state transducers and then combined to produce a single two-level morphological transducer for the language. A major advantage of finite-state implementations of morphology is their inherent bi-directionality; the same system is used for both analysis and generation of word forms in the language. This resource can be used as a component in parsing and generation in natural language processing (NLP) applications, such as spelling checkers/correctors, stemmers, and text-to-speech synthesisers. It can also be used for tokenising text, lemmatising, and as an input to automatic part-of-speech tagging of a corpus. The system is designed for broad coverage of the language, and this is evaluated by comparing it with a list of the 1000 most frequently found word forms in a corpus of contemporary Irish texts. Finally, maintainability of the system is discussed and possible extensions to the system are suggested, such as derivational morphology and the inclusion of dialectal or historical word-forms.
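The trigger-plus-replace-rule design described in the abstract can be illustrated with a small sketch. This is not the thesis implementation and does not use Xerox Finite-State Tools: it is a plain Python toy, and the trigger names (`^Len`, `^Ecl`), the three-word lexicon, and the function names are all assumptions made for illustration. It covers only a simplified subset of Irish initial mutations (lenition and eclipsis on a few consonants), and it imitates bi-directionality by running the same generation rules over all lexicon candidates rather than by inverting a real transducer.

```python
# Toy illustration only: trigger names, lexicon, and function names are
# hypothetical and are not taken from the thesis or from any xfst grammar.

# Consonants that lenite by taking a following "h" (simplified: "s" is left
# out because it lenites only in some clusters).
LENITABLE = set("bcdfgmpt")

# Eclipsis replaces the initial consonant with a prefixed cluster.
ECLIPSIS = {"b": "mb", "c": "gc", "d": "nd", "f": "bhf",
            "g": "ng", "p": "bp", "t": "dt"}

# A tiny stand-in for the lexicon component.
LEXICON = {"cat": "Noun", "bád": "Noun", "doras": "Noun"}

TRIGGERS = ("", "^Len", "^Ecl")


def generate(lexical: str) -> str:
    """Map a lexical string (optional trigger + stem) to its surface form."""
    if lexical.startswith("^Len"):
        stem = lexical[4:]
        if stem and stem[0] in LENITABLE:
            return stem[0] + "h" + stem[1:]
        return stem
    if lexical.startswith("^Ecl"):
        stem = lexical[4:]
        if stem and stem[0] in ECLIPSIS:
            return ECLIPSIS[stem[0]] + stem[1:]
        return stem
    return lexical


def analyse(surface: str) -> list[str]:
    """Return every lexicon-licensed lexical string that generates `surface`."""
    candidates = [trigger + stem for stem in LEXICON for trigger in TRIGGERS]
    return [c for c in candidates if generate(c) == surface]
```

Under these assumptions, `generate("^Lencat")` gives the lenited form `chat`, and `analyse("mbád")` recovers the eclipsis trigger on `bád`. In a real finite-state system the lexicon and the rules would be compiled and composed into one transducer that can be applied in either direction, instead of searching candidates as this sketch does.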
Metadata
Item Type: Thesis (Master of Science)
Date of Award: 2002
Refereed: No
Supervisor(s): van Genabith, Josef and Nic Pháidín, Caoilfhionn
Uncontrolled Keywords: Inflection; Morphology; Natural language processing
Subjects: Computer Science > Computational linguistics; Humanities > Irish language
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code: 18253
Deposited On: 27 May 2013 13:32 by Celine Campbell. Last Modified 03 Nov 2023 10:40
Documents

Full text available as:

Elaine_Uì_Dhonnchadha.pdf (PDF, 4MB)