
DORAS | DCU Research Repository


The automatic processing of multiword expressions in Irish

Walsh, Abigail (2023) The automatic processing of multiword expressions in Irish. PhD thesis, Dublin City University.

Abstract
It is well documented that Multiword Expressions (MWEs) pose a unique challenge to a variety of NLP tasks, such as machine translation, parsing and information retrieval. For low-resource languages such as Irish, these challenges can be exacerbated by the scarcity of data and a lack of research on this topic. To improve the handling of MWEs in NLP tasks for Irish, this thesis addresses the lack of resources specifically targeting Irish MWEs and examines how these resources can be applied to such tasks. We report on the creation and analysis of a number of lexical resources developed as part of this PhD research. Ilfhocail, a lexicon of Irish MWEs, is created by extracting MWEs from other lexical resources such as dictionaries. A corpus annotated with verbal MWEs in Irish is created for the inclusion of Irish in the PARSEME Shared Task 1.2. Additionally, MWEs are tagged in a bilingual English-Irish (EN-GA) corpus for use in machine translation experiments. For the purposes of annotation, a categorisation scheme covering nine categories of Irish MWEs is created, combining linguistic analysis of these constructions with cross-lingual frameworks for defining MWEs. A case study in applying MWEs to NLP tasks is undertaken, exploring the incorporation of MWE information when training Neural Machine Translation systems. Finally, the automatic identification of Irish MWEs is explored: we document the training of a system capable of automatically identifying Irish MWEs from a variety of categories, along with the challenges associated with developing such a system. This research contributes towards a greater understanding of Irish MWEs and their applications in NLP, and provides a foundation for future work on other methods for the automatic discovery and identification of Irish MWEs and on further developing the MWE resources described above.
Metadata
Item Type: Thesis (PhD)
Date of Award: March 2023
Refereed: No
Supervisor(s): Foster, Jennifer and Lynn, Teresa
Uncontrolled Keywords: Natural Language Processing; Multiword Expressions; Irish Language; Technology
Subjects: Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating; Humanities > Irish language
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Funders: Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media
ID Code: 27997
Deposited On: 31 Mar 2023 09:17 by Jennifer Foster. Last Modified: 31 Mar 2023 09:17
Documents

Full text available as:

PDF (Abigail_Thesis_hardbound_submission_signed.pdf, 3MB)
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
