
DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU


Linguistic analysis and automatic dependency parsing of Tweets in modern Irish

Cassidy, Lauren (2024) Linguistic analysis and automatic dependency parsing of Tweets in modern Irish. PhD thesis, Dublin City University.

Abstract
Automatic syntactic parsing of user-generated content in Modern Irish poses significant challenges due to the language’s minority status and limited linguistic resources. In this thesis, we present TwittIrish, the first Universal Dependencies treebank of tweets in Irish: a linguistically informed, genre-specific dataset developed via a cycle of automatic syntactic annotation and manual correction. We use this novel resource to document and quantify the linguistic differences between Irish tweets and standardised Irish text with regard to orthography, morphology, lexicon, and syntax. We provide examples of linguistic features observed in the tweets and describe how we have chosen to represent them within the Universal Dependencies framework. Furthermore, we utilise the TwittIrish dataset to establish baseline parsing results and explore methods to increase parsing accuracy. We show that the use of monolingual Irish BERT embeddings provides a significant improvement over baseline results. Our error analysis shows that language contact phenomena constitute one of the greatest challenges associated with processing informal Irish text. We therefore extend our analysis of user-generated content to examine language contact in Irish-language tweets. Due to centuries of contact with English, code-switching, borrowing, and other language contact phenomena are frequent in informal Irish. We investigate the perceptions of Irish speakers with regard to language contact in the Irish-English language pair. We also assess the advantages and disadvantages of distinguishing between code-switching and borrowing in the context of resource development for natural language processing. Our research contributes to language technology support for a low-resource language by providing a novel dataset and facilitating more accurate dependency parsing of informal Irish.
Additionally, the exploration of linguistic features of Irish-language tweets extends the impact of this research to linguistics, sociolinguistics, and the Irish-language community more broadly by enhancing the general understanding of the use of Irish on social media.
Metadata
Item Type: Thesis (PhD)
Date of Award: March 2024
Refereed: No
Supervisor(s): Foster, Jennifer and Lynn, Teresa
Uncontrolled Keywords: Irish Natural Language Processing, Dependency Parsing, Irish Social Media Analysis
Subjects: Computer Science > Artificial intelligence
Computer Science > Computational linguistics
Computer Science > Machine learning
Humanities > Irish language
Humanities > Linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License.
Funders: Department of Tourism, Culture, Arts, Gaeltacht, Sports and Media
ID Code: 29326
Deposited On: 22 Mar 2024 13:32 by Jennifer Foster. Last Modified: 22 Mar 2024 13:32
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
2MB
