DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

TwittIrish: a universal dependencies treebank of Tweets in modern Irish

Cassidy, Lauren, Lynn, Teresa, Barry, James and Foster, Jennifer (ORCID: 0000-0002-7789-4853) (2022) TwittIrish: a universal dependencies treebank of Tweets in modern Irish. In: 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 22-27 May 2022, Dublin, Ireland.

Modern Irish is a minority language lacking sufficient computational resources for the task of accurate automatic syntactic parsing of user-generated content such as tweets. Although language technology for the Irish language has been developing in recent years, these tools tend to perform poorly on user-generated content. As with other languages, the linguistic style observed in Irish tweets differs, in terms of orthography, lexicon, and syntax, from that of standard texts more commonly used for the development of language models and parsers. We release the first Universal Dependencies treebank of Irish tweets, facilitating natural language processing of user-generated content in Irish. In this paper, we explore the differences between Irish tweets and standard Irish text, and the challenges associated with dependency parsing of Irish tweets. We describe our bootstrapping method of treebank development and report on preliminary parsing experiments.
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Subjects: Computer Science > Artificial intelligence
Computer Science > Computational linguistics
Computer Science > Machine learning
Humanities > Irish language
Humanities > Language
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics (ACL).
Publisher: Association for Computational Linguistics (ACL)
Official URL: https://doi.org/10.18653/v1/2022.acl-long.473
Copyright Information: © 2022 Association for Computational Linguistics
Funders: Irish Government Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media under the GaelTech Project; Science Foundation Ireland in the ADAPT Centre (Grant No. 13/RC/2106) at Dublin City University.
ID Code: 29142
Deposited On: 19 Oct 2023 11:29 by Jennifer Foster. Last Modified: 19 Oct 2023 13:23

Full text available as:

PDF (2022.acl-long.473.pdf)
Creative Commons: Attribution-Noncommercial 4.0
