McCahill, Leona, Baltazar, Thomas, Bruen, Sally, Xu, Liang (ORCID: 0000-0002-2619-1883), Ward, Monica (ORCID: 0000-0001-7327-1395), Uí Dhonnchadha, Elaine (ORCID: 0000-0003-3448-4288) and Foster, Jennifer (ORCID: 0000-0002-7789-4853)
(2024)
Exploring text classification for enhancing digital game-based language learning for Irish.
In: The 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, 21-22 May 2024, Turin, Italy.
Abstract
Digital game-based language learning (DGBLL) can support the language learning process. DGBLL applications can make learning more enjoyable and engaging, but they are difficult to develop. A DGBLL app that relies on target-language texts needs to be able to use texts at the appropriate level for each individual learner. This implies that text classification tools should be available to DGBLL developers, who may not be familiar with the target language, so that they can incorporate suitable texts into their games. While text difficulty classifiers exist for many of the most commonly spoken languages, this is not the case for under-resourced languages such as Irish. In this paper, we explore approaches to the development of text classifiers for Irish. In the first approach to text analysis and grading, we apply linguistic analysis to assess text complexity. Features from this approach are then used in machine learning-based text classification, where we explore the application of a number of machine learning algorithms to the problem. Although the development of these text classifiers is at an early stage, they show promise, particularly in a low-resourced scenario.
Metadata
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Event Type: | Workshop |
| Refereed: | Yes |
| Uncontrolled Keywords: | Text classification, under-resourced language, digital game-based language learning, machine learning |
| Subjects: | Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Interactive computer systems; Computer Science > Machine learning; Humanities > Irish language; Humanities > Language; Social Sciences > Educational technology |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT; Research Institutes and Centres > d-real |
| Published in: | Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024. ELRA Language Resource Association. |
| Publisher: | ELRA Language Resource Association |
| Official URL: | https://aclanthology.org/2024.sigul-1.12.pdf |
| Funders: | Science Foundation Ireland |
| ID Code: | 31994 |
| Deposited On: | 11 Dec 2025 13:38 by Liang Xu. Last Modified 11 Dec 2025 13:38 |
Documents
Full text available as PDF (852kB), licensed under Creative Commons Attribution 4.0.