DORAS | DCU Research Repository

Handwritten Text Recognition (HTR) for Irish-Language Folklore

Ó Raghallaigh, Brian (ORCID: 0000-0003-3813-1949), Palandri, Andrea and MacCárthaigh, Críostóir (2022) Handwritten Text Recognition (HTR) for Irish-Language Folklore. In: 4th Celtic Language Technology Workshop within LREC2022, 20-25 June, 2022, Marseilles, France.

Abstract
In this paper we present our method for digitising a large collection of handwritten Irish-language texts as part of a project to mine information from a large corpus of Irish and Scottish Gaelic folktales. The handwritten texts form part of the Main Manuscript Collection of the National Folklore Collection of Ireland and contain handwritten transcriptions of oral folklore collected in Ireland in the 20th century. With the goal of creating a large text corpus of the Irish-language folktales contained within this collection, our method involves scanning the pages of the physical volumes and digitising the text on these pages using Transkribus, a platform for the recognition of historical documents. Given the nature of the collection, the approach we have taken involves the creation of individual text recognition models for multiple collectors' hands. Doing it this way was motivated by the fact that a relatively small number of collectors contributed the bulk of the material, while the differences between each collector in terms of style, layout and orthography were difficult to reconcile within a single handwriting model. We present our preliminary results along with a discussion on the viability of using crowdsourced correction to improve our HTR models.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Digital folkloristics, handwritten text recognition, Irish language
Subjects: Humanities > Irish language; Humanities > Language; Humanities > Linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Humanities and Social Science; DCU Faculties and Schools > Faculty of Humanities and Social Science > Fiontar agus Scoil na Gaeilge
Published in: Fransen, Theodorus, Lamb, William and Prys, Delyth (eds.) Proceedings of the 4th Celtic Language Technology Workshop within LREC2022. European Language Resources Association.
Publisher: European Language Resources Association
Official URL: https://aclanthology.org/2022.cltw-1.17/
Copyright Information: Authors
ID Code: 32181
Deposited On: 19 Jan 2026 11:15 by Andrea Palandri. Last Modified: 19 Jan 2026 11:15
Documents

Full text available as:

2022.cltw-1.17.pdf (PDF, 610kB)
Creative Commons: Attribution-Noncommercial 4.0