A crowd-sourcing approach for translations of minority
language user-generated content (UGC)
Dowling, Meghan (ORCID: 0000-0003-1637-4923), Lynn, Teresa and Way, Andy (ORCID: 0000-0001-5736-5930) (2017) A crowd-sourcing approach for translations of minority language user-generated content (UGC). In: First workshop on Social Media and User Generated Content Machine Translation, 31 May 2017, Prague, Czech Republic.
Data sparsity is a common problem for machine translation of minority and less-resourced languages. While collecting data for standard, grammatical text is challenging enough, collecting parallel user-generated content can be even more difficult. In this paper we describe an approach to collecting English↔Irish translations of user-generated content (tweets) that overcomes some of these hurdles. We show how a crowd-sourced data collection campaign, tailored to our target audience (the Irish language community), proved successful in gathering data for a niche domain. We also discuss the reliability of crowd-sourced English↔Irish tweet translations in terms of quality, reporting on a self-rating approach alongside ratings from qualified reviewers.
ADAPT Centre for Digital Content Technology, which is funded under the SFI Research Centres Programme (Grant 13/RC/2016) and is co-funded by the European Regional Development Fund
ID Code: 23304
Deposited On: 20 May 2019 15:50 by Thomas Murtagh. Last Modified 20 May 2019 15:50