DORAS
DCU Online Research Access Service
Translation crowdsourcing: creating a multilingual corpus of online educational content

Sosoni, Vilelmini ORCID: 0000-0002-9583-4651, Kermanidis, Katia Lida ORCID: 0000-0002-3270-5078, Stasimioti, Maria ORCID: 0000-0002-3270-5078, Naskos, Thanasis, Takoulidou, Eirini, van Zaanen, Menno, Castilho, Sheila ORCID: 0000-0002-8416-6555, Georgakopoulou, Panayota ORCID: 0000-0001-9780-1813, Kordoni, Valia and Egg, Markus (2018) Translation crowdsourcing: creating a multilingual corpus of online educational content. In: 11th International Conference on Language Resources and Evaluation, 7-12 May 2018, Miyazaki, Japan. ISBN 979-10-95546-00-9

Abstract

The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material, ranging from course forum text to subtitles of online video lectures, that has been developed via large-scale crowdsourcing. The English source text is manually translated into 11 European and BRIC languages using the CrowdFlower platform. During the process several challenges arose which mainly involved the in-domain text genre, the large text volume, the idiosyncrasies of each target language, the limitations of the crowdsourcing platform, as well as the quality assurance and workflow issues of the crowdsourcing process. The corpus constitutes a product of the EU-funded TraMOOC project and is utilised in the project in order to train, tune and test machine translation engines.

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: parallel corpus; MOOCs; online educational text; crowdsourcing
Subjects: Social Sciences > Distance education
Social Sciences > Education
Social Sciences > Educational technology
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Initiatives and Centres > ADAPT
Published in: Proceedings of the 11th International Conference on Language Resources and Evaluation. LREC. ISBN 979-10-95546-00-9
Publisher: LREC
Official URL: http://www.lrec-conf.org/proceedings/lrec2018/pdf/677.pdf
Copyright Information: © 2018 LREC. CC 4.0
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: TraMOOC project (Translation for Massive Open Online Courses), funded by the European Commission under H2020-ICT2014/H2020-ICT-2014-1 under grant agreement number 644333.
ID Code: 23070
Deposited On: 08 Mar 2019 14:13 by Thomas Murtagh. Last Modified: 08 Mar 2019 14:13
