DORAS | DCU Research Repository

Preparing an endangered language for the digital age: the case of Judeo-Spanish

Öktem, Alp (ORCID: 0000-0002-0700-1159), Zevallos, Rodolfo, Moslem, Yasmin (ORCID: 0000-0003-4595-6877), Öztürk, Güneş and Şarhon, Karen Gerson (2022) Preparing an endangered language for the digital age: the case of Judeo-Spanish. In: Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference, 20 June 2022, Marseille, France.

Abstract
We develop machine translation and speech synthesis systems to complement the efforts of revitalizing Judeo-Spanish, the exiled language of Sephardic Jews, which survived for centuries, but now faces the threat of extinction in the digital age. Building on resources created by the Sephardic community of Turkey and elsewhere, we create corpora and tools that would help preserve this language for future generations. For machine translation, we first develop a Spanish to Judeo-Spanish rule-based machine translation system, in order to generate large volumes of synthetic parallel data in the relevant language pairs: Turkish, English and Spanish. Then, we train baseline neural machine translation engines using this synthetic data and authentic parallel data created from translations by the Sephardic community. For text-to-speech synthesis, we present a 3.5-hour single-speaker speech corpus for building a neural speech synthesis engine. Resources, model weights and online inference engines are shared publicly.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Extremely low-resource language; Data-augmentation; Text-to-Speech; Judeo-Spanish
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference. European Language Resources Association (ELRA).
Publisher: European Language Resources Association (ELRA)
Official URL: https://aclanthology.org/2022.eurali-1.18
Copyright Information: © European Language Resources Association (ELRA)
Funders: European Union via “Grant Scheme for Common Cultural Heritage: Preservation and Dialogue between Turkey and the EU (CCH-II)”, Science Foundation Ireland Centre for Research Training in Digitally-Enhanced Reality (d-real) under Grant No. 18/CRT/6224, Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), European Regional Development Fund
ID Code: 28325
Deposited On: 11 May 2023 12:11 by Thomas Murtagh. Last Modified 11 May 2023 12:11
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
324kB