Translating low-resource languages by vocabulary adaptation from close counterparts

Passban, Peyman; Liu, Qun; Way, Andy

Passban, Peyman, Liu, Qun ORCID: 0000-0002-7000-1792 and Way, Andy ORCID: 0000-0001-5736-5930 (2017) Translating low-resource languages by vocabulary adaptation from close counterparts. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 16 (4). pp. 1-14. ISSN 2375-4699

Abstract

Some natural languages belong to the same family or share similar syntactic and/or semantic regularities. This property motivates researchers to share computational models across languages and to use high-quality models to boost existing low-performance counterparts. In this article, we follow a similar idea, whereby we develop statistical and neural machine translation (MT) engines that are trained on one language pair but are used to translate another language. First, we train a reliable model for a high-resource language, and then we exploit cross-lingual similarities and adapt the model to work for a close language with almost zero resources. We chose Turkish (Tr) and Azeri or Azerbaijani (Az) as the proposed pair in our experiments. Azeri suffers from a lack of resources, as there is almost no bilingual corpus for this language. Via our techniques, we are able to train an engine for the Az→English (En) direction, which outperforms all other existing models.

Metadata

Item Type:	Article (Published)
Refereed:	Yes
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Publisher:	ACM
Official URL:	http://dx.doi.org/10.1145/3099556
Copyright Information:	© 2017 ACM Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland at ADAPT: Centre for Digital Content Platform Research (Grant 13/RC/2106).
ID Code:	23316
Deposited On:	15 May 2019 15:56
Last Modified:	15 May 2019 15:56

Documents

Full text available as:

PDF: Translating_Low-Resource_Languages_by_Vocabulary_Adaptation[1].pdf (187kB)

DORAS | DCU Research Repository
