Translating low-resource languages by vocabulary adaptation from close counterparts
Passban, Peyman, Liu, Qun (ORCID: 0000-0002-7000-1792) and Way, Andy (ORCID: 0000-0001-5736-5930) (2017) Translating low-resource languages by vocabulary adaptation from close counterparts. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 16 (4). pp. 1-14. ISSN 2375-4699
Some natural languages belong to the same family or share similar syntactic and/or semantic regularities.
This property encourages researchers to share computational models across languages and to use
high-quality models to improve their low-performance counterparts. In this article, we follow a similar
idea: we develop statistical and neural machine translation (MT) engines that are trained on one language
pair but are used to translate another language. First, we train a reliable model for a high-resource
language; then we exploit cross-lingual similarities and adapt the model to work for a closely related
language with almost zero resources. We chose Turkish (Tr) and Azeri, or Azerbaijani (Az), as the language
pair in our experiments. Azeri suffers from a lack of resources, as there is almost no bilingual corpus for
this language. Using our techniques, we are able to train an engine for the Az→English (En) direction that
outperforms all other existing models.
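The adaptation step can be pictured with a minimal sketch. It rests on one assumption: a plain string-similarity mapping. Each Azeri token is rewritten as the closest word in a Turkish vocabulary, so that an engine trained on Tr→En data can consume it. This is not the method described in the article. Every name in it is hypothetical and chosen only for illustration: the character table CHAR_MAP, the threshold max_ratio, the helpers normalize, adapt_token and adapt_sentence, and the toy vocabulary. CHAR_MAP in particular is not an authoritative Az–Tr correspondence.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical Azeri -> Turkish character substitutions, for illustration only
# (not taken from the article and not an exhaustive or authoritative mapping).
CHAR_MAP: Dict[str, str] = {"ə": "e", "x": "h", "q": "k"}


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = curr
    return prev[-1]


def normalize(word: str) -> str:
    """Lowercase and apply the hypothetical character substitutions."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in word.lower())


def adapt_token(word: str, vocab: Iterable[str], max_ratio: float = 0.34) -> Optional[str]:
    """Return the closest high-resource word, or None if nothing is close enough."""
    src = normalize(word)
    best, best_dist = None, None
    for cand in vocab:
        d = levenshtein(src, cand)
        if best_dist is None or d < best_dist:
            best, best_dist = cand, d
    # Reject matches whose length-normalized distance exceeds the threshold.
    if best is None or best_dist / max(len(src), len(best), 1) > max_ratio:
        return None
    return best


def adapt_sentence(tokens: List[str], vocab: Iterable[str]) -> List[str]:
    """Map each token into the high-resource vocabulary, keeping it unchanged if no match."""
    vocab = list(vocab)
    return [adapt_token(t, vocab) or t for t in tokens]


if __name__ == "__main__":
    toy_vocab = ["haber", "kız", "ev", "kitap"]  # hypothetical Turkish vocabulary
    print(adapt_sentence(["xəbər", "qız", "kitab", "ev", "salam"], toy_vocab))
```

With the toy vocabulary, the tokens "xəbər", "qız", "kitab" and "ev" map to "haber", "kız", "kitap" and "ev". The token "salam" has no sufficiently close neighbour, so it passes through unchanged. A real system would need more than this brute-force nearest-neighbour search. It would need a far larger vocabulary, a faster lookup, and a linguistically informed similarity measure.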