Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation
Labaka, Gorka and Stroppa, Nicolas and Way, Andy and Sarasola, Kepa (2007) Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation. In: Machine Translation Summit XI, 10-14 September, 2007, Copenhagen, Denmark.
In this paper, we compare the rule-based and data-driven approaches in the context of Spanish-to-Basque Machine Translation. The rule-based system we consider has been developed specifically for Spanish-to-Basque machine translation, and is tuned to this language pair. In contrast, the data-driven system we use is generic, and has not been specifically designed to deal with Basque. Spanish-to-Basque Machine Translation is a challenge for data-driven approaches for at least two reasons. First, there is a lack of bilingual data on which a data-driven MT system can be trained. Second, Basque is a morphologically rich agglutinative language, and translating into Basque requires generating a large amount of morphological information, a difficult task for a generic system not specifically tuned to Basque. We present the results of a series of experiments, obtained on two different corpora, one being “in-domain” and the other “out-of-domain” with respect to the data-driven system. We show that n-gram-based automatic evaluation and edit-distance-based human evaluation yield two different sets of results. According to BLEU, the data-driven system outperforms the rule-based system on the in-domain data, while according to the human evaluation, the rule-based approach achieves higher scores for both corpora.