DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation

Labaka, Gorka, Stroppa, Nicolas, Way, Andy (ORCID: 0000-0001-5736-5930) and Sarasola, Kepa (2007) Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation. In: Machine Translation Summit XI, 10-14 September, 2007, Copenhagen, Denmark.

Abstract
In this paper, we compare the rule-based and data-driven approaches in the context of Spanish-to-Basque Machine Translation. The rule-based system we consider has been developed specifically for Spanish-to-Basque machine translation, and is tuned to this language pair. In contrast, the data-driven system we use is generic, and has not been specifically designed to deal with Basque. Spanish-to-Basque Machine Translation is a challenge for data-driven approaches for at least two reasons. First, there is a lack of bilingual data on which a data-driven MT system can be trained. Second, Basque is a morphologically rich agglutinative language, and translating into Basque requires generating a large amount of morphological information, a difficult task for a generic system not specifically tuned to Basque. We present the results of a series of experiments, obtained on two different corpora, one being “in-domain” and the other “out-of-domain” with respect to the data-driven system. We show that n-gram-based automatic evaluation and edit-distance-based human evaluation yield two different sets of results. According to BLEU, the data-driven system outperforms the rule-based system on the in-domain data, while according to the human evaluation, the rule-based approach achieves higher scores for both corpora.
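The abstract contrasts two families of evaluation measures: n-gram overlap (the basis of BLEU) and edit distance. The sketch below is only an illustration of that contrast and is not the evaluation setup used in the paper. It assumes a single reference per hypothesis, pre-tokenized input, and placeholder tokens; the function names `ngram_precision` and `word_edit_distance` are hypothetical, and the n-gram function computes clipped precision for one order only, not the full BLEU score (which combines several orders and a brevity penalty).

```python
from collections import Counter


def ngram_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a tokenized hypothesis against one reference."""
    hyp_ngrams = Counter(
        tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1)
    )
    ref_ngrams = Counter(
        tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)
    )
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Credit each hypothesis n-gram at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete hyp_word
                current[j - 1] + 1,      # insert ref_word
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


# Placeholder tokens, not real system output.
hyp = ["a", "b", "c", "d"]
ref = ["a", "b", "x", "d"]
unigram = ngram_precision(hyp, ref, 1)
bigram = ngram_precision(hyp, ref, 2)
edits = word_edit_distance(hyp, ref)
```

In this toy case a single wrong word costs one edit, yet it removes two of the three bigram matches, which hints at how the two kinds of measure can rank systems differently.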
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Spanish language; Basque language
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > National Centre for Language Technology (NCLT)
Publisher: European Association for Machine Translation
Official URL: http://www.mt-archive.info/MTS-2007-Labaka.pdf
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland, SFI OS/IN/1732
ID Code: 15228
Deposited On: 18 Feb 2010 14:00 by DORAS Administrator. Last Modified 16 Nov 2018 09:50
Documents

Full text available as:

LabakaEtAl_summit_07.pdf (PDF, 566kB)