Robust large-scale EBMT with marker-based segmentation
Gough, Nano and Way, Andy (ORCID: 0000-0001-5736-5930)
(2004)
Robust large-scale EBMT with marker-based segmentation.
In: TMI 2004 - 10th International Conference on Theoretical and Methodological Issues in Machine Translation, 4-6 October 2004, Baltimore, Maryland, USA.
Previous work on marker-based EBMT [Gough & Way, 2003; Way & Gough, 2004] suffered from problems such as data sparseness and disparity between the training and test data. We have developed a large-scale, robust EBMT system. In a comparison with the systems listed in [Somers, 2003], ours is the third-largest EBMT system and certainly the largest English-French EBMT system. Previous work used the on-line MT system Logomedia to translate source-language material as a means of populating the system’s database where bitexts were unavailable. Here, we instead derive our sententially aligned strings from a Sun Translation Memory (TM) and limit the integration of Logomedia to the derivation of our word-level lexicon. We also use Logomedia to provide a baseline comparison for our system and observe that we outperform Logomedia and previous marker-based EBMT systems in a number of tests.