Hybridity in MT: experiments on the Europarl corpus
Groves, Declan and Way, AndyORCID: 0000-0001-5736-5930
(2006)
Hybridity in MT: experiments on the Europarl corpus.
In: EAMT 2006 - 11th Annual conference of the European Association for Machine Translation, 19-20 June 2006, Oslo, Norway.
(Way & Gough, 2005) demonstrate that their Marker-based EBMT system is capable of outperforming a word-based
SMT system trained on reasonably large data sets. (Groves & Way, 2005) take this a stage further and demonstrate that
while the EBMT system also outperforms a phrase-based SMT (PBSMT) system, a hybrid 'example-based SMT' system incorporating marker chunks and SMT sub-sentential alignments is capable of outperforming both baseline translation models for French{English translation.
In this paper, we show that similar gains are to be had from constructing a hybrid 'statistical EBMT' system capable
of outperforming the baseline system of (Way & Gough, 2005). Using the Europarl (Koehn, 2005) training and test
sets we show that this time around, although all 'hybrid' variants of the EBMT system fall short of the quality achieved by the baseline PBSMT system, merging
elements of the marker-based and SMT data, as in (Groves & Way, 2005), to create a hybrid 'example-based SMT' system, outperforms the baseline SMT and EBMT systems from which it is derived.
Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target language model to all EBMT system variants and demonstrate that this too has a positive e®ect on translation quality.