Ozdowska, Sylwia and Way, Andy (ORCID: 0000-0001-5736-5930)
(2009)
Optimal bilingual data for French-English PB-SMT.
In: EAMT 2009 - 13th Annual Conference of the European Association for Machine Translation, 13-15 May 2009, Barcelona, Spain.
We investigate the impact of the original source language (SL) on French–English PB-SMT. We train four configurations of a state-of-the-art PB-SMT system based on French–English parallel corpora which differ in terms of the original SL, and conduct experiments in both translation directions.
We see that data containing original French and English translated from French is optimal when building a system translating from French into English. Conversely, using data comprising exclusively French and English translated from several other languages is suboptimal regardless of the translation direction. Accordingly, the clamour for more data needs to be tempered somewhat; unless the quality of such data is controlled, more training data can cause translation performance to decrease drastically, by up to 38% relative BLEU in our experiments.
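As a reading aid for the "relative BLEU" figure above, the following minimal sketch shows how a relative change between two BLEU scores is usually computed. The function name `relative_bleu_change` and both scores are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_bleu_change(baseline: float, other: float) -> float:
    """Return the change of `other` relative to `baseline`, in percent.

    A negative value means `other` scores lower than `baseline`.
    """
    if baseline <= 0:
        raise ValueError("baseline BLEU must be positive")
    return (other - baseline) / baseline * 100.0


# Hypothetical BLEU scores (NOT taken from the paper): a system trained on
# well-matched data versus one trained on a larger but less suitable corpus.
baseline_bleu = 30.0
degraded_bleu = 18.6

change = relative_bleu_change(baseline_bleu, degraded_bleu)
print(f"{change:.1f}% relative BLEU")
```

Under this definition, an absolute drop of 11.4 BLEU points from a baseline of 30.0 is a relative decrease of about 38%, which is why relative and absolute differences should not be confused when comparing systems.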