Statistical analysis of alignment characteristics for phrase-based machine translation
Lambert, Patrik, Petitrenaud, Simon, Ma, Yanjun and Way, Andy (ORCID: 0000-0001-5736-5930)
(2010)
Statistical analysis of alignment characteristics for phrase-based machine translation.
In: EAMT 2010 - 14th Annual Conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France.
In most statistical machine translation
(SMT) systems, bilingual segments are extracted
via word alignment. However,
there has been no systematic study of which
alignment characteristics benefit MT
under specific experimental settings such
as the language pair or the corpus size. In
this paper we produce a set of alignments
by directly tuning the alignment model according
to alignment F-score and BLEU
score in order to investigate the alignment
characteristics that are helpful in translation.
We report results for a phrase-based
SMT system on Chinese-to-English
IWSLT data, and Spanish-to-English European
Parliament data. Through a statistical
analysis of the alignment characteristics that
correlate with BLEU score, we offer
alignment hints for improving BLEU score
with a phrase-based SMT system on different
types of corpora.