Gap between theory and practice: noise sensitive word alignment in machine translation
Okita, Tsuyoshi, Graham, Yvette and Way, Andy (ORCID: 0000-0001-5736-5930)
(2010)
Gap between theory and practice: noise sensitive word alignment in machine translation.
In: WAPA 2010 - First Workshop on Applications of Pattern Analysis, 1-3 September 2010, Windsor, UK.
Word alignment is the task of estimating either a lexical translation probability p(e|f) or a correspondence g(e, f), where the function g outputs 0 or 1, between a source word f and a target word e in given bilingual sentences. In practice, this formulation does not account for 'noise' (or outliers), which may cause problems depending on the corpus. N-to-m mapping objects, such as paraphrases, non-literal translations, and multiword expressions, may act both as noise and as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.
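
To make the two quantities in the abstract concrete, the following is a minimal sketch of how p(e|f) can be estimated and then turned into a 0/1 correspondence g(e, f). It uses IBM Model 1 trained with EM, a standard baseline aligner chosen here for illustration; the abstract does not say which aligner the paper uses, and this sketch does not implement the paper's noise handling or prior knowledge. The function names `ibm_model1` and `align`, and the three-sentence toy corpus, are assumptions made for this example.

```python
from collections import defaultdict


def ibm_model1(bitext, iterations=10):
    """Estimate t[(e, f)], an approximation of p(e|f), with EM (IBM Model 1).

    bitext is a list of (source_words, target_words) pairs.
    Illustrative sketch only: no NULL word, no smoothing, no noise handling.
    """
    target_vocab = {e for _, tgt in bitext for e in tgt}
    # Uniform initialisation over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (e, f) link
        total = defaultdict(float)  # expected count of each source word f
        # E-step: distribute each target word over the source words.
        for src, tgt in bitext:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise so that the values for each f sum to 1 over e.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def align(src, tgt, t):
    """Return the links with g(e, f) = 1: each e is linked to its most probable f."""
    return {(e, max(src, key=lambda f: t[(e, f)])) for e in tgt}


# Hypothetical toy corpus (source f = German, target e = English).
bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = ibm_model1(bitext)
links = align(["das", "haus"], ["the", "house"], t)
```

In this sketch every sentence pair contributes to the counts with equal weight, so a non-literal or n-to-m pair would distort the estimates in the same way as clean data would inform them. That is the sensitivity to noise that the abstract refers to.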