We design and evaluate several models for integrating Machine Translation (MT) output into a Translation Memory (TM) environment to facilitate the adoption of MT technology in the localization industry.
We begin with integration at the segment level, via translation recommendation and translation reranking. Given an input segment, our translation recommendation model compares the outputs of the MT and TM systems and presents the better one to the post-editor. Our translation reranking model combines the k-best lists from both systems and produces a new list ranked by estimated post-editing effort. We evaluate both models automatically and with human judges. Measured against the consensus of human judgement, the recommendation model obtains 0.91 precision at 0.93 recall, and the reranking model obtains 0.86 precision at 0.59 recall. This high precision indicates that both models can be integrated into TM environments with low risk of degrading the quality of the candidate presented for post-editing, and can thereby preserve TM assets and the established cost estimation methods associated with TMs.
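The abstract does not say how post-editing effort is estimated or how the precision and recall figures are computed. The following is a minimal sketch, assuming each candidate already carries a hypothetical `est_effort` score (for example a predicted TER, lower is better) produced by some estimator not shown here. It illustrates the two segment-level ideas: recommendation picks whichever top candidate is estimated to need less editing, and reranking merges both k-best lists and sorts them by that estimate. The tie-breaking rule (ties keep the TM match) and the choice of "MT is better" as the positive class for precision/recall are assumptions, not the authors' stated definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    text: str
    system: str        # "MT" or "TM"
    est_effort: float  # hypothetical estimated post-editing effort (e.g. predicted TER); lower is better


def recommend(mt_best: Candidate, tm_best: Candidate) -> Candidate:
    """Present the MT output only if it is estimated to need strictly less post-editing
    than the TM match; on ties, keep the TM match (an assumed design choice)."""
    return mt_best if mt_best.est_effort < tm_best.est_effort else tm_best


def rerank(mt_kbest: List[Candidate], tm_kbest: List[Candidate]) -> List[Candidate]:
    """Merge both k-best lists into one list ordered by estimated effort (ascending)."""
    return sorted(mt_kbest + tm_kbest, key=lambda c: c.est_effort)


def precision_recall(predicted: List[bool], gold: List[bool]) -> Tuple[float, float]:
    """Precision/recall with 'MT is better' as the positive class (assumed definition).
    predicted[i]: the model recommended MT for segment i.
    gold[i]: human judges preferred the MT output for segment i."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative effort scores only; they are not data from this work.
    mt = [Candidate("Das Haus ist klein.", "MT", 0.20), Candidate("Das Haus ist gering.", "MT", 0.35)]
    tm = [Candidate("Das Gebäude ist klein.", "TM", 0.30)]
    print(recommend(mt[0], tm[0]).system)      # MT
    print([c.system for c in rerank(mt, tm)])  # ['MT', 'TM', 'MT']
```

Favouring the TM match on ties mirrors the abstract's emphasis on precision: MT output is shown only when it is estimated to be better, so existing TM workflows are disturbed as little as possible.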
We then explore a deeper integration of translation memory and machine translation at the sub-segment level. We predict whether phrase pairs derived from fuzzy matches can be used to constrain the translation of an input segment. Using a series of novel linguistically motivated features, our constraints lead both to more consistent translation output and to improved translation quality, reflected in a 1.2 improvement in BLEU and a 0.72 reduction in TER, both statistically significant (p < 0.01).
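The abstract does not specify the classifier, its features, or how the selected constraints reach the decoder. A minimal sketch, assuming each phrase pair from a fuzzy match has already been given a hypothetical classifier `score`, keeps the pairs scoring at least 0.5 (an assumed threshold), resolves overlaps greedily by highest score (a swapped-in heuristic), and writes the survivors as XML markup in the style accepted by the Moses decoder's `-xml-input` option, where the `translation` attribute supplies the target phrase. The tag name `np` and all names below are illustrative.

```python
from dataclasses import dataclass
from typing import List
from xml.sax.saxutils import escape


@dataclass
class PhrasePair:
    start: int     # first source token index (inclusive); spans assumed non-empty
    end: int       # last source token index (exclusive)
    target: str    # target phrase taken from the fuzzy match
    score: float   # hypothetical classifier probability that this constraint helps


def select_constraints(pairs: List[PhrasePair], threshold: float = 0.5) -> List[PhrasePair]:
    """Greedily keep the highest-scoring pairs above the threshold whose
    source spans do not overlap an already chosen span."""
    chosen: List[PhrasePair] = []
    covered = set()
    for p in sorted(pairs, key=lambda p: p.score, reverse=True):
        span = set(range(p.start, p.end))
        if p.score >= threshold and not (span & covered):
            chosen.append(p)
            covered |= span
    return sorted(chosen, key=lambda p: p.start)


def to_xml_markup(tokens: List[str], constraints: List[PhrasePair]) -> str:
    """Wrap each constrained source span in a tag carrying its forced translation.
    Expects non-overlapping constraints sorted by start position."""
    out: List[str] = []
    i = 0
    for c in constraints:
        out.extend(escape(t) for t in tokens[i:c.start])
        attr = escape(c.target, {'"': "&quot;"})  # make the target safe inside the attribute
        src = " ".join(escape(t) for t in tokens[c.start:c.end])
        out.append(f'<np translation="{attr}">{src}</np>')
        i = c.end
    out.extend(escape(t) for t in tokens[i:])
    return " ".join(out)


if __name__ == "__main__":
    tokens = "click the Save button to continue".split()
    pairs = [
        PhrasePair(1, 4, "die Schaltfläche Speichern", 0.9),
        PhrasePair(2, 3, "Speichern", 0.8),      # overlaps a higher-scoring pair: dropped
        PhrasePair(4, 6, "um fortzufahren", 0.3),  # below threshold: dropped
    ]
    print(to_xml_markup(tokens, select_constraints(pairs)))
    # click <np translation="die Schaltfläche Speichern">the Save button</np> to continue
```

Whether a constraint is enforced exclusively or only offered alongside the decoder's own phrase table is a decoder setting (Moses distinguishes, for example, `exclusive` and `inclusive` modes); the sketch leaves that choice open.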
In sum, our work makes three contributions: 1) translation recommendation and translation reranking models that bring high-quality MT output into the TM environment, 2) a sub-segment translation memory and machine translation integration model that improves both translation consistency and translation quality, and 3) a human evaluation pipeline that validates the effectiveness of our models against human judgements.