He, Yifan (2011) The integration of machine translation and translation memory. PhD thesis, Dublin City University.
Abstract
We design and evaluate several models for integrating Machine Translation (MT) output into a Translation Memory (TM) environment to facilitate the adoption of MT technology in the localization industry.
We begin with integration at the segment level via translation recommendation and translation reranking. Given an input to be translated, our translation recommendation model compares the output from the MT and the TM systems, and presents the better one to the post-editor. Our translation reranking model combines k-best lists from both systems, and generates a new list ordered by estimated post-editing effort. We perform both automatic and human evaluation on these models. When measured against the consensus of human judgement, the recommendation model obtains 0.91 precision at 0.93 recall, and the reranking model obtains 0.86 precision at 0.59 recall. The high precision of these models indicates that they can be integrated into TM environments without the risk of degrading the quality of the post-editing candidate, and can thereby preserve TM assets and the established cost estimation methods associated with TMs.
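To make the precision and recall figures above concrete, the following is a minimal Python sketch of how a segment-level recommender could be scored against human preferences. It is not the thesis implementation: the `Segment` fields, the `recommend_mt` rule, and the `margin` parameter are hypothetical names introduced for illustration, and the effort estimates are assumed to come from some external predictor.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Segment:
    # Hypothetical fields: estimated post-editing effort (lower is better)
    # for the MT output and for the TM fuzzy match of the same input.
    est_effort_mt: float
    est_effort_tm: float
    # Hypothetical gold label: human judges prefer the MT output.
    human_prefers_mt: bool


def recommend_mt(seg: Segment, margin: float = 0.0) -> bool:
    """Recommend MT only if it is estimated to need less post-editing.

    A positive margin (an assumption of this sketch) makes the rule more
    conservative, which tends to trade recall for precision.
    """
    return seg.est_effort_mt + margin < seg.est_effort_tm


def precision_recall(segments: Iterable[Segment],
                     margin: float = 0.0) -> Tuple[float, float]:
    """Score MT recommendations against the human preference labels."""
    tp = fp = fn = 0
    for seg in segments:
        predicted = recommend_mt(seg, margin)
        if predicted and seg.human_prefers_mt:
            tp += 1          # MT recommended, humans agree
        elif predicted:
            fp += 1          # MT recommended, humans prefer the TM match
        elif seg.human_prefers_mt:
            fn += 1          # TM kept, but humans prefer MT
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    toy = [
        Segment(0.2, 0.5, True),
        Segment(0.4, 0.3, True),
        Segment(0.1, 0.6, False),
        Segment(0.7, 0.2, False),
    ]
    print(precision_recall(toy))
```

In this framing, high precision means that a segment is rarely switched from the TM match to MT when human judges would have preferred the TM match, which is the property the abstract relies on when it argues that TM assets are preserved.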
We then explore methods for a deeper integration of translation memory and machine translation at the sub-segment level. We predict whether phrase pairs derived from fuzzy matches can be used to constrain the translation of an input segment. Using a series of novel linguistically motivated features, our constraints lead both to more consistent translation output and to improved translation quality, reflected by a 1.2 improvement in BLEU score and a 0.72 reduction in TER score, both statistically significant (p < 0.01).
In sum, we present our work in three aspects: 1) translation recommendation and translation reranking models that can access high-quality MT outputs in the TM environment, 2) a sub-segment translation memory and machine translation integration model that improves both translation consistency and translation quality, and 3) a human evaluation pipeline to validate the effectiveness of our models with human judgements.
Metadata
| Item Type: | Thesis (PhD) |
|---|---|
| Date of Award: | November 2011 |
| Refereed: | No |
| Supervisor(s): | Way, Andy and van Genabith, Josef |
| Uncontrolled Keywords: | Translation Memory |
| Subjects: | Computer Science > Computational linguistics; Computer Science > Machine translating |
| DCU Faculties and Centres: | Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
| Funders: | Science Foundation Ireland |
| ID Code: | 16426 |
| Deposited On: | 02 Dec 2011 11:34 by Andrew Way. Last Modified 19 Jul 2018 14:53 |