Banerjee, Pratyush (2013) Domain adaptation for statistical machine translation of corporate and user-generated content. PhD thesis, Dublin City University.
Abstract
The growing popularity of Statistical Machine Translation (SMT) techniques in recent years has led to the development of multiple domain-specific resources and adaptation scenarios. In this thesis we address two important and industrially relevant adaptation scenarios, each suited to different kinds of content.
Initially focussing on professionally edited 'enterprise-quality' corporate content, we address the specific scenario of translating data drawn from a mixture of different domains, where domain-specific data is available for each domain. We utilise an automatic classifier to combine multiple domain-specific models and empirically show that such a configuration results in better translation quality compared to both traditional and state-of-the-art techniques for handling mixed-domain translation.
In the second phase of our research we shift our focus to the translation of possibly 'noisy' user-generated content in web forums created around the products and services of a multinational company. Using professionally edited translation memory (TM) data for training, we use different normalisation and data selection techniques to adapt SMT models to noisy forum content. In this scenario, we also study the effect of mixture adaptation using a combination of in-domain and out-of-domain data at different component levels of an SMT system. Finally, we focus on the task of selecting optimal supplementary training data from out-of-domain corpora, using a novel incremental model merging mechanism to adapt TM-based models and improve forum-content translation quality.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | March 2013 |
Refereed: | No |
Supervisor(s): | Way, Andy, van Genabith, Josef and Roturier, Johann |
Uncontrolled Keywords: | Statistical Machine Translation; SMT |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Machine learning |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
ID Code: | 17722 |
Deposited On: | 03 Apr 2013 12:44 by Jennifer Foster. Last Modified: 03 Apr 2013 12:44 |
Documents
Full text available as:
PDF (2MB)