Browse DORAS
Browse Theses
Search
Latest Additions
Creative Commons License
Except where otherwise noted, content on this site is licensed for use under a:

Combining multi-domain statistical machine translation models using automatic classifiers

Banerjee, Pratyush and Du, Jinhua and Li, Baoli and Kumar Naskar, Sudip and Way, Andy and van Genabith, Josef (2010) Combining multi-domain statistical machine translation models using automatic classifiers. In: AMTA 2010 - 9th Conference of the Association for Machine Translation in the Americas, 31 October - 4 November 2010, Denver, CO, USA.

Full text available as:

[img]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
317Kb

Abstract

This paper presents a set of experiments on Domain Adaptation of Statistical Machine Translation systems. The experiments focus on Chinese-English and two domain-specific corpora. The paper presents a novel approach for combining multiple domain-trained translation models to achieve improved translation quality for both domain-specific as well as combined sets of sentences. We train a statistical classifier to classify sentences according to the appropriate domain and utilize the corresponding domain-specific MT models to translate them. Experimental results show that the method achieves a statistically significant absolute improvement of 1.58 BLEU (2.86% relative improvement) score over a translation model trained on combined data, and considerable improvements over a model using multiple decoding paths of the Moses decoder, for the combined domain test set. Furthermore, even for domain-specific test sets, our approach works almost as well as dedicated domain-specific models and perfect classification.

Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL)
Publisher:Association for Machine Translation in the Americas
Official URL:http://amta2010.amtaweb.org/AMTA/html/toc.htm
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:Science Foundation Ireland
ID Code:15804
Deposited On:06 Dec 2010 14:04 by Shane Harper. Last Modified 25 May 2011 12:17

Download statistics

Archive Staff Only: edit this record