DORAS | DCU Research Repository

Experiments on domain adaptation for English-Hindi SMT

Haque, Rejwanul (ORCID: 0000-0003-1680-0099), Naskar, Sudip Kumar, van Genabith, Josef (ORCID: 0000-0003-1322-7944) and Way, Andy (ORCID: 0000-0001-5736-5930) (2009) Experiments on domain adaptation for English-Hindi SMT. In: PACLIC 23 - the 23rd Pacific Asia Conference on Language, Information and Computation, 3-5 December 2009, Hong Kong.

Abstract
Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target-language text. If a significant amount of out-of-domain data is added to the training data, translation quality can drop. On the other hand, training an SMT system on only a small amount of in-domain data leads to narrow lexical coverage, which again results in low translation quality. In this paper, (i) we explore domain-adaptation techniques to combine large out-of-domain training data with small-scale in-domain training data for English-Hindi statistical machine translation, and (ii) we cluster large out-of-domain training data to extract sentences similar to in-domain sentences and apply adaptation techniques to combine the clustered sub-corpora with in-domain training data in a unified framework, achieving an absolute improvement of 0.44 BLEU points, corresponding to a 4.03% relative improvement, over the baseline.
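The absolute and relative figures in the abstract are linked by simple arithmetic: a relative gain is the absolute gain divided by the baseline score. The sketch below is not taken from the paper; the helper names `implied_baseline` and `relative_gain_pct` are hypothetical, and the baseline it derives is only an approximation, since both reported figures are rounded.

```python
def implied_baseline(absolute_gain: float, relative_pct: float) -> float:
    """Baseline score implied by an absolute gain and a relative gain (in percent)."""
    return absolute_gain / (relative_pct / 100.0)


def relative_gain_pct(baseline: float, absolute_gain: float) -> float:
    """Relative gain in percent for a given baseline score and absolute gain."""
    return 100.0 * absolute_gain / baseline


# Figures reported in the abstract (both rounded, so the result is approximate).
ABSOLUTE_GAIN = 0.44   # BLEU points
RELATIVE_PCT = 4.03    # percent

baseline = implied_baseline(ABSOLUTE_GAIN, RELATIVE_PCT)
adapted = baseline + ABSOLUTE_GAIN

print(f"implied baseline BLEU: about {baseline:.2f}")
print(f"implied adapted BLEU:  about {adapted:.2f}")
print(f"check, relative gain:  {relative_gain_pct(baseline, ABSOLUTE_GAIN):.2f}%")
```

Under these assumptions the baseline comes out at roughly 11 BLEU points, which shows why a gain of under half a point still amounts to a relative improvement of about 4%.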
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: statistical machine translation; domain adaptation
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL)
Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Official URL: http://paclic23.ctl.cityu.edu.hk/PACLIC23_index.ht...
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 15175
Deposited On: 15 Feb 2010 14:52 by DORAS Administrator. Last Modified: 21 Jan 2022 16:31
Documents

Full text available as:

HaqueEtAl_paclic_09a.pdf (PDF, 152kB)