
DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU


Domain adaptation for machine translation with instance selection

Bicici, Ergun (2015) Domain adaptation for machine translation with instance selection. The Prague Bulletin of Mathematical Linguistics, 103 (1). pp. 5-20. ISSN 1804-0462

Abstract
Domain adaptation for machine translation (MT) can be achieved by selecting, from a larger set of instances, the training instances that are close to the test set. We consider 7 different domain adaptation strategies and answer 7 research questions, which together give us a recipe for domain adaptation in MT. We perform English-to-German statistical MT (SMT) experiments in a setting where test and training sentences can come from different corpora, and one of our goals is to learn the parameters of the sampling process. Domain adaptation with training instance selection can obtain a 22% increase in target 2-gram recall and can gain up to 3.55 BLEU points compared with random selection. Domain adaptation with the feature decay algorithm (FDA) not only achieves the highest target 2-gram recall and BLEU performance but also perfectly learns the test sample distribution parameter, with a correlation of 0.99. Moses SMT systems built with 10K FDA-selected training sentences obtain F1 results as good as those of baselines that use up to 2M sentences, and Moses SMT systems built with 50K FDA-selected training sentences obtain F1 results 1 point better than the baselines.
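The abstract rests on two ideas: greedily selecting training sentences that cover the test set, and measuring that coverage as 2-gram recall. The sketch below illustrates both under stated assumptions; it is not the paper's implementation. The function names (`fda_select`, `ngram_recall`), the feature set (test-set 1- and 2-grams), the initial feature value of 1.0, the decay factor of 0.5 and the division of each score by sentence length are illustrative choices, not settings taken from the paper. The sketch also works on a single side of the data, whereas the paper reports recall on the target side.

```python
def ngrams(tokens, max_n=2):
    """All 1..max_n-grams of a token list, as tuples (duplicates kept)."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def _score(sent, value, max_n):
    """Sum of current values of the distinct test n-grams in `sent`,
    divided by sentence length (an assumed normalisation)."""
    distinct = set(ngrams(sent, max_n))
    return sum(value.get(f, 0.0) for f in distinct) / max(len(sent), 1)


def fda_select(train, test, k, max_n=2, decay=0.5):
    """Hypothetical, simplified feature-decay selection.

    Every test n-gram starts with value 1.0. After a sentence is selected,
    each occurrence of a test n-gram in it multiplies that n-gram's value
    by `decay`, so later picks favour n-grams that are not yet covered.
    Returns the indices of the selected training sentences, in pick order.
    """
    value = {f: 1.0 for sent in test for f in ngrams(sent, max_n)}
    remaining = list(range(len(train)))
    selected = []
    for _ in range(min(k, len(train))):
        best = max(remaining, key=lambda i: _score(train[i], value, max_n))
        remaining.remove(best)
        selected.append(best)
        for f in ngrams(train[best], max_n):
            if f in value:
                value[f] *= decay  # decay once per occurrence in the pick
    return selected


def ngram_recall(test, chosen_sents, n=2):
    """Fraction of distinct test n-grams that occur in the chosen sentences."""
    def grams(sents):
        return {tuple(s[i:i + n]) for s in sents for i in range(len(s) - n + 1)}
    wanted = grams(test)
    return len(wanted & grams(chosen_sents)) / len(wanted) if wanted else 0.0


# Toy usage with invented sentences (not data from the paper).
train_set = [
    "the cat sat on the mat".split(),
    "stocks fell sharply today".split(),
    "the dog sat on the rug".split(),
    "a cat sat on a mat".split(),
]
test_set = ["the cat sat on the rug".split()]

picked = fda_select(train_set, test_set, k=2)
print(picked)  # [0, 2]
print(ngram_recall(test_set, [train_set[i] for i in picked]))  # 1.0
```

In this toy run the first pick covers most of the test n-grams, so their values decay. The second pick therefore prefers the sentence that adds the still-uncovered "rug" and "the rug" over a sentence that repeats "cat sat on", which is the behaviour the decay is meant to produce.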
Metadata
Item Type: Article (Published)
Refereed: Yes
Subjects: Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Information retrieval
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Publisher: De Gruyter
Official URL: http://dx.doi.org/10.1515/pralin-2015-0001
Copyright Information: © 2015 De Gruyter
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Funders: SFI, ADAPT CNGL Dublin City University
ID Code: 20648
Deposited On: 15 Jun 2015 10:10 by Mehmet Ergun Bicici. Last Modified: 25 Oct 2018 09:26
Documents

Full text available as: DAforMTwithInstanceSelection_p1.pdf (PDF, 32kB)
