Optimal bilingual data for French-English PB-SMT

Ozdowska, Sylwia and Way, Andy (2009) Optimal bilingual data for French-English PB-SMT. In: EAMT 2009 - 13th Annual Conference of the European Association for Machine Translation, 13-15 May 2009, Barcelona, Spain.

Abstract

We investigate the impact of the original source language (SL) on French–English PB-SMT. We train four configurations of a state-of-the-art PB-SMT system based on French–English parallel corpora which differ in terms of the original SL, and conduct experiments in both translation directions. We see that data containing original French and English translated from French is optimal when building a system translating from French into English. Conversely, using data comprising exclusively French and English translated from several other languages is suboptimal regardless of the translation direction. Accordingly, the clamour for more data needs to be tempered somewhat; unless the quality of such data is controlled, more training data can cause translation performance to decrease drastically, by up to 38% relative BLEU in our experiments.
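
The "relative BLEU" figure quoted above is a change expressed as a percentage of the baseline score, not a difference in absolute BLEU points. A minimal sketch of that arithmetic follows; the function name and the two scores are hypothetical, chosen only to illustrate the calculation, and are not taken from the paper.

```python
def relative_bleu_change(baseline_bleu: float, new_bleu: float) -> float:
    """Return the change from baseline to new as a percentage of the baseline."""
    if baseline_bleu <= 0:
        raise ValueError("baseline BLEU must be positive")
    return 100.0 * (new_bleu - baseline_bleu) / baseline_bleu


# Hypothetical scores for illustration only (not results from the paper).
baseline = 30.0   # system trained on the smaller, controlled data set
degraded = 18.6   # system trained on more, but uncontrolled, data
print(f"{relative_bleu_change(baseline, degraded):.1f}% relative BLEU")
```

With these made-up numbers the absolute drop is 11.4 BLEU points, while the relative drop is about 38% of the baseline, which is why the two ways of reporting should not be confused.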

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: phrase-based statistical machine translation
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT)
Published in: Proceedings of the 13th Annual Conference of the EAMT. European Association for Machine Translation.
Publisher: European Association for Machine Translation
Official URL: http://www.talp.cat/eamt09/index.php/programme
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland, SFI 05/IN/1732
ID Code: 15157
Deposited On: 15 Feb 2010 11:22 by DORAS Administrator. Last Modified 27 Apr 2010 11:46
