A discriminative latent variable-based "DE" classifier for Chinese–English SMT

Du, Jinhua and Way, Andy (2010) A discriminative latent variable-based "DE" classifier for Chinese–English SMT. In: COLING 2010 - 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing, China.

Abstract

Syntactic reordering on the source side is an effective way of handling word order differences. The 的 (DE) construction is a flexible and ubiquitous syntactic structure in Chinese and a major source of translation errors. In this paper, we propose a new classifier model, the discriminative latent variable model (DPLVM), to classify the DE construction more accurately and hence improve translation quality. We also propose a new feature which can automatically learn the reordering rules to a certain extent. The experimental results show that the MT systems using the data reordered by our proposed model outperform the baseline systems by 6.42% and 3.08% relative in terms of BLEU score on PB-SMT and hierarchical phrase-based MT respectively. In addition, we analyse the impact of DE annotation on word alignment and on the SMT phrase table.

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010). Association for Computational Linguistics.
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology/C/C10/C10-1033.pdf
Copyright Information: © 2010 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland
ID Code: 15798
Deposited On: 10 Nov 2010 14:33 by Shane Harper. Last Modified 10 Nov 2010 14:33
