DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Supertagged phrase-based statistical machine translation

Hassan, Hany, Sima'an, Khalil and Way, Andy (ORCID: 0000-0001-5736-5930) (2007) Supertagged phrase-based statistical machine translation. In: ACL 2007 - 45th Annual Meeting of the Association for Computational Linguistics, 25-27 June 2007, Prague, Czech Republic.

Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic structure caused system performance to deteriorate. In this work we show that incorporating lexical syntactic descriptions in the form of supertags can yield significantly better PBSMT systems. We describe a novel PBSMT model that integrates supertags into the target language model and the target side of the translation model. Two kinds of supertags are employed: those from Lexicalized Tree-Adjoining Grammar and Combinatory Categorial Grammar. Despite the differences between these two approaches, the supertaggers give similar improvements. In addition to supertagging, we also explore the utility of a surface global grammaticality measure based on combinatory operators. We perform various experiments on the Arabic-to-English NIST 2005 test set addressing issues such as sparseness, scalability and the utility of system subcomponents. Our best result (0.4688 BLEU) improves by 6.1% relative to a state-of-the-art PBSMT model, which compares very favourably with the leading systems on the NIST 2005 task.
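The abstract states the gain as a relative improvement, which is easy to misread as an absolute BLEU difference. The short sketch below back-calculates the baseline score implied by the two figures quoted above (0.4688 BLEU and 6.1% relative). The function name and the derived values are illustrative assumptions, not numbers taken from the paper, and because 6.1% is itself a rounded figure the result is only approximate.

```python
def implied_baseline(best_score: float, relative_gain: float) -> float:
    """Return the baseline score b such that best_score = b * (1 + relative_gain).

    Hypothetical helper for illustration only; it is not part of the paper.
    """
    return best_score / (1.0 + relative_gain)


# Figures quoted in the abstract.
best_bleu = 0.4688      # best reported BLEU score
relative_gain = 0.061   # 6.1% relative improvement (a rounded figure)

# Derived values: approximations implied by the abstract, not reported results.
baseline = implied_baseline(best_bleu, relative_gain)
absolute_gain = best_bleu - baseline

print(f"implied baseline BLEU: {baseline:.4f}")
print(f"implied absolute gain: {absolute_gain:.4f}")
```

Under these assumptions, a 6.1% relative improvement corresponds to an absolute gain of under three BLEU points on the 0 to 1 scale used in the abstract, since the relative figure is measured against the baseline score rather than against the full scale.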
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Uncontrolled Keywords: phrase-based statistical machine translation
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology/P/P07/
Copyright Information: © 2007 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland, SFI 05/IN/1732, Netherlands Organization for Scientific Research
ID Code: 15218
Deposited On: 18 Feb 2010 11:01 by DORAS Administrator. Last Modified 16 Nov 2018 09:52

Full text available as: HassanEtAl_acl_07.pdf (PDF)

