Lexical syntax for statistical machine translation

Hassan, Hany (2009) Lexical syntax for statistical machine translation. PhD thesis, Dublin City University.

Abstract

Statistical Machine Translation (SMT) is by far the dominant paradigm of Machine Translation. This can be justified for many reasons, such as accuracy, scalability, computational efficiency and fast adaptation to new languages and domains. However, current approaches to Phrase-based SMT lack the capability to produce more grammatical translations and to handle long-range reordering while maintaining the grammatical structure of the translation output. Recently, SMT researchers have started to focus on extending Phrase-based SMT systems with syntactic knowledge; however, previous techniques have limited capabilities because they introduce redundantly ambiguous syntactic structures and use decoders with limited language models, at a high computational cost. In this thesis, we extend Phrase-based SMT with lexical syntactic descriptions that localize global syntactic information on the word without introducing redundant syntactic ambiguity. We present a novel model of Phrase-based SMT which integrates linguistic lexical descriptions (supertags) into the target language model and the target side of the translation model. We conduct extensive experiments on two language pairs, Arabic–English and German–English, which show significant improvements over state-of-the-art Phrase-based SMT systems. Moreover, we introduce a novel Incremental Dependency-based Syntactic Language Model (IDLM) based on wide-coverage CCG incremental parsing, which we integrate into a direct translation SMT system. Our proposed approach is the first to integrate full dependency parsing in SMT systems at a very attractive computational cost, since it deploys the linear decoders widely used in Phrase-based SMT systems. The experimental results show a good improvement over a top-ranked state-of-the-art system.

Item Type: Thesis (PhD)
Date of Award: March 2009
Refereed: No
Supervisor(s): Way, Andy and Sima'an, Khalil
Subjects: Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Machine learning
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Science Foundation Ireland
ID Code: 2320
Deposited On: 02 Apr 2009 17:56 by Andrew Way. Last Modified 02 Apr 2009 17:56
