Lexical syntax for statistical machine translation

Hassan, Hany

Hassan, Hany (2009) Lexical syntax for statistical machine translation. PhD thesis, Dublin City University.

Abstract

Statistical Machine Translation (SMT) is by far the dominant paradigm of Machine Translation, for reasons that include accuracy, scalability, computational efficiency and fast adaptation to new languages and domains. However, current Phrase-based SMT approaches lack the ability to produce more grammatical translations and to handle long-range reordering while maintaining the grammatical structure of the translation output. Recently, SMT researchers have started to focus on extending Phrase-based SMT systems with syntactic knowledge; however, the previous techniques are limited because they introduce redundantly ambiguous syntactic structures and rely on decoders with limited language models and a high computational cost. In this thesis, we extend Phrase-based SMT with lexical syntactic descriptions that localize global syntactic information on the word without introducing redundant syntactic ambiguity. We present a novel model of Phrase-based SMT which integrates linguistic lexical descriptions (supertags) into the target language model and the target side of the translation model. We conduct extensive experiments on two language pairs, Arabic–English and German–English, which show significant improvements over state-of-the-art Phrase-based SMT systems. Moreover, we introduce a novel Incremental Dependency-based Syntactic Language Model (IDLM) based on wide-coverage CCG incremental parsing, which we integrate into a direct translation SMT system. Our proposed approach is the first to integrate full dependency parsing in SMT systems at a very attractive computational cost, since it deploys the linear decoders widely used in Phrase-based SMT systems. The experimental results show a good improvement over a top-ranked state-of-the-art system.

Metadata

Item Type:	Thesis (PhD)
Date of Award:	March 2009
Refereed:	No
Supervisor(s):	Way, Andy and Sima'an, Khalil
Subjects:	Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Machine learning
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders:	Science Foundation Ireland
ID Code:	2320
Deposited On:	02 Apr 2009 16:56 by Andrew Way. Last Modified: 19 Jul 2018 14:43

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
1MB

DORAS | DCU Research Repository
