Lexical syntax for statistical machine translation
Hassan, Hany (2009) Lexical syntax for statistical machine translation. PhD thesis, Dublin City University.
Statistical Machine Translation (SMT) is by far the dominant paradigm of Machine
Translation, for reasons that include accuracy, scalability, computational
efficiency and fast adaptation to new languages and domains. However, current
Phrase-based SMT approaches lack the capability of producing more grammatical
translations and handling long-range reordering while maintaining the grammatical structure
of the translation output. Recently, SMT researchers have started to focus on extending
Phrase-based SMT systems with syntactic knowledge; however, previous techniques
are limited because they introduce redundantly ambiguous syntactic structures and
use decoders with limited language models, at a high computational cost.
In this thesis, we extend Phrase-based SMT with lexical syntactic descriptions that
localize global syntactic information on the word without introducing redundant syntactic
ambiguity. We present a novel model of Phrase-based SMT which integrates linguistic
lexical descriptions —supertags— into the target language model and the target side of
the translation model. We conduct extensive experiments on two language pairs,
Arabic–English and German–English, which show significant improvements over
state-of-the-art Phrase-based SMT systems.
Moreover, we introduce a novel Incremental Dependency-based Syntactic Language
Model (IDLM) based on wide-coverage CCG incremental parsing which we integrate
into a direct translation SMT system. Our proposed approach is the first to integrate
full dependency parsing in SMT systems at an attractive computational cost, since it
deploys the linear decoders widely used in Phrase-based SMT systems. The experimental
results show a good improvement over a top-ranked state-of-the-art system.