Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

Integrating source-language context into log-linear models of statistical machine translation

Haque, Rejwanul orcid logoORCID: 0000-0003-1680-0099 (2011) Integrating source-language context into log-linear models of statistical machine translation. PhD thesis, Dublin City University.

Abstract
The translation features typically used in state-of-the-art statistical machine translation (SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear phrase-based SMT (PB-SMT) and hierarchical PB-SMT (HPB-SMT), and can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this thesis we present novel approaches to incorporate source-language contextual modelling into the state-of-the-art SMT models in order to enhance the quality of lexical selection. We investigate the effectiveness of use of a range of contextual features, including lexical features of neighbouring words, part-of-speech tags, supertags, sentence-similarity features, dependency information, and semantic roles. We explored a series of language pairs featuring typologically different languages, and examined the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, supertag features in English-to-Chinese translation, or combination of supertag and lexical features in English-to-Dutch subtitle translation. Furthermore, we investigate the applicability of our lexical contextual model in another closely related NLP problem, namely machine transliteration.
Metadata
Item Type:Thesis (PhD)
Date of Award:November 2011
Refereed:No
Supervisor(s):Way, Andy
Uncontrolled Keywords:source-language; context
Subjects:Computer Science > Machine translating
Computer Science > Computational linguistics
Computer Science > Machine learning
DCU Faculties and Centres:Research Institutes and Centres > Centre for Next Generation Localisation (CNGL)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License
Funders:Science Foundation Ireland
ID Code:16458
Deposited On:02 Dec 2011 11:32 by Andrew Way . Last Modified 13 Aug 2020 16:03
Documents

Full text available as:

[thumbnail of RejwanulPhdThesis.pdf]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
1MB
Downloads

Downloads

Downloads per month over past year

Archive Staff Only: edit this record