Mitigating the problems of SMT using EBMT

Dandapat, Sandipan

Dandapat, Sandipan (2012) Mitigating the problems of SMT using EBMT. PhD thesis, Dublin City University.


Abstract

Statistical Machine Translation (SMT) typically has difficulties with less-resourced languages, even with homogeneous data. In this thesis we address the application of Example-Based Machine Translation (EBMT) methods to overcome some of these difficulties. We adopt three alternative approaches to tackle these problems, focusing on two poorly-resourced translation tasks (English–Bangla and English–Turkish). First, we adopt a runtime approach to EBMT using proportional analogy. In addition to the translation task, we have tested the EBMT system using proportional analogy for named entity transliteration. Second, we use a compiled approach to EBMT. Finally, we present a novel way of integrating Translation Memory (TM) into an EBMT system. We discuss the development of these three different EBMT systems and the experiments we have performed. In addition, we present an approach to augment the output quality by strategically combining EBMT systems and SMT systems. The hybrid system shows significant improvement for different language pairs. Runtime EBMT systems in general have significant time-complexity issues, especially for a large example-base. We explore two methods to address this issue in our system by making the system scalable at runtime for a large example-base (English–French). First, we use a heuristic-based approach. Second, we use an IR-based indexing technique to speed up the time-consuming matching procedure of the EBMT system. The index-based matching procedure substantially improves runtime speed without affecting translation quality.

Metadata

Item Type:	Thesis (PhD)
Date of Award:	November 2012
Refereed:	No
Supervisor(s):	Way, Andy and Morrissey, Sara
Uncontrolled Keywords:	Statistical Machine Translation; SMT; Example-Based Machine Translation; EBMT
Subjects:	Computer Science > Machine translating; Computer Science > Computational linguistics; Computer Science > Machine learning
DCU Faculties and Centres:	Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders:	SFI
ID Code:	17190
Deposited On:	15 Nov 2012 11:13 by Andrew Way. Last Modified 19 Jul 2018 14:56

Documents

Full text available as:

SandipanPhDFinalThesis.pdf (PDF, 1MB)


DORAS | DCU Research Repository
