TransBooster:black box optimisation of machine translation systems

Mellebeek, Bart (2007) TransBooster:black box optimisation of machine translation systems. PhD thesis, Dublin City University.

Abstract
Metadata
Downloads
Documents

[+][-]

Abstract

Machine Translation (MT) systems tend to underperform when faced with long, linguistically complex sentences. Rule-based systems often trade a broad but shallow linguistic coverage for a deep, fine-grained analysis since hand-crafting rules based on detailed linguistic analyses is time-consuming, error-prone and expensive. Most datadriven systems lack the necessary syntactic knowledge to effectively deal with non-local grammatical phenomena. Therefore, both rule-based and data-driven MT systems are better at handling short, simple sentences than linguistically complex ones. This thesis proposes a new and modular approach to help MT systems improve then output quality by reducing the number of complexities in the input. Instead of trying to reinvent the wheel by proposing yet another approach to MT, we build on the strengths of existing MT paradigms while trying to remedy their shortcomings as much as possible. We do this by developing TransBooster, a wrapper technology that reduces the complexity of the MT input by a recursive decomposition algorithm which produces simple input chunks that are spoon-fed to a baseline MT system TransBooster is not an MT system itself: it does not perform automatic translation, but operates on top of an existing MT system, gulding it through the input and trying to help the baseline system to improve the quality of its own translations through automatic complexity reduction. In this dissertation, we outline the motivation behind TransBooster, explain its development in depth and investigate its impact on the three most important paradigms in the field Rule-based, Example-based and Statistical MT. In addition, we use the Trans-Booster architecture as a promising alternative to current Multi-Engine MT techniques. We evaluate TransBooster on the language pair Engl~sh-+Spanish with a combination of automatic and manual evaluation metrics, prov~ding a rigorous analysis of the potential and shortcomings of our approach.

Metadata

Item Type:	Thesis (PhD)
Date of Award:	2007
Refereed:	No
Supervisor(s):	Way, Andy and van Genabith, Josef
Uncontrolled Keywords:	machine translation; tule based mt; statistical mt; example based mt
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License
ID Code:	16939
Deposited On:	04 May 2012 09:40 by Fran Callaghan . Last Modified 19 Jul 2018 14:55

Documents

Full text available as:

Preview

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
17MB

Downloads

Downloads per month over past year

Archive Staff Only: edit this record