
TransBooster: black box optimisation of machine translation systems

Mellebeek, Bart (2007) TransBooster: black box optimisation of machine translation systems. PhD thesis, Dublin City University.

Full text available as: PDF (16 MB)

Abstract

Machine Translation (MT) systems tend to underperform when faced with long, linguistically complex sentences. Rule-based systems often opt for broad but shallow linguistic coverage rather than deep, fine-grained analysis, since hand-crafting rules based on detailed linguistic analyses is time-consuming, error-prone and expensive. Most data-driven systems lack the syntactic knowledge needed to deal effectively with non-local grammatical phenomena. Therefore, both rule-based and data-driven MT systems are better at handling short, simple sentences than linguistically complex ones. This thesis proposes a new, modular approach that helps MT systems improve their output quality by reducing the complexity of their input. Instead of reinventing the wheel by proposing yet another approach to MT, we build on the strengths of existing MT paradigms while trying to remedy their shortcomings as much as possible. We do this by developing TransBooster, a wrapper technology that reduces the complexity of the MT input through a recursive decomposition algorithm, producing simple input chunks that are spoon-fed to a baseline MT system. TransBooster is not an MT system itself: it does not perform automatic translation, but operates on top of an existing MT system, guiding it through the input and helping the baseline system improve the quality of its own translations through automatic complexity reduction. In this dissertation, we outline the motivation behind TransBooster, explain its development in depth and investigate its impact on the three most important paradigms in the field: Rule-based, Example-based and Statistical MT. In addition, we use the TransBooster architecture as a promising alternative to current Multi-Engine MT techniques. We evaluate TransBooster on the language pair English→Spanish with a combination of automatic and manual evaluation metrics, providing a rigorous analysis of the potential and shortcomings of our approach.
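The wrapper idea described in the abstract (decompose complex input recursively, pass only simple chunks to an unmodified baseline system) can be illustrated with a deliberately naive sketch. This is not TransBooster's actual decomposition algorithm. The toy `Node` tree, the `boost` function, the word-count threshold `max_words` and the stand-in engine `fake_mt` are all hypothetical names chosen for illustration. The baseline system is treated purely as a black-box `translate(str) -> str` function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical toy constituent: a leaf holds words, an inner node holds children."""
    words: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def text(self) -> str:
        # Surface string of the constituent, read left to right.
        if self.children:
            return " ".join(child.text() for child in self.children)
        return " ".join(self.words)


def boost(node: Node, translate: Callable[[str], str], max_words: int = 5) -> str:
    """Wrap a black-box translate() call with naive recursive decomposition."""
    text = node.text()
    if not node.children or len(text.split()) <= max_words:
        # Short enough, or cannot be split further: hand it to the baseline system.
        return translate(text)
    # Too long: translate each child on its own and join the outputs in source order.
    # Assumption: no reordering or context is carried across chunks.
    return " ".join(boost(child, translate, max_words) for child in node.children)


# Usage with a stand-in "MT system" that upper-cases its input and logs each call.
seen: List[str] = []


def fake_mt(segment: str) -> str:
    seen.append(segment)
    return segment.upper()


sentence = Node(children=[
    Node(words=["the", "committee"]),
    Node(children=[
        Node(words=["approved"]),
        Node(words=["the", "long", "and", "detailed", "report"]),
    ]),
])

print(boost(sentence, fake_mt))  # THE COMMITTEE APPROVED THE LONG AND DETAILED REPORT
print(seen)  # ['the committee', 'approved', 'the long and detailed report']
```

The baseline engine is never modified. It only ever receives chunks of at most `max_words` words, which is the sense in which such a wrapper is "black box". The fixed threshold and source-order concatenation are simplifications. A real wrapper also has to decide how much context each chunk needs and how to recombine translations when the target word order differs. This toy sidesteps both questions entirely.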

Item Type:Thesis (PhD)
Date of Award:2007
Refereed:No
Supervisor(s):Way, Andy and van Genabith, Josef
Uncontrolled Keywords:machine translation; rule-based MT; statistical MT; example-based MT
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code:16939
Deposited On:04 May 2012 10:40 by Fran Callaghan. Last Modified 08 May 2012 14:49
