TransBooster:black box optimisation of machine translation systems
Mellebeek, Bart (2007) TransBooster:black box optimisation of machine translation systems. PhD thesis, Dublin City University.
Full text available as:
Machine Translation (MT) systems tend to underperform when faced with long, linguistically complex sentences. Rule-based systems often trade a broad but shallow linguistic coverage for a deep, fine-grained analysis since hand-crafting rules based on detailed linguistic analyses is time-consuming, error-prone and expensive. Most datadriven systems lack the necessary syntactic knowledge to effectively deal with non-local grammatical phenomena.
Therefore, both rule-based and data-driven MT systems are better at handling short, simple sentences than linguistically complex ones.
This thesis proposes a new and modular approach to help MT systems improve then output quality by reducing the number of complexities in the input. Instead of trying to reinvent the wheel by proposing yet another approach to MT, we build on the strengths of existing MT paradigms while trying to remedy their shortcomings as much as possible. We do this by developing TransBooster, a wrapper technology that reduces the complexity of the MT input by a recursive decomposition algorithm which produces simple input chunks that are spoon-fed to a baseline MT system TransBooster is not an MT system itself: it does not perform automatic translation, but operates on top of an existing MT system, gulding it through the input and trying to help the baseline system to improve the quality of its own translations through automatic complexity reduction.
In this dissertation, we outline the motivation behind TransBooster, explain its development in depth and investigate its impact on the three most important paradigms in the field Rule-based, Example-based and Statistical MT. In addition, we use the Trans-Booster architecture as a promising alternative to current Multi-Engine MT techniques. We evaluate TransBooster on the language pair Engl~sh-+Spanish with a combination of automatic and manual evaluation metrics, prov~ding a rigorous analysis of the potential and shortcomings of our approach.
Archive Staff Only: edit this record