Robust sub-sentential alignment of phrase-structure trees
Groves, Declan, Hearne, Mary and Way, Andy (ORCID: 0000-0001-5736-5930)
(2004)
Robust sub-sentential alignment of phrase-structure trees.
In: COLING 2004 - 20th International Conference on Computational Linguistics, 23-27 August 2004, Geneva, Switzerland.
Data-Oriented Translation (DOT), based on Data-Oriented Parsing (DOP), is a language-independent MT engine which exploits parsed, aligned bitexts to produce very high quality translations. However, data acquisition constitutes a serious bottleneck, as DOT requires parsed sentences aligned at both sentential and sub-structural levels. Manual sub-structural alignment is time-consuming and error-prone, and requires considerable knowledge of both source and target languages and of how they are related. Automating this process is essential to carry out the large-scale translation experiments necessary to assess the full potential of DOT. We present a novel algorithm which automatically induces sub-structural alignments between context-free phrase-structure trees in a fast and consistent fashion, requiring little or no knowledge of the language pair. We present results from a number of experiments which indicate that our method provides a serious alternative to manual alignment.