Capturing translational divergences with a statistical tree-to-tree aligner
Hearne, Mary, Tinsley, John, Zhechev, Ventsislav and Way, Andy (ORCID: 0000-0001-5736-5930)
(2007)
Capturing translational divergences with a statistical tree-to-tree aligner.
In: TMI-07 - Proceedings of The 11th Conference on Theoretical and Methodological Issues in Machine Translation, 7-9 September 2007, Skövde, Sweden.
Parallel treebanks, which comprise paired source-target parse trees aligned at sub-sentential level, could be useful
for many applications, particularly data-driven machine translation. In this paper, we focus on how translational
divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We
observe that while the algorithm performs well at the phrase level, performance on lexical-level alignments
is compromised by an inappropriate bias towards coverage rather than precision. Tree-alignment's need for high precision
rather than broad coverage when expressing translational divergences stands in
direct opposition to the situation for SMT word-alignment models. We suggest that this has implications not only
for tree-alignment itself but also for the broader area of induction of syntax-aware models for SMT.