Comparing constituency and dependency representations for SMT phrase-extraction
Hearne, Mary, Ozdowska, Sylwia and Tinsley, John
(2008)
Comparing constituency and dependency representations for SMT phrase-extraction.
In: TALN 2008 - la 15ème Conférence Annuelle sur le Traitement Automatique des Langues Naturelles, 9-13 June 2008, Avignon, France.
We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, one dependency-based and one constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs from each. We then evaluate PB-SMT translation quality using automatic metrics. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, dependency annotation yields higher PB-SMT translation quality than constituency annotation.
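One simple way to combine string-based and syntax-based word- and phrase-pairs is to pool the pairs from both extractors before estimating translation probabilities by relative frequency, as standard PB-SMT phrase tables do. The Python sketch below assumes exactly that. The function name `combine_and_score` and the toy phrase pairs are hypothetical, and count pooling is an illustrative assumption, not necessarily the combination strategy used in this paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def combine_and_score(string_pairs: Iterable[PhrasePair],
                      syntax_pairs: Iterable[PhrasePair]) -> Dict[PhrasePair, float]:
    """Hypothetical combination: pool counts from both extractors, then
    estimate phi(t|s) = count(s, t) / sum over t' of count(s, t')."""
    pair_counts = Counter(string_pairs)   # counts from the string-based extractor
    pair_counts.update(syntax_pairs)      # add counts from the syntax-based extractor

    # Total count of each source phrase across all of its target phrases.
    source_totals: Counter = Counter()
    for (src, _tgt), count in pair_counts.items():
        source_totals[src] += count

    # Relative-frequency translation probability for every pooled pair.
    return {(src, tgt): count / source_totals[src]
            for (src, tgt), count in pair_counts.items()}


# Toy usage with made-up phrase pairs.
string_pairs = [("la maison", "the house"), ("la maison", "the house"), ("la maison", "house")]
syntax_pairs = [("la maison", "the house")]
table = combine_and_score(string_pairs, syntax_pairs)
print(table)  # {('la maison', 'the house'): 0.75, ('la maison', 'house'): 0.25}
```

Pooling counts lets a pair found by both extractors gain probability mass. A real system would also score the reverse direction and lexical weights, which this sketch omits.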