Enriching phrase tables for statistical machine translation
using mixed embeddings
Passban, Peyman, Liu, Qun (ORCID: 0000-0002-7000-1792) and Way, Andy (ORCID: 0000-0001-5736-5930)
(2016)
Enriching phrase tables for statistical machine translation
using mixed embeddings.
In: COLING, the 26th International Conference on Computational Linguistics, 13-16 Dec 2016, Osaka, Japan.
ISBN 978-4-87974-702-0
The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match of each source phrase is selected among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. PBSMT engines by default provide four probability scores in phrase tables which are considered as the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline.
Proceedings of COLING, the 26th International Conference on Computational Linguistics: Technical Papers.
Coling 2016 conference committee. ISBN 978-4-87974-702-0