Enriching phrase tables for statistical machine translation
using mixed embeddings

Passban, Peyman; Liu, Qun; Way, Andy

Passban, Peyman, Liu, Qun ORCID: 0000-0002-7000-1792 and Way, Andy ORCID: 0000-0001-5736-5930 (2016) Enriching phrase tables for statistical machine translation using mixed embeddings. In: COLING, the 26th International Conference on Computational Linguistics, 13-16 Dec 2016, Osaka, Japan. ISBN 978-4-87974-702-0

Abstract

The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match for each source phrase is selected from among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. By default, PBSMT engines provide four probability scores in the phrase table, which are considered the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline.
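To make the idea of enriching a phrase table concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Moses-style phrase-table line (`source ||| target ||| scores ||| alignment ||| counts`), which the abstract does not specify, and it substitutes a plain cosine similarity between precomputed phrase vectors for the CNN-based relatedness score described in the paper. The names `cosine`, `enrich_line`, `src_vecs`, and `tgt_vecs` are hypothetical and introduced only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def enrich_line(line, src_vecs, tgt_vecs, floor=1e-4):
    """Append one extra similarity score to a Moses-style phrase-table line.

    Assumptions (not taken from the paper):
      - fields are separated by '|||' and the third field holds the scores;
      - src_vecs / tgt_vecs map each phrase string to an embedding vector;
      - cosine similarity stands in for the CNN-generated score.
    """
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    src, tgt = fields[0], fields[1]
    scores = fields[2].split()

    sim = cosine(src_vecs[src], tgt_vecs[tgt])
    # Map [-1, 1] to [0, 1] and keep the value strictly positive,
    # since phrase-table scores are typically used in log space.
    scaled = max((sim + 1.0) / 2.0, floor)

    scores.append(f"{scaled:.4f}")
    fields[2] = " ".join(scores)
    return " ||| ".join(fields)


if __name__ == "__main__":
    line = "das haus ||| the house ||| 0.5 0.4 0.6 0.3 ||| 0-0 1-1 ||| 10 12 8"
    src_vecs = {"das haus": [1.0, 0.0]}
    tgt_vecs = {"the house": [1.0, 0.0]}
    print(enrich_line(line, src_vecs, tgt_vecs))
```

In such a setup the decoder would also need to be told that each entry now carries five scores instead of four, and the weight of the new feature would be tuned together with the existing ones.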

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Proceedings of COLING, the 26th International Conference on Computational Linguistics: Technical Papers. Coling 2016 conference committee. ISBN 978-4-87974-702-0
Publisher:	Coling 2016 conference committee
Official URL:	https://aclweb.org/anthology/C16-1243
Copyright Information:	© 2016 The Authors
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland at ADAPT: Centre for Digital Content Platform Research (Grant 13/RC/2106).
ID Code:	23230
Deposited On:	02 May 2019 11:35 by INVALID USER. Last Modified 02 May 2019 11:35

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
480kB


DORAS | DCU Research Repository
