DORAS | DCU Research Repository

Enriching phrase tables for statistical machine translation using mixed embeddings

Passban, Peyman, Liu, Qun (ORCID: 0000-0002-7000-1792) and Way, Andy (ORCID: 0000-0001-5736-5930) (2016) Enriching phrase tables for statistical machine translation using mixed embeddings. In: COLING, the 26th International Conference on Computational Linguistics, 13-16 Dec 2016, Osaka, Japan. ISBN 978-4-87974-702-0

Abstract
The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match of each source phrase is selected among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. PBSMT engines by default provide four probability scores in phrase tables which are considered as the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline.
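The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the log-linear scoring, the toy phrase-table entries, the weights, and every function name are assumptions made for illustration, and a rescaled cosine similarity between made-up phrase vectors stands in for the CNN-based relatedness score proposed in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relatedness(u, v):
    """Stand-in semantic feature: cosine rescaled from [-1, 1] to [0, 1]."""
    return (1.0 + cosine(u, v)) / 2.0

def loglinear_score(scores, weights, floor=1e-9):
    """Weighted sum of log feature values; the floor avoids log(0)."""
    return sum(w * math.log(max(s, floor)) for s, w in zip(scores, weights))

def best_candidate(candidates, weights):
    """Pick the target phrase with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["scores"], weights))

def enrich(candidates, src_vec, tgt_vecs):
    """Return new entries with the stand-in relatedness score appended."""
    return [
        {"target": c["target"],
         "scores": c["scores"] + [relatedness(src_vec, tgt_vecs[c["target"]])]}
        for c in candidates
    ]

# Hypothetical entries for one source phrase: four default probability
# scores per target candidate (all values invented for this sketch).
candidates = [
    {"target": "candidate A", "scores": [0.5, 0.4, 0.5, 0.4]},
    {"target": "candidate B", "scores": [0.4, 0.4, 0.4, 0.4]},
]

# Hypothetical phrase vectors; B points the same way as the source phrase.
src_vec = [1.0, 0.0]
tgt_vecs = {"candidate A": [0.0, 1.0], "candidate B": [1.0, 0.0]}

baseline = best_candidate(candidates, [1.0] * 4)
enriched = best_candidate(enrich(candidates, src_vec, tgt_vecs), [1.0] * 5)
print(baseline["target"], "->", enriched["target"])
```

In this toy setting the extra feature changes which candidate ranks first, which is the kind of effect an enriched feature set is meant to have. In a real PBSMT system the feature weights would be tuned rather than fixed at 1.0.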
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Proceedings of COLING, the 26th International Conference on Computational Linguistics: Technical Papers. Coling 2016 conference committee. ISBN 978-4-87974-702-0
Publisher: Coling 2016 conference committee
Official URL: https://aclweb.org/anthology/C16-1243
Copyright Information: © 2016 The Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland at ADAPT: Centre for Digital Content Platform Research (Grant 13/RC/2106).
ID Code: 23230
Deposited On: 02 May 2019 11:35 by Thomas Murtagh. Last Modified 02 May 2019 11:35
Documents

Full text available as:

Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings.pdf (PDF, 480kB)