Haque, Rejwanul ORCID: 0000-0003-1680-0099, Kumar Naskar, Sudip, Way, Andy ORCID: 0000-0001-5736-5930, Costa-Jussá, Marta and Banchs, Rafael E. (2010) Sentence similarity-based source context modelling in PBSMT. In: the International Conference on Asian Language Processing 2010, 28-30 Dec. 2010, Harbin, China.
Abstract
Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source language context into the PBSMT model to improve target phrase selection. Among the various types of lexical and syntactic features, lexical syntactic descriptions in the form of supertags that preserve long-range word-to-word dependencies in a sentence have proven to be effective. These rich contextual features are able to disambiguate a source phrase, on the basis of the local syntactic behaviour of that phrase. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide additional important information to enhance this phrase-sense disambiguation. In this work, we explore various sentence similarity features by measuring similarity between a source sentence to be translated with the source-side of the bilingual training sentences and integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task by applying sentence-similarity features both individually, and collaboratively with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% BLEU score when adding a sentence-similarity feature together with a supertag-based feature.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | sentence similarity; source context information; statistical machine translation |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Institutes and Centres > Centre for Next Generation Localisation (CNGL) DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Published in: | Asian Language Processing (IALP), 2010 International Conference on. . IEEE. |
Publisher: | IEEE |
Official URL: | http://dx.doi.org/10.1109/IALP.2010.45 |
Copyright Information: | © 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License |
ID Code: | 16158 |
Deposited On: | 20 Jun 2011 13:21 by Shane Harper . Last Modified 18 Nov 2020 16:46 |
Documents
Full text available as:
Preview |
PDF
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
158kB |
Metrics
Altmetric Badge
Dimensions Badge
Downloads
Downloads
Downloads per month over past year
Archive Staff Only: edit this record