Pal, Santanu, Kumar Naskar, Sudip, Pecina, Pavel, Bandyopadhyay, Sivaji and Way, Andy ORCID: 0000-0001-5736-5930 (2010) Handling named entities and compound verbs in phrase-based statistical machine translation. In: MWE 2010 - Workshop on Multiword Expressions: from Theory to Applications, 28 August 2010, Beijing, China.
Abstract
Data preprocessing plays a crucial role in phrase-based statistical machine translation (PB-SMT). In this paper, we show how single-tokenization of two types of multi-word expressions (MWE), namely named entities (NE) and compound verbs, as well as their prior alignment can boost the performance of PB-SMT. Single-tokenization of compound verbs and named entities (NE) provides significant gains over the baseline PB-SMT system. Automatic alignment of NEs substantially improves the overall MT performance, and thereby the word alignment quality indirectly. For establishing NE alignments, we transliterate source NEs into the target language and then compare them with the target NEs. Target language NEs are first converted into a canonical form before the comparison takes place. Our best system achieves statistically significant improvements (4.59 BLEU points absolute, 52.5% relative improvement) on an English–Bangla translation task.
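The single-tokenization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the multi-word expressions have already been identified and are supplied as token tuples. The function name `single_tokenize`, the underscore joiner, and the example sentence are hypothetical choices for illustration and are not taken from the paper, whose actual preprocessing may differ.

```python
from typing import List, Sequence, Tuple


def single_tokenize(
    tokens: Sequence[str],
    mwes: Sequence[Tuple[str, ...]],
    joiner: str = "_",  # assumed separator; not specified in the abstract
) -> List[str]:
    """Replace each listed multi-word expression with one joined token.

    At each position the longest matching expression wins.
    Matching is exact and case-sensitive.
    """
    # Try longer expressions first so they take priority over shorter ones.
    candidates = sorted(
        (tuple(m) for m in mwes if len(m) > 1), key=len, reverse=True
    )
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for mwe in candidates:
            if tuple(tokens[i:i + len(mwe)]) == mwe:
                out.append(joiner.join(mwe))
                i += len(mwe)
                break
        else:
            # No expression starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    # Hypothetical example sentence and expression list.
    sentence = "the Prime Minister visited New Delhi".split()
    expressions = [("New", "Delhi"), ("Prime", "Minister")]
    print(single_tokenize(sentence, expressions))
```

Joining each expression into one token before training means the word aligner treats it as a single unit, which is the effect the abstract attributes to single-tokenization.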
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Workshop |
Refereed: | Yes |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Published in: | Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications. Association for Computational Linguistics. |
Publisher: | Association for Computational Linguistics |
Official URL: | http://www.aclweb.org/anthology/W/W10/W10-3707.pdf |
Copyright Information: | © 2010 Association for Computational Linguistics |
Funders: | Science Foundation Ireland |
ID Code: | 15810 |
Deposited On: | 10 Nov 2010 16:25 by Shane Harper. Last Modified 09 Nov 2018 14:31 |
Documents
Full text available as: PDF (185kB)