DORAS | DCU Research Repository

Bilingually motivated domain-adapted word segmentation for statistical machine translation

Ma, Yanjun and Way, Andy (ORCID: 0000-0001-5736-5930) (2009) Bilingually motivated domain-adapted word segmentation for statistical machine translation. In: EACL 2009 Workshop on Computational Approaches to Semitic Languages, 31 March 2009, Athens, Greece.

Abstract
We introduce a word segmentation approach for languages in which word boundaries are not orthographically marked, with application to Phrase-Based Statistical Machine Translation (PB-SMT). Instead of training segmenters on manually segmented monolingual domain-specific corpora, we make use of bilingual corpora and statistical word alignment techniques. First, our approach is adapted to the specific translation task at hand by taking the corresponding source (target) language into account. Second, because it does not rely on manually segmented training data, it can be adapted automatically to different domains. We evaluate our segmentation approach on PB-SMT tasks from two domains and demonstrate that it consistently scores among the best results across different data conditions.
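To make the core idea concrete (segmenting with a statistical word alignment instead of manually segmented data), here is a minimal sketch, assuming a character-level alignment between the unsegmented source sentence and the target words is already available, e.g. from a statistical aligner. The function name `segment_by_alignment` and the `(source_char_index, target_word_index)` link format are hypothetical, chosen for illustration only. This is not the authors' exact procedure: how candidate words are scored, filtered, or adapted to a domain in the paper is not reproduced here.

```python
from typing import List, Tuple


def segment_by_alignment(chars: List[str],
                         alignment: List[Tuple[int, int]]) -> List[str]:
    """Hypothetical helper: group consecutive source characters that are
    linked to the same single target word into one segment.

    chars: the source sentence as a list of characters (no word boundaries).
    alignment: assumed (source_char_index, target_word_index) links.
    """
    # Collect the set of target words each source character is linked to.
    targets = [set() for _ in chars]
    for s, t in alignment:
        targets[s].add(t)

    segments: List[str] = []
    i = 0
    while i < len(chars):
        j = i + 1
        # Extend the segment while character i has exactly one target link
        # and the next character is linked to that same target word only.
        while j < len(chars) and len(targets[i]) == 1 and targets[j] == targets[i]:
            j += 1
        segments.append("".join(chars[i:j]))
        i = j
    return segments


if __name__ == "__main__":
    # Toy example: "机器翻译很有用" aligned to "machine translation is very useful".
    sentence = list("机器翻译很有用")
    links = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 3), (5, 4), (6, 4)]
    print(segment_by_alignment(sentence, links))  # ['机器', '翻译', '很', '有用']
```

As a design choice, characters that are unaligned or linked to several target words are kept as single-character segments, which is a conservative fallback rather than a claim about the paper's behaviour.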
Metadata
Item Type:Conference or Workshop Item (Paper)
Event Type:Workshop
Refereed:Yes
Uncontrolled Keywords:statistical machine translation
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher:Association for Computational Linguistics
Official URL:http://www.aclweb.org/anthology/E/E09/
Copyright Information:©2009 Association for Computational Linguistics
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:Science Foundation Ireland, SFI 05/IN/1732
ID Code:15164
Deposited On:15 Feb 2010 12:58 by DORAS Administrator. Last Modified 14 Nov 2018 16:32
Documents

Full text available as: PDF (MaWay_eacl_09.pdf), 337kB