DORAS | DCU Research Repository

Multi-word expression-sensitive word alignment

Okita, Tsuyoshi, Maldonado Guerra, Alfredo ORCID: 0000-0001-8426-5249, Graham, Yvette and Way, Andy ORCID: 0000-0001-5736-5930 (2010) Multi-word expression-sensitive word alignment. In: CLIA 2010 - Fourth International Workshop On Cross Lingual Information Access: Computational Linguistics and the Information Need of Multilingual Societies, 28 Aug 2010, Beijing, China.

Abstract
This paper presents a new word alignment method that incorporates knowledge about Bilingual Multi-Word Expressions (BMWEs). Our method first extracts such BMWEs bidirectionally from a given corpus and then runs conventional word alignment, taking into account the properties of BMWEs both in their grouping and in their alignment links. We supply a partial annotation of alignment links as prior knowledge to the word alignment process: by replacing the maximum likelihood estimate in the M-step of the IBM Models with a Maximum A Posteriori (MAP) estimate, prior knowledge about BMWEs is embedded in the prior of this MAP estimate. In our experiments, we saw an absolute improvement of 0.77 BLEU points in JP–EN. Except in one case, our method gave better results than the method using only BMWE grouping. Although this paper does not directly address the issues in Cross-Lingual Information Retrieval (CLIR), it discusses an approach of direct relevance to the field. This approach could be viewed as the opposite of current trends in CLIR on semantic space that incorporate a notion of order in the bag-of-words model (e.g. co-occurrences).
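The central mechanism in the abstract, swapping the maximum likelihood M-step for a MAP estimate, can be illustrated on IBM Model 1. This is a minimal sketch, not the authors' implementation: the abstract does not state the form of the prior, so the code assumes a Dirichlet prior whose pseudo-counts (the α − 1 terms) sit on alignment links taken from extracted BMWEs. The function name `train_ibm1_map`, the `prior` dictionary format and the toy corpus are hypothetical.

```python
from collections import defaultdict

NULL = "<NULL>"  # empty English word that unaligned foreign tokens can attach to


def train_ibm1_map(corpus, prior, iterations=10):
    """IBM Model 1 EM with a MAP (rather than ML) M-step.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    prior:  dict mapping (f, e) -> pseudo-count > 0; assumed here to hold
            links derived from bilingual MWEs (hypothetical format).
    Returns t, a dict mapping (f, e) -> p(f | e).
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every pair

    # Per-English-word sum of pseudo-counts, needed to renormalise.
    prior_total = defaultdict(float)
    for (f, e), alpha in prior.items():
        prior_total[e] += alpha

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)

        # E-step: share each foreign token among the English tokens plus NULL.
        for fs, es in corpus:
            es_with_null = [NULL] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es_with_null)
                for e in es_with_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # MAP M-step: add the prior pseudo-counts before normalising.
        # With an empty prior this is the usual ML estimate c(f, e) / c(e).
        new_t = {}
        for pair in set(count) | set(prior):
            f, e = pair
            numerator = count.get(pair, 0.0) + prior.get(pair, 0.0)
            new_t[pair] = numerator / (total[e] + prior_total[e])
        t = defaultdict(float, new_t)

    return t


if __name__ == "__main__":
    corpus = [(["la", "maison"], ["the", "house"]),
              (["la", "fleur"], ["the", "flower"])]
    prior = {("maison", "house"): 2.0}  # toy stand-in for a BMWE link
    t = train_ibm1_map(corpus, prior)
    for (f, e), p in sorted(t.items()):
        if e == "house":
            print(f"t({f} | {e}) = {p:.3f}")
```

On this toy corpus a single iteration gives t(maison | house) = 7/8 with the prior versus 1/2 without it, because the two pseudo-counts are added to an expected count of 1/3 and a total of 2/3. The Dirichlet form is chosen here only because it keeps the MAP M-step in closed form; the prior used in the paper may differ.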
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL)
Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of the 4th Workshop on Cross Lingual Information Access. Coling 2010 Organizing Committee.
Publisher: Coling 2010 Organizing Committee
Official URL: http://www.aclweb.org/anthology/W/W10/W10-4006.pdf
Copyright Information: © 2010 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland
ID Code: 15801
Deposited On: 10 Nov 2010 15:51 by Shane Harper. Last Modified 07 Oct 2020 11:06
Documents

Full text available as: PDF (116kB), Multi-Word_Expression-Sensitive_Word_Alignment.pdf