
-ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions

Aranberri Monasterio, Nora (2010) -ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions. PhD thesis, Dublin City University.

Full text available as: PDF (2382Kb)

Abstract

This PhD dissertation falls within the domain of machine translation (MT) and focuses specifically on the machine translation of IT-domain -ing words into four target languages: French, German, Japanese and Spanish. -ing words are claimed to be problematic for MT because of their linguistic flexibility: they can function as nouns, adjectives and verbs. This dissertation investigates how problematic -ing words actually are and explores possible solutions for improving their MT output. A corpus-based approach is used to better represent the domain-specific structures in which -ing words occur. After a significant sample is selected, the -ing words are classified following the functional categorisation presented by Izquierdo (2006). The sample is machine-translated using a customised rule-based MT (RBMT) system. A feature-based human evaluation is then performed in order to obtain information about the specific feature under study. The results show that 73% of the -ing words were correctly translated in terms of grammaticality and accuracy for German, Japanese and Spanish. The percentage for French was lower, at 52%. These data, combined with a thorough analysis of the MT output, allow for the identification of cross-language and language-specific issues and their characteristics, setting the path for improvement.

The improvement approaches examined cover both the pre- and post-processing stages of automated translation. For pre-processing, controlled language (CL) and automatic source re-writing (ASR) are explored and evaluated. For post-processing, global search and replace (Global S&R) and statistical post-editing (SPE) methods are tested. CL is reported to reduce -ing word ambiguity but not to achieve substantial machine translation improvement. Regex-based implementations of ASR and Global S&R show considerable translation improvements, ranging from 60% to 95%, and minimal degradation, ranging from 0% to 18%. The results for SPE show little improvement, or even degradation, at both sentence and -ing word level.

Item Type: Thesis (PhD)
Date of Award: March 2010
Refereed: No
Supervisor(s): O'Brien, Sharon
Uncontrolled Keywords: -ing words; machine translation; evaluation
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Humanities and Social Science > School of Applied Language and Intercultural Studies
Research Initiatives and Centres > Centre for Translation and Textual Studies (CTTS)
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Enterprise Ireland, Symantec
ID Code: 15093
Deposited On: 29 Mar 2010 14:35 by Sharon O'Brien. Last Modified: 29 Mar 2010 14:38
