-ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions

Aranberri Monasterio, Nora (2010) -ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions. PhD thesis, Dublin City University.

This PhD dissertation falls within the domain of machine translation (MT) and focuses specifically on the machine translation of IT-domain -ing words into four target languages: French, German, Japanese and Spanish. Because of their linguistic flexibility (they can function as nouns, adjectives and verbs), -ing words are claimed to be problematic for MT. This dissertation investigates how problematic they are and explores possible solutions for improving their MT output. A corpus-based approach is used to better represent the domain-specific structures in which -ing words occur. After selecting a significant sample, the -ing words are classified following a functional categorisation presented by Izquierdo (2006). The sample is machine-translated using a customised rule-based MT (RBMT) system. A feature-based human evaluation is then performed in order to obtain information about the specific feature under study. The results show that 73% of the -ing words were correctly translated in terms of grammaticality and accuracy for German, Japanese and Spanish. The percentage for French was lower, at 52%. These data, combined with a thorough analysis of the MT output, allow for the identification of cross-language and language-specific issues and their characteristics, setting the path for improvement. The approaches to improvement examined cover both the pre- and post-processing stages of automated translation. For pre-processing, controlled language (CL) and automatic source re-writing (ASR) are explored and evaluated. For post-processing, global search and replace (Global S&R) and statistical post-editing (SPE) methods are tested. CL is reported to reduce -ing word ambiguity but not to achieve substantial machine translation improvement. Regex-based implementations of ASR and Global S&R show considerable translation improvements, ranging from 60% to 95%, with minimal degradation, ranging from 0% to 18%. The results for SPE show little improvement, or even degradation, at both sentence and -ing word level.

Item Type: Thesis (PhD)
Date of Award: March 2010
Supervisor(s): O'Brien, Sharon
Uncontrolled Keywords: -ing words; machine translation; evaluation
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Humanities and Social Science > School of Applied Language and Intercultural Studies
Research Initiatives and Centres > Centre for Translation and Textual Studies (CTTS)
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Enterprise Ireland, Symantec
ID Code: 15093
Deposited On: 29 Mar 2010 14:35 by Sharon O'Brien. Last Modified: 29 Mar 2010 14:38
