-ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions
Aranberri Monasterio, Nora (2010) -ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions. PhD thesis, Dublin City University.
This PhD dissertation falls within the domain of machine translation (MT) and focuses on the machine translation of IT-domain -ing words into four target languages: French, German, Japanese and Spanish. Because of their linguistic flexibility (they can function as nouns, adjectives and verbs), -ing words are claimed to be problematic. This dissertation investigates how problematic they are and explores possible solutions for improving their MT output. A corpus-based approach is used to better represent the domain-specific structures in which -ing words occur. After a significant sample is selected, the -ing words are classified following the functional categorisation presented by Izquierdo (2006). The sample is machine-translated using a customised rule-based MT (RBMT) system. A feature-based human evaluation is then performed to obtain information about the specific feature under study. The results show that 73% of the -ing words were correctly translated in terms of grammaticality and accuracy for German, Japanese and Spanish. The percentage for French was lower, at 52%. These data, combined with a thorough analysis of the MT output, allow the identification of cross-language and language-specific issues and their characteristics, setting the path for improvement. The approaches to improvement examined cover both the pre- and post-processing stages of automated translation. For pre-processing, controlled language (CL) and automatic source re-writing (ASR) are explored and evaluated. For post-processing, global search and replace (Global S&R) and statistical post-editing (SPE) methods are tested. CL is reported to reduce -ing word ambiguity but not to achieve substantial machine translation improvement. Regex-based implementations of ASR and Global S&R show considerable translation improvements, ranging from 60% to 95%, and minimal degradation, ranging from 0% to 18%. The results for SPE show little improvement, or even degradation, at both sentence and -ing word level.