DORAS | DCU Research Repository

An investigation into multi-word expressions in machine translation

Han, Lifeng (ORCID: 0000-0002-3221-2185) (2022) An investigation into multi-word expressions in machine translation. PhD thesis, Dublin City University.

Abstract
Multi-word Expressions (MWEs) present challenges in natural language processing and computational linguistics because of their widespread usage, rich variety, idiomaticity, and non-decompositionality in the text in which they occur. These challenges arise in machine translation (MT), where algorithms are expected to translate automatically from one human language to another while producing high-quality output that is adequate and fluent and that either preserves the style of the source or makes creative and correct style decisions. In this thesis, we carry out an extensive investigation into MWEs in Neural MT. Firstly, we review the relevant literature, including experimental work that re-examines state-of-the-art models for integrating knowledge of MWEs into MT systems, applied to new language-pair settings to see what gaps might exist in the published literature. Secondly, we propose new models for addressing MWE translation. These include a design that treats MWEs as a low-frequency word and phrase translation problem and integrates language-specific features, such as stroke and radical representations of Chinese characters, into the learning model, with the expectation that this will improve accuracy. Thirdly, to properly examine the performance of different MT models on MWEs, a new evaluation methodology is needed, and to this end we create a multilingual parallel corpus with MWE annotations (AlphaMWE). During the creation of this corpus, we classify the MT issues on MWE-related content into several categories, with the expectation that this will help future MT researchers to focus on one or more of them in order to achieve a new state of the art in MT performance, ultimately moving towards human parity. Finally, we propose a new methodology for human-in-the-loop MT evaluation with MWE considerations (HiLMeMe).
Metadata
Item Type: Thesis (PhD)
Date of Award: February 2022
Refereed: No
Supervisor(s): Smeaton, Alan F. and Jones, Gareth J.F.
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Science Foundation Ireland
ID Code: 26559
Deposited On: 16 Feb 2022 11:52 by Alan Smeaton. Last Modified: 16 Feb 2022 11:52
Documents

Full text available as:

Lifeng_Han_PhD_Thesis_singed_2print.pdf
PDF
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
4MB