On the integration of linguistic features into statistical and neural machine translation

Vanmassenhove, Eva Odette Jef

Vanmassenhove, Eva Odette Jef ORCID: 0000-0003-1162-820X (2019) On the integration of linguistic features into statistical and neural machine translation. PhD thesis, Dublin City University.


Abstract

Recent years have seen increased interest in machine translation technologies and applications, driven by a growing need to overcome language barriers in many sectors. New machine translation technologies are emerging rapidly, and with them have come bold claims of human parity (Läubli et al., 2018). Examples are that (i) the results produced approach the "accuracy achieved by average bilingual human translators [on some test sets]" (Wu et al., 2017b), or that (ii) the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018). Many of these papers craft their own definition of human parity. Beyond that, such sensational claims are often not supported by a complete analysis of all aspects involved in translation. The starting point of our research was to establish the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate. By examining machine translation output in light of linguistic theory, we were able to identify a number of remaining issues. These range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. Our experiments confirm, in line with other studies (Bentivogli et al., 2016), that neural machine translation has surpassed statistical machine translation in many respects. However, some problems remain and new ones have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural machine translation, with the aim of analysing them and providing solutions to some of them. Our work addresses three main research questions that revolve around the complex relationship between linguistics and machine translation in general. Taking linguistic theory as a starting point, we examine to what extent it is reflected in current systems.
We identify linguistic information that automatic translation systems lack and need in order to produce more accurate translations, and we integrate additional features into the existing pipelines. We also identify overgeneralization, or 'algorithmic bias', as a potential drawback of neural machine translation and link it to many of the remaining linguistic issues.

Metadata

Item Type:	Thesis (PhD)
Date of Award:	November 2019
Refereed:	No
Supervisor(s):	Way, Andy
Uncontrolled Keywords:	Statistical Machine Translation; Neural Machine Translation; Linguistics; Tense; Aspect; Subject-verb Agreement; Gender Bias; Gender Agreement; Lexical Diversity; Lexical Loss; Linguistic Loss; Algorithmic Bias
Subjects:	Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating; Humanities > French language; Humanities > Language; Humanities > Linguistics; Humanities > Translating and interpreting; Humanities > Spanish language; Social Sciences > Gender
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders:	Dublin City University, Science Foundation Ireland
ID Code:	23714
Deposited On:	19 Nov 2019 14:16 by Andrew Way. Last Modified 30 Sep 2022 15:05

Documents

Full text available as:

PDF (1MB): Eva_Vanmassenhove_hardbound.pdf


DORAS | DCU Research Repository
