Lost in translation: loss and decay of linguistic richness in machine translation

Way, Andy; Shterionov, Dimitar; Vanmassenhove, Eva

Way, Andy ORCID: 0000-0001-5736-5930, Shterionov, Dimitar ORCID: 0000-0001-6300-797X and Vanmassenhove, Eva ORCID: 0000-0003-1162-820X (2019) Lost in translation: loss and decay of linguistic richness in machine translation. In: MT Summit XVII, 19 - 23 Aug 2019, Dublin, Ireland.

Abstract

This work presents an empirical approach to quantifying the loss of lexical richness in Machine Translation (MT) systems compared to Human Translation (HT). Our experiments show that current MT systems indeed fail to render the lexical diversity of human-generated or human-translated text. The inability of MT systems to generate diverse outputs, and their tendency to exacerbate already frequent patterns while ignoring less frequent ones, might be the underlying cause of, among other things, the currently heavily debated issues related to gender-biased output. Aside from biased data, can we indeed speak of an algorithm that exacerbates the biases it has seen?
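As a minimal sketch of what quantifying lexical richness can look like, the Python below computes two simple measures: type-token ratio (TTR, distinct words divided by total words) and the number of hapax legomena (words that occur exactly once). The two sentences, the lowercase whitespace tokenizer and the choice of these two metrics are illustrative assumptions, not the authors' data or their exact method.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenization, for illustration only;
    # a real study would use a proper tokenizer.
    return text.lower().split()


def type_token_ratio(tokens: list[str]) -> float:
    # Distinct word types divided by total tokens; 0.0 for empty input.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_count(tokens: list[str]) -> int:
    # Number of types that occur exactly once (a rough proxy for rare vocabulary).
    return sum(1 for count in Counter(tokens).values() if count == 1)


# Hypothetical toy sentences standing in for a human translation (HT)
# and a more repetitive machine translation (MT); not data from the paper.
human = tokenize("the cat sat on the mat while a dog slept nearby")
machine = tokenize("the cat sat on the mat and the dog sat on the mat")

for name, toks in [("HT", human), ("MT", machine)]:
    print(name, round(type_token_ratio(toks), 3), hapax_count(toks))
```

On these toy inputs the repetitive "MT" sentence scores lower on both measures (TTR 0.909 vs 0.538; 9 vs 3 hapaxes). Because TTR tends to fall as text length grows, a real comparison would need equally sized samples or a length-robust measure.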

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Forcada, Mikel, Way, Andy, Haddow, Barry and Sennrich, Rico (eds.) Proceedings of MT Summit XVII, Volume 1. European Association for Machine Translation.
Publisher:	European Association for Machine Translation
Official URL:	https://www.aclweb.org/anthology/W19-6622.pdf
Copyright Information:	© 2019 The Authors.
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Dublin City University Faculty of Engineering & Computing under the Daniel O’Hare Research Scholarship, ADAPT Centre for Digital Content Technology, which is funded under the SFI Research Centres Programme (Grant 13/RC/2106).
ID Code:	23865
Deposited On:	21 Oct 2019 13:08 by Andrew Way. Last Modified: 24 May 2023 10:05

Documents

Full text available as:

PDF
Creative Commons: Attribution 4.0
329kB

DORAS | DCU Research Repository