Applying N-gram alignment entropy to improve feature decay algorithms

Poncelas, Alberto; Maillette de Buy Wenniger, Gideon; Way, Andy

Poncelas, Alberto ORCID: 0000-0002-5089-1687, Maillette de Buy Wenniger, Gideon and Way, Andy ORCID: 0000-0001-5736-5930 (2017) Applying N-gram alignment entropy to improve feature decay algorithms. The Prague Bulletin of Mathematical Linguistics (108). pp. 245-256. ISSN 0032-6585

Abstract

Data selection is a popular step in Machine Translation pipelines. Feature Decay Algorithms (FDA) is a data selection technique that has shown good performance in several tasks. FDA aims to maximize the coverage of the n-grams in the test set. Intuitively, however, more ambiguous n-grams require more training examples to estimate their translation probabilities adequately. This ambiguity can be measured by alignment entropy. In this paper we propose two methods for calculating alignment entropies for n-grams of any size, which can be used to improve the performance of FDA. We evaluate substituting the n-gram-specific entropy values computed by these methods for the parameters of both the exponential and linear decay factors of FDA. Experiments on German-to-English and Czech-to-English translation show that using alignment entropies can improve the quality of the results of FDA.
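As a rough illustration of the idea in the abstract, the sketch below computes an empirical alignment entropy per source n-gram and lets it slow down an exponential decay. It is not the paper's method: the input format (a list of aligned source/target pairs), the function names, the `base_decay` parameter, and the specific mapping from entropy to decay rate are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def alignment_entropy(pairs):
    """Entropy (in bits) of the empirical translation distribution
    of each source n-gram.

    `pairs` is a hypothetical input: an iterable of
    (source_ngram, aligned_target_phrase) tuples.
    """
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1

    entropies = {}
    for src, tgts in counts.items():
        total = sum(tgts.values())
        # H = sum over targets of p * log2(1 / p)
        entropies[src] = sum(
            (c / total) * math.log2(total / c) for c in tgts.values()
        )
    return entropies


def decayed_weight(times_selected, entropy, base_decay=0.5):
    """Weight of an n-gram after it has been selected `times_selected` times.

    Assumption for illustration only: the per-selection decay factor is
    base_decay ** (1 / (1 + entropy)), so higher-entropy (more ambiguous)
    n-grams lose weight more slowly and keep attracting training sentences.
    """
    decay = base_decay ** (1.0 / (1.0 + entropy))
    return decay ** times_selected


# Toy German-to-English alignments (invented for the example).
toy_pairs = [
    ("Bank", "bank"),
    ("Bank", "bench"),
    ("Haus", "house"),
    ("Haus", "house"),
]
H = alignment_entropy(toy_pairs)
for ngram, h in H.items():
    print(ngram, round(h, 3), round(decayed_weight(3, h), 3))
```

In this toy setup the ambiguous "Bank" keeps a higher weight than the unambiguous "Haus" after the same number of selections, which mirrors the intuition that ambiguous n-grams should receive more training examples.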

Metadata

Item Type:	Article (Published)
Refereed:	Yes
Uncontrolled Keywords:	Feature Decay Algorithms (FDA)
Subjects:	Computer Science > Machine translating; Computer Science > Algorithms
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Publisher:	Univerzita Karlova v Praze
Official URL:	https://doi.org/10.1515/pralin-2017-0024
Copyright Information:	© 2017 The Authors
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	ADAPT Centre under the SFI Research Centres Programme (Grant 13/RC/2106), European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 713567.
ID Code:	22304
Deposited On:	29 Mar 2018 10:34 by Gideon Maillette De buy . Last Modified 22 Jan 2021 14:17

Documents

Full text available as:

PDF (Applying N-gram Alignment Entropy to Improve Feature Decay Algorithms) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
204kB

DORAS | DCU Research Repository
