DORAS | DCU Research Repository

An empirical study of segment prioritization for incrementally retrained post-editing-based SMT

Du, Jinhua ORCID: 0000-0002-3267-4881, Srivastava, Ankit, Way, Andy ORCID: 0000-0001-5736-5930, Maldonado Guerra, Alfredo ORCID: 0000-0001-8426-5249 and Lewis, David ORCID: 0000-0002-3503-4644 (2015) An empirical study of segment prioritization for incrementally retrained post-editing-based SMT. In: The Fifteenth MT Summit Conference, 30 Oct-3 Nov 2015, Miami, FL, USA.

Abstract
Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality translations has become an increasingly common application of SMT, which we refer to henceforth as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally retrained system that learns from human post-edits as early as possible, using them to augment the SMT models and thereby reduce post-editing (PE) time. In this scenario, the order in which input segments are presented plays an important role in reducing overall PE time. Within an active learning (AL) framework, this paper presents an empirical study of several typical segment prioritization methods, namely cross-entropy difference (CED), n-grams, perplexity (PPL) and translation confidence, and evaluates their performance on different data sets and language pairs. Experiments in a simulated setting show that translation confidence performs best, reducing TER by 1.72 to 4.55 points absolute on average compared with PE-based incrementally retrained SMT that processes segments sequentially.
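As a rough illustration of what "segment prioritization" means, the sketch below ranks input segments by cross-entropy difference (CED), one of the criteria named in the abstract. It uses the standard Moore-Lewis form, the per-token cross-entropy under an in-domain language model minus the per-token cross-entropy under a general one, with toy add-one-smoothed unigram models. The function names, the unigram models, the choice of the two reference corpora and the ascending sort order are all assumptions made for illustration; this record does not state the paper's actual language models, features or ranking direction.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Train a toy add-one-smoothed unigram LM (assumption: the paper's LMs are not given here).

    Returns a function mapping a token to its natural-log probability.
    """
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a sentence under a unigram LM."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)


def ced_order(segments, in_domain, general):
    """Hypothetical helper: sort segments by CED = H_in(s) - H_gen(s), ascending.

    Lower CED means the segment looks more like the in-domain sample than the
    general one; whether the paper ranks ascending or descending is an assumption.
    """
    lp_in = train_unigram(in_domain)
    lp_gen = train_unigram(general)

    def ced(s):
        return cross_entropy(lp_in, s) - cross_entropy(lp_gen, s)

    return sorted(segments, key=ced)


if __name__ == "__main__":
    in_domain = ["the patient received a dose", "dose the patient"]
    general = ["the stock market fell", "market news today"]
    print(ced_order(["the stock fell", "patient dose"], in_domain, general))
    # prints ['patient dose', 'the stock fell']
```

The other criteria listed in the abstract (n-grams, perplexity, translation confidence) would slot into the same pattern by swapping the `key` function passed to `sorted`.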
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Authoring Tools; Controlled Languages; Speech-to-Speech Translation
Subjects: Computer Science > Machine learning
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Al-Onaizan, Yaser and Lewis, Will (eds.) Proceedings of MT Summit XV. 1. Association for Computational Linguistics.
Publisher: Association for Computational Linguistics
Official URL: https://amtaweb.org/wp-content/uploads/2015/10/MTS...
Copyright Information: © 2015 ACL
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Funders: Science Foundation Ireland through the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University and Trinity College Dublin, Grant 610879 for the Falcon project funded by the European Commission
ID Code: 23216
Deposited On: 01 May 2019 15:31 by Thomas Murtagh. Last Modified: 20 May 2021 13:59
Documents

Full text available as:

PDF (396kB): An Empirical Study of Segment Prioritization for Incrementally Retrained Post-Editing-Based SMT.pdf