DORAS
DCU Online Research Access Service
An empirical study of segment prioritization for incrementally retrained post-editing-based SMT

Du, Jinhua ORCID: 0000-0002-3267-4881, Srivastava, Ankit, Way, Andy ORCID: 0000-0001-5736-5930, Maldonado Guerra, Alfredo ORCID: 0000-0001-8426-5249 and Lewis, David ORCID: 0000-0002-3503-4644 (2015) An empirical study of segment prioritization for incrementally retrained post-editing-based SMT. In: The Fifteenth MT Summit Conference, 30 Oct-3 Nov 2015, Miami, FL, USA.

Abstract

Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality translations has become an increasingly common application of SMT, which we henceforth refer to as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally retrained system that learns from human post-edits as early as possible, augmenting the SMT models in order to reduce PE time. In this scenario, the order in which input segments are presented plays a very important role in reducing the overall PE time. Within an active learning (AL) framework, this paper provides an empirical study of several typical segment prioritization methods, namely cross entropy difference (CED), n-grams, perplexity (PPL) and translation confidence, and verifies their performance on different data sets and language pairs. Experiments in a simulated setting show that translation confidence performs best, reducing TER by 1.72 to 4.55 absolute points on average compared to sequential PE-based incrementally retrained SMT.
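
As an illustration of one of the prioritization criteria named in the abstract, the following is a minimal sketch of ranking segments by cross entropy difference. It is not the paper's implementation: the add-one smoothed unigram models, the function names (`train_unigram`, `cross_entropy`, `rank_by_ced`), the toy corpora, and the choice to rank the lowest difference first are all assumptions made for this example. A real system would typically use higher-order n-gram language models.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Return (token counts, total tokens) for a toy unigram model."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())

def cross_entropy(sentence, model, vocab_size):
    """Per-token cross entropy in bits under add-one smoothing."""
    counts, total = model
    tokens = sentence.split()
    if not tokens:
        return 0.0
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)

def rank_by_ced(segments, in_domain, general):
    """Sort segments by H_in(s) - H_gen(s), lowest difference first.

    Assumption: a lower difference means the segment looks more like the
    in-domain data. Which end of the ranking to post-edit first is a
    design choice that this sketch does not settle.
    """
    vocab = {t for s in in_domain + general + segments for t in s.split()}
    v = len(vocab) + 1  # one extra slot reserved for unseen tokens
    m_in = train_unigram(in_domain)
    m_gen = train_unigram(general)
    scored = [
        (cross_entropy(s, m_in, v) - cross_entropy(s, m_gen, v), s)
        for s in segments
    ]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0])]

# Hypothetical toy data, for illustration only.
in_domain = ["the printer driver failed", "install the printer driver"]
general = ["the cat sat on the mat", "a dog ran in the park"]
segments = ["the cat ran", "install the driver"]
ranked = rank_by_ced(segments, in_domain, general)
```

In an incrementally retrained setting, a ranking like this would be recomputed as post-edited segments are added to the training data, so that the order of the remaining segments reflects the updated models.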

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Authoring Tools; Controlled Languages; Speech-to-Speech Translation
Subjects: Computer Science > Machine learning
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Initiatives and Centres > ADAPT
Published in: Al-Onaizan, Yaser and Lewis, Will (eds.) Proceedings of MT Summit XV. 1. Association for Computational Linguistics.
Publisher: Association for Computational Linguistics
Official URL: https://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf
Copyright Information: © 2015 ACL
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland through the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University and Trinity College Dublin, Grant 610879 for the Falcon project funded by the European Commission
ID Code: 23216
Deposited On: 01 May 2019 15:31 by Thomas Murtagh. Last Modified 20 May 2021 13:59
