Du, Jinhua ORCID: 0000-0002-3267-4881, Srivastava, Ankit, Way, Andy ORCID: 0000-0001-5736-5930, Maldonado Guerra, Alfredo ORCID: 0000-0001-8426-5249 and Lewis, David ORCID: 0000-0002-3503-4644 (2015) An empirical study of segment prioritization for incrementally retrained post-editing-based SMT. In: The Fifteenth MT Summit Conference, 30 Oct-3 Nov 2015, Miami, FL, USA.
Abstract
Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality translation has become an increasingly common application of SMT; we refer to it henceforth as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally retrained system that learns from human post-edits as early as possible, augmenting the SMT models in order to reduce PE time. In this scenario, the order of the input segments plays a very important role in reducing the overall PE time. Within an active learning (AL) framework, this paper provides an empirical study of several typical segment prioritization methods, namely cross-entropy difference (CED), n-grams, perplexity (PPL) and translation confidence, and evaluates their performance on different data sets and language pairs. Experiments in a simulated setting show that translation confidence performs best, with average decreases of 1.72-4.55 absolute TER points compared to sequential PE-based incrementally retrained SMT.
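As an illustration of one of the prioritization criteria named in the abstract, the following is a minimal sketch of ranking pending segments by cross-entropy difference. It is not the paper's implementation: the add-one-smoothed unigram models, the function names (`unigram_model`, `cross_entropy`, `prioritize_by_ced`) and the ranking direction (lowest difference first) are all assumptions made for the example, and a real system would likely use higher-order language models.

```python
import math
from collections import Counter


def unigram_model(sentences):
    """Build an add-one-smoothed unigram log-probability function (toy LM, an assumption)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens

    def logprob(token):
        # Counter returns 0 for unseen tokens, so smoothing covers them.
        return math.log((counts[token] + 1) / (total + vocab))

    return logprob


def cross_entropy(sentence, logprob):
    """Per-token cross entropy (in nats) of a sentence under a unigram model."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / max(len(tokens), 1)


def prioritize_by_ced(pending, new_domain, seen_data):
    """Order pending segments by H_new(s) - H_seen(s), lowest first.

    Assumption: segments that look most like the new domain and least like
    the data the system has already seen are sent to the post-editor first.
    """
    lp_new = unigram_model(new_domain)
    lp_seen = unigram_model(seen_data)

    def ced(segment):
        return cross_entropy(segment, lp_new) - cross_entropy(segment, lp_seen)

    return sorted(pending, key=ced)
```

In an incrementally retrained setup, a ranking like this would be recomputed after each batch of post-edits is added to the training data, so that the remaining segments are re-ordered against the updated models.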
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Authoring Tools; Controlled Languages; Speech To Speech Translation |
Subjects: | Computer Science > Machine learning |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Published in: | Al-Onaizan, Yaser and Lewis, Will, (eds.) Proceedings of MT Summit XV. 1. Association for Computational Linguistics. |
Publisher: | Association for Computational Linguistics |
Official URL: | https://amtaweb.org/wp-content/uploads/2015/10/MTS... |
Copyright Information: | © 2015 ACL |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland through the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University and Trinity College Dublin, Grant 610879 for the Falcon project funded by the European Commission |
ID Code: | 23216 |
Deposited On: | 01 May 2019 15:31 by Thomas Murtagh. Last Modified 20 May 2021 13:59 |