Improving KantanMT training efficiency with fast align

Shterionov, Dimitar; Du, Jinhua; Palminteri, Marc Anthony; Casanellas, Laura; O'Dowd, Tony; Way, Andy

Shterionov, Dimitar ORCID: 0000-0001-6300-797X, Du, Jinhua ORCID: 0000-0002-3267-4881, Palminteri, Marc Anthony, Casanellas, Laura, O'Dowd, Tony and Way, Andy ORCID: 0000-0001-5736-5930 (2016) Improving KantanMT training efficiency with fast align. In: Twelfth Conference of The Association for Machine Translation in the Americas, 28 Oct - 1 Nov 2016, Austin, TX, USA.

Abstract

In recent years, statistical machine translation (SMT) has been widely deployed in translators’ workflows, with significant improvements in productivity. However, before an SMT system can be invoked to translate an unseen text, an SMT engine needs to be built. The speed at which an engine is built is therefore essential for the translation workflow: the sooner an engine is built, the sooner it can be put to use. As the computational capabilities of recent technology have increased, the building time for an SMT engine has decreased substantially. For example, cloud-based SMT providers, such as KantanMT, can build high-quality, ready-to-use, custom SMT engines in less than a couple of days. To speed up this process further, we look into optimizing the word alignment step that takes place while the SMT engine is being built. Namely, we substitute the word alignment tool used by the KantanMT pipeline, Giza++, with a more efficient one, fast_align. In this work we present the design and implementation of the KantanMT pipeline that uses fast_align in place of Giza++. We also compare the two word alignment tools on industry data and report our findings. To the best of our knowledge, such an extensive empirical evaluation of the two tools has not been done before.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Beregovaya, Olga, (ed.) Proceedings of AMTA 2016: MT Users' Track. 2. AMTA.
Publisher:	AMTA
Copyright Information:	© 2016 the Authors. CC-BY-ND
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code:	23348
Deposited On:	22 May 2019 15:34 by INVALID USER. Last Modified 05 May 2020 15:58

Documents

Full text available as:

Improving_KantanMT_Training_Efficiency_with_FastAlign[1].pdf (PDF, 195kB)

DORAS | DCU Research Repository