Improving KantanMT training efficiency with fast align
Shterionov, Dimitar (ORCID: 0000-0001-6300-797X), Du, Jinhua (ORCID: 0000-0002-3267-4881), Palminteri, Marc Anthony, Casanellas, Laura, O'Dowd, Tony and Way, Andy (ORCID: 0000-0001-5736-5930)
(2016)
Improving KantanMT training efficiency with fast align.
In: Twelfth Conference of The Association for Machine Translation in the Americas, 28 Oct - 1 Nov 2016, Austin, TX, USA.
In recent years, statistical machine translation (SMT) has been widely deployed in translators’
workflows, bringing significant productivity gains. However, before an SMT system can translate
an unseen text, an SMT engine needs to be built. The speed at which an engine is built is
therefore essential to the translation workflow: the sooner an engine is built, the sooner it can
be put to use.
With the increasing computational capabilities of recent technology, the time needed to build
an SMT engine has decreased substantially. For example, cloud-based SMT providers such as
KantanMT can build high-quality, ready-to-use, custom SMT engines in less than a couple of
days. To speed up this process further, we look into optimizing the word alignment step
performed while building the SMT engine. Namely, we replace the word alignment tool used in the
KantanMT pipeline, Giza++, with a more efficient one, fast_align.
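Swapping aligners in a pipeline mostly comes down to matching their input and output conventions. fast_align reads one sentence pair per line in the form `source ||| target` and writes one alignment per line as space-separated `i-j` links (Pharaoh format, 0-based source and target token indices). The sketch below shows only that glue: preparing the input from a tokenized parallel corpus and parsing the output back into link sets. It is a minimal illustration under those assumptions, not the KantanMT implementation; the function names `to_fast_align_input` and `parse_pharaoh` are hypothetical, and running fast_align itself is left out.

```python
from typing import Iterable, List, Set, Tuple

# A word alignment: set of (source_index, target_index) links, 0-based.
Alignment = Set[Tuple[int, int]]


def to_fast_align_input(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: join tokenized parallel sentences into
    'source ||| target' lines, the input format fast_align reads."""
    src, tgt = list(src_lines), list(tgt_lines)
    if len(src) != len(tgt):
        raise ValueError("the two sides of the parallel corpus differ in length")
    return [f"{s.strip()} ||| {t.strip()}" for s, t in zip(src, tgt)]


def parse_pharaoh(line: str) -> Alignment:
    """Hypothetical helper: parse one line of Pharaoh-format output,
    e.g. '0-0 1-2', into a set of (i, j) index pairs."""
    links: Alignment = set()
    for pair in line.split():
        i, j = pair.split("-")
        links.add((int(i), int(j)))
    return links


if __name__ == "__main__":
    for line in to_fast_align_input(["das haus ist klein"], ["the house is small"]):
        print(line)
    print(sorted(parse_pharaoh("0-0 1-1 2-2 3-3")))
```

Keeping these conversions in small, separate functions means the rest of the pipeline can consume alignments as plain index sets, regardless of which aligner produced them.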
In this work we present the design and implementation of the KantanMT pipeline that uses
fast_align in place of Giza++. We also compare the two word alignment tools on industry data
and report our findings. To the best of our knowledge, such an extensive empirical evaluation
of the two tools has not been carried out before.