DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Feature decay algorithms for fast deployment of accurate statistical machine translation systems

Bicici, Ergun (2013) Feature decay algorithms for fast deployment of accurate statistical machine translation systems. In: ACL 2013 8th workshop on statistical machine translation, 8-9 Aug 2013, Sofia, Bulgaria.

Abstract
We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation (SMT) systems, taking only about half a day for each translation direction. We develop parallel FDA to solve the computational scalability problems caused by the abundance of training data for SMT models and language models (LMs), and still achieve SMT performance on par with, or better than, using all of the training data. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections later. Parallel FDA can also be used to select the LM corpus based on the training set selected by parallel FDA. The high quality of the selected training data allows us to obtain very accurate translation outputs, close to those of the top-performing SMT systems. The selected LM corpus is relevant enough to reduce the number of out-of-vocabulary (OOV) tokens by up to 86% and the perplexity by up to 74%. We perform SMT experiments in all language pairs in the WMT13 translation task and obtain SMT performance close to the top systems using significantly fewer resources for training and development.
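The shard-and-combine idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the n-gram features, the constant initial weight of 1.0, the multiplicative decay factor, the length normalisation, and the function names `ngrams`, `fda_select`, and `parallel_fda` are all assumptions made for illustration. The shards are processed in a plain loop here; they are independent, so each could be handed to a separate worker.

```python
import random


def ngrams(tokens, max_n=2):
    """Return all n-grams of the token list up to length max_n (assumed feature set)."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def fda_select(pool, test_features, budget, decay=0.5):
    """Greedily pick sentences that cover test-set features.

    Each time a feature is covered by a selected sentence its weight is
    multiplied by `decay`, so later picks favour features not yet covered.
    The weighting scheme is a simplification chosen for this sketch.
    """
    weight = {f: 1.0 for f in test_features}

    def score(sentence):
        tokens = sentence.split()
        feats = set(ngrams(tokens))
        return sum(weight.get(f, 0.0) for f in feats) / max(len(tokens), 1)

    remaining = list(pool)
    selected = []
    while remaining and len(selected) < budget:
        best = max(remaining, key=score)
        if score(best) == 0.0:
            break  # nothing left that shares a feature with the test set
        selected.append(best)
        remaining.remove(best)
        for f in set(ngrams(best.split())):
            if f in weight:
                weight[f] *= decay
    return selected


def parallel_fda(corpus, test_sentences, n_shards, budget_per_shard, seed=0):
    """Run a separate selection on each random shard and combine the results."""
    test_features = {f for s in test_sentences for f in ngrams(s.split())}
    shuffled = list(corpus)
    random.Random(seed).shuffle(shuffled)
    shards = [shuffled[i::n_shards] for i in range(n_shards)]
    combined = []
    for shard in shards:  # independent runs; could be distributed across workers
        combined.extend(fda_select(shard, test_features, budget_per_shard))
    return combined
```

Under the same assumptions, the combined selection could in turn supply the features for a second selection pass over a monolingual corpus, which is one way to read the abstract's remark about selecting the LM corpus from the selected training set.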
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Uncontrolled Keywords: Feature decay algorithms
Subjects: Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Artificial intelligence; Computer Science > Information retrieval; Computer Science > Algorithms
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology-new/W/W13/W13-2206...
Copyright Information: © 2013 ACL
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: QTLaunchPad, Centre for Next Generation Localisation
ID Code: 19106
Deposited On: 16 Aug 2013 11:07 by Mehmet Ergun Bicici. Last Modified: 16 Aug 2013 11:07
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
173kB