Bicici, Ergun (2013) Feature decay algorithms for fast deployment of accurate statistical machine translation systems. In: ACL 2013 8th workshop on statistical machine translation, 8-9 Aug 2013, Sofia, Bulgaria.
Abstract
We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation (SMT) systems, taking only about half a day for each translation direction. We develop parallel FDA to solve the computational scalability problems caused by the abundance of training data for SMT models and language models (LMs), while still achieving SMT performance that is on par with, or better than, using all of the training data. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections afterwards. Parallel FDA can also be used to select the LM corpus based on the training set selected by parallel FDA. The high quality of the selected training data allows us to obtain very accurate translation outputs, close to those of the top-performing SMT systems. The selected LM corpus is relevant enough to reduce the number of out-of-vocabulary (OOV) tokens by up to 86% and the perplexity by up to 74%. We perform SMT experiments in all language pairs in the WMT13 translation task and obtain SMT performance close to the top systems using significantly fewer resources for training and development.
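The selection scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the n-gram features, the halving decay rule, the length normalisation, the stopping rule, and the merge by simple concatenation are all assumptions made for illustration, and the names `ngrams`, `fda_select`, and `parallel_fda` are hypothetical.

```python
import random
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return the set of 1..max_n-grams of a token list (assumed feature set)."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def fda_select(candidates, test_features, budget, decay=0.5):
    """Greedily pick up to `budget` sentences that cover the test features.

    Each feature starts with value 1.0 and is multiplied by `decay` every
    time a selected sentence contains it (an assumed decay rule), so later
    picks are pushed towards features that are not yet covered.
    """
    counts = Counter()          # how often each feature has been selected
    remaining = list(candidates)
    selected = []

    def score(sentence):
        tokens = sentence.split()
        feats = ngrams(tokens) & test_features
        # Sum of decayed feature values, normalised by sentence length.
        return sum(decay ** counts[f] for f in feats) / max(len(tokens), 1)

    while remaining and len(selected) < budget:
        best = max(remaining, key=score)
        if score(best) == 0:    # nothing left that overlaps the test set
            break
        remaining.remove(best)
        selected.append(best)
        for f in ngrams(best.split()) & test_features:
            counts[f] += 1
    return selected


def parallel_fda(corpus, test_sentences, n_shards, budget_per_shard, seed=0):
    """Run FDA independently on random shards and concatenate the selections."""
    test_features = set().union(*(ngrams(s.split()) for s in test_sentences))
    shuffled = list(corpus)
    random.Random(seed).shuffle(shuffled)
    shards = [shuffled[i::n_shards] for i in range(n_shards)]
    merged = []
    for shard in shards:        # each call is independent of the others
        merged.extend(fda_select(shard, test_features, budget_per_shard))
    return merged
```

The shards are processed in a plain loop here to keep the sketch short. Because each `fda_select` call depends only on its own shard, the calls could be dispatched to separate processes or machines, which is where the scalability benefit would come from.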
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Workshop |
Refereed: | Yes |
Uncontrolled Keywords: | Feature decay algorithms |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Artificial intelligence; Computer Science > Information retrieval; Computer Science > Algorithms |
DCU Faculties and Centres: | Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Publisher: | Association for Computational Linguistics |
Official URL: | http://www.aclweb.org/anthology-new/W/W13/W13-2206... |
Copyright Information: | © 2013 ACL |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | QTLaunchPad, Centre for Next Generation Localisation |
ID Code: | 19106 |
Deposited On: | 16 Aug 2013 11:07 by Mehmet Ergun Bicici. Last modified: 16 Aug 2013 11:07 |
Documents
Full text available as: PDF (173kB)