DORAS | DCU Research Repository

CloudLM: a cloud-based language model for machine translation

Ferrández-Tordera, Jorge, Ortiz-Rojas, Sergio and Toral, Antonio (ORCID: 0000-0003-2357-2960) (2016) CloudLM: a cloud-based language model for machine translation. Prague Bulletin of Mathematical Linguistics (105). pp. 51-61. ISSN 1804-0462

Language models (LMs) are an essential element in statistical approaches to natural language processing for tasks such as speech recognition and machine translation (MT). The advent of big data has made massive amounts of data available for building LMs; in fact, for the most prominent languages, it is not feasible with current techniques and hardware to train LMs on all the data now available. At the same time, it has been shown that the more data an LM is trained on, the better the performance (e.g. for MT), with no indication yet of reaching a plateau. This paper presents CloudLM, an open-source cloud-based LM intended for MT, which allows distributed LMs to be queried. CloudLM relies on Apache Solr and provides the functionality of state-of-the-art language modelling (it builds upon KenLM), while making it possible to query massive LMs (as the use of local memory is drastically reduced), at the expense of slower decoding speed.
Item Type: Article (Published)
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Research Institutes and Centres > Centre for Next Generation Localisation (CNGL)
Publisher: De Gruyter Open
Official URL: http://dx.doi.org/10.1515/pralin-2016-0002
Copyright Information: © 2016 PBML
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: European Union Seventh Framework Programme FP7/2007-2013 under grant agreement PIAPGA-2012-324414 (Abu-MaTran).
ID Code: 23306
Deposited On: 16 May 2019 11:30 by Thomas Murtagh. Last Modified: 16 May 2019 11:30
