DORAS | DCU Research Repository

Predicting sentence translation quality using extrinsic and language independent features

Bicici, Ergun, Groves, Declan and van Genabith, Josef (ORCID: 0000-0003-1322-7944) (2013) Predicting sentence translation quality using extrinsic and language independent features. Machine Translation, 27 (3-4). pp. 171-192. ISSN 0922-6567

Abstract
We develop a top-performing model for automatic, accurate, and language-independent prediction of sentence-level statistical machine translation (SMT) quality, with or without looking at the translation outputs. We derive various feature functions that measure the closeness of a given test sentence to the training data and the difficulty of translating the sentence. We describe mono feature functions, which are based on statistics of only one side of the parallel training corpora, and duo feature functions, which incorporate statistics involving both the source and target sides of the training data. Overall, we describe novel, language-independent, and SMT-system-extrinsic features for predicting SMT performance, which also rank high in feature ranking evaluations. We experiment with different learning settings, with or without looking at the translations, which helps differentiate the contributions of the different feature sets. We apply partial least squares and feature subset selection, both of which improve the results, and we present a ranking of the top features selected for each learning setting, providing an exhaustive analysis of the extrinsic features used. We show that by looking only at the test source sentences, without using the translation outputs at all, we can achieve better performance than a baseline system that uses features dependent on the SMT model that generated the translations. Furthermore, when also looking at the translation outputs, our prediction system achieves the 2nd best performance overall according to the official results of the Quality Estimation Task (QET) challenge. Our representation and features achieve the top performance in QET among the models using the support vector regression (SVR) learning model.
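
As a rough illustration of what a feature "measuring the closeness of a given test sentence to the training data" could look like, the sketch below computes n-gram coverage of a test source sentence against one side of a training corpus. This is an assumption made for illustration only: the abstract does not define the actual mono or duo feature functions, and the names `ngrams`, `ngram_coverage`, and `train_source` are hypothetical. In a setting like the one described, such feature values would be passed on to a regression model (the abstract names SVR); that step is not shown here.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_coverage(sentence, training_sentences, n=2):
    """Fraction of the sentence's n-grams that also occur in the training data.

    Hypothetical closeness feature that uses only one side of the corpus;
    it is NOT the feature definition from the paper.
    """
    seen = set()
    for train in training_sentences:
        seen.update(ngrams(train.lower().split(), n))
    test = ngrams(sentence.lower().split(), n)
    if not test:
        # Sentence too short to contain any n-gram of this order.
        return 0.0
    return sum(1 for gram in test if gram in seen) / len(test)


# Toy source-side training data (invented for this example).
train_source = ["the cat sat on the mat", "a dog sat on the rug"]

# Coverage for n-gram orders 1 to 3 forms a small feature vector.
features = [ngram_coverage("the dog sat on a mat", train_source, n) for n in (1, 2, 3)]
print(features)
```

Coverage drops as the n-gram order grows, which matches the intuition that a sentence sharing only isolated words with the training data is further from it than one sharing longer word sequences.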
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: Statistical machine translation; Quality estimation; Machine learning; Performance prediction
Subjects: Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Machine learning
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Springer Netherlands
Official URL: http://dx.doi.org/10.1007/s10590-013-9138-4
Copyright Information: © 2013 Springer Verlag. The original publication is available at www.springerlink.com
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland
ID Code: 19283
Deposited On: 20 Nov 2013 15:01 by Mehmet Ergun Bicici. Last Modified 16 Nov 2018 10:01
Documents

Full text available as:

PDF (Labjam1.pdf), 30kB