DCU-Symantec submission for the WMT 2012 quality estimation task

Rubino, Raphael; Foster, Jennifer; Wagner, Joachim; Roturier, Johann; Samad Zadeh Kaljahi, Rasoul; Hollowood, Fred

Rubino, Raphael, Foster, Jennifer ORCID: 0000-0002-7789-4853, Wagner, Joachim ORCID: 0000-0002-8290-3849, Roturier, Johann, Samad Zadeh Kaljahi, Rasoul and Hollowood, Fred (2012) DCU-Symantec submission for the WMT 2012 quality estimation task. In: The NAACL 2012 Seventh Workshop on Statistical Machine Translation (WMT'12), 7-8 Jun 2012, Montreal, Quebec, Canada.


Abstract

This paper describes the features and the machine learning methods used by Dublin City University (DCU) and Symantec for the WMT 2012 quality estimation task. Two sets of features are proposed: a constrained set, which respects the data limitations suggested by the workshop organisers, and an unconstrained set, which uses data, or tools trained on data, not provided by the workshop organisers. In total, more than 300 features were extracted and used to train classifiers that predict the translation quality of unseen data. In this paper, we focus on a subset of our features that we consider relatively novel: features based on a topic model built with Latent Dirichlet Allocation (LDA), and features based on source- and target-language syntax, extracted using part-of-speech (POS) taggers and parsers. We evaluate nine feature combinations using four classification-based and four regression-based machine learning techniques.
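The abstract does not say exactly how the topic-model features are computed. Purely as an illustration, one plausible family of such features compares the LDA topic distribution inferred for a source sentence with the one inferred for its translation: a translation that drifts off-topic should end up further from its source. The sketch below assumes both distributions have already been inferred and are given as equal-length probability vectors (training and querying the LDA model is out of scope). The function names and the choice of cosine, Euclidean and Hellinger measures are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine of the angle between two topic vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # undefined for a zero vector; treated here as "no overlap"
    return dot / (norm_p * norm_q)


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance between two topic vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def topic_features(
    source_topics: Sequence[float], target_topics: Sequence[float]
) -> Dict[str, float]:
    """Hypothetical QE features comparing source and target topic distributions.

    Assumes both inputs are probability vectors over the same K topics,
    e.g. as inferred by an LDA model for the source sentence and its MT output.
    """
    if len(source_topics) != len(target_topics):
        raise ValueError("topic vectors must have the same number of topics")
    return {
        "topic_cosine": cosine_similarity(source_topics, target_topics),
        "topic_euclidean": euclidean_distance(source_topics, target_topics),
        "topic_hellinger": hellinger_distance(source_topics, target_topics),
    }


if __name__ == "__main__":
    # Made-up 3-topic distributions for a source sentence and its translation.
    src = [0.7, 0.2, 0.1]
    tgt = [0.6, 0.3, 0.1]
    print(topic_features(src, tgt))
```

In a setup like the one described, such values would simply be appended to the rest of a sentence pair's feature vector before training the classifiers or regressors.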

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	quality estimation; ranking of machine translation output
Subjects:	Computer Science > Computational linguistics; Computer Science > Machine translating
DCU Faculties and Centres:	Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Funders:	Science Foundation Ireland, Irish Research Council for Science Engineering and Technology, Symantec
ID Code:	17050
Deposited On:	08 Jun 2012 14:35 by Joachim Wagner. Last Modified: 10 Oct 2018 14:36

Documents

Full text available as: PDF (149 kB)

DORAS | DCU Research Repository