An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting
Apidianaki, Marianna and He, Yifan (2010) An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting. In: The 7th International Workshop on Spoken Language Translation (IWSLT 2010), 2-3 December, Paris, France.
Unsupervised sense induction methods offer a solution to the problem of scarcity of semantic resources. These methods automatically extract semantic information from textual data and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the obtained resources. We then proceed to a large-scale evaluation by integrating the resources into a Machine Translation (MT) metric (METEOR). We show that the use of the data-driven sense-cluster inventories leads to better correlation with human judgments of translation quality, compared to precision-based metrics, and to improvements similar to those obtained when a handcrafted semantic resource is used.