Towards using web-crawled data for domain adaptation in statistical machine translation
Pecina, Pavel, Toral, Antonio (ORCID: 0000-0003-2357-2960), Way, Andy (ORCID: 0000-0001-5736-5930), Papavassiliou, Vassilis, Prokopidis, Prokopis and Giagkou, Maria
(2011)
Towards using web-crawled data for domain adaptation in statistical machine translation.
In: The 15th conference of the European Association for Machine Translation (EAMT 2011), 30-31 May 2011, Leuven, Belgium.
This paper reports on ongoing work on domain adaptation of statistical machine translation using domain-specific data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and for exploiting these data for testing, language modelling, and system tuning in a phrase-based machine translation framework. The proposed approach is evaluated on two domains, Natural Environment and Labour Legislation, and two language pairs: English–French and English–Greek.