An investigation of decompounding for cross-language patent search
Leveling, Johannes and Magdy, Walid and Jones, Gareth J.F. (2011) An investigation of decompounding for cross-language patent search. In: The 34th Annual ACM SIGIR Conference, 24-28 Jul 2011, Beijing, China.
Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources, such as parallel corpora, available for training a machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary (“patentese”), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.