
An investigation of decompounding for cross-language patent search

Leveling, Johannes and Magdy, Walid and Jones, Gareth J.F. (2011) An investigation of decompounding for cross-language patent search. In: The 34th Annual ACM SIGIR Conference, 24-28 Jul 2011, Beijing, China.

Full text available as: PDF


Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) few resources, such as parallel corpora, may be available for training a machine translation system for a compounding language; ii) patents have a specific writing style and vocabulary (“patentese”), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.
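To illustrate what decompounding does, here is a minimal sketch of a generic frequency-based compound splitter. It is not the method used in the paper, which this record does not describe. The lexicon `VOCAB`, its counts, the linking morphemes and `MIN_PART_LEN` are all hypothetical toy assumptions. The splitter enumerates every way to write a word as a sequence of known lexicon entries, optionally joined by a German linking morpheme ("s", "es"). It then keeps the segmentation whose parts have the highest geometric-mean frequency, with the unsplit word counted as one candidate.

```python
from functools import lru_cache
from math import prod

# Hypothetical toy lexicon (word -> corpus frequency); counts are illustrative only.
VOCAB = {
    "patent": 500, "anspruch": 200, "patentanspruch": 5,
    "kraft": 300, "fahrzeug": 250, "kraftfahrzeug": 40,
    "steuerung": 120, "gerät": 180,
}
# Assumed German linking morphemes ("Fugenelemente") allowed between parts.
LINKING_MORPHEMES = ("", "s", "es")
MIN_PART_LEN = 3  # assumed minimum length of a compound part


@lru_cache(maxsize=None)
def segmentations(word: str) -> tuple[tuple[str, ...], ...]:
    """Return every way to write `word` as a sequence of lexicon entries."""
    results = []
    if word in VOCAB:
        results.append((word,))  # the unsplit word is itself a candidate
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head, rest = word[:i], word[i:]
        if head not in VOCAB:
            continue
        for link in LINKING_MORPHEMES:
            if not rest.startswith(link):
                continue
            tail = rest[len(link):]  # drop the linking morpheme, if any
            if len(tail) < MIN_PART_LEN:
                continue
            for seg in segmentations(tail):
                results.append((head,) + seg)
    return tuple(results)


def score(parts: tuple[str, ...]) -> float:
    """Geometric mean of part frequencies: favours splits into frequent words."""
    return prod(VOCAB[p] for p in parts) ** (1 / len(parts))


def decompound(word: str) -> tuple[str, ...]:
    """Pick the highest-scoring segmentation; leave unknown words unsplit."""
    candidates = segmentations(word.lower())
    if not candidates:
        return (word.lower(),)
    return max(candidates, key=score)
```

For example, `decompound("Steuerungsgerät")` returns `("steuerung", "gerät")`. The full compound is absent from the toy lexicon, so a translation system trained only on these entries would treat it as out-of-vocabulary, but both of its parts are known. `decompound("Patentanspruch")` returns `("patent", "anspruch")` because the two parts are far more frequent than the rare full form. A word with no known parts, such as `"Xylophon"`, is returned unsplit as `("xylophon",)`.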

Item Type: Conference or Workshop Item (Poster)
Event Type: Conference
Uncontrolled Keywords: Experimentation; Performance; Measurement; Patent Retrieval; Decompounding
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
ID Code: 16447
Deposited On: 25 Jul 2011 14:10 by Shane Harper. Last Modified 25 Jul 2011 14:10
