Query dependent pseudo-relevance feedback based on wikipedia

Xu, Yang and Jones, Gareth J.F. and Wang, Bin (2009) Query dependent pseudo-relevance feedback based on Wikipedia. In: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), 19-23 July 2009, Boston, MA. ISBN 978-1-60558-483-6

Full text available as: PDF


Pseudo-relevance feedback (PRF) via query expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. In addition, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.
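To illustrate the general PRF-via-query-expansion idea the abstract describes, the following is a minimal sketch in Python. It is not the paper's Wikipedia-based method: it uses a simple ratio of a term's probability in the pseudo-relevant (feedback) documents to its probability in the whole collection as the selection score, and interpolates the selected terms into the original query with a weight alpha. The function name, scoring formula, and parameters are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def expand_query(query_terms, feedback_docs, collection_docs, n_terms=5, alpha=0.5):
    """Sketch of pseudo-relevance feedback query expansion.

    Terms frequent in the feedback (pseudo-relevant) documents but rare
    in the collection overall are added to the query. This is a simplified
    stand-in for the paper's Wikipedia-based term selection, not its method.
    Documents are represented as lists of tokens.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    # Score each candidate by its feedback-document probability divided
    # by its collection probability (an idf-like discounting effect).
    scores = {}
    for term, count in fb_counts.items():
        if term in query_terms:
            continue
        p_fb = count / fb_total
        p_coll = coll_counts[term] / coll_total
        scores[term] = p_fb / p_coll if p_coll > 0 else 0.0

    expansion = [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:n_terms]]

    # Interpolate: original terms share weight alpha, expansion terms 1 - alpha.
    weights = {t: alpha / len(query_terms) for t in query_terms}
    for t in expansion:
        weights[t] = (1 - alpha) / len(expansion)
    return weights
```

The returned term weights can be fed to a weighted-query interface of a language-modeling retrieval system for the second-pass search; the interpolation mirrors the common practice of mixing the original query model with a feedback model.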

Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Uncontrolled Keywords:Information Retrieval; Entity; Query Expansion; Pseudo-relevance Feedback; Wikipedia
Subjects:Computer Science > Information retrieval
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in:Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. Association for Computing Machinery. ISBN 978-1-60558-483-6
Publisher:Association for Computing Machinery
Copyright Information:© ACM, 2009. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (2009).
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
ID Code:16184
Deposited On:05 Aug 2011 14:28 by Shane Harper. Last Modified 05 Aug 2011 14:28

