
DORAS | DCU Research Repository


Query expansion for language modeling using sentence similarities

Ganguly, Debasis (ORCID: 0000-0003-0050-7138), Leveling, Johannes (ORCID: 0000-0003-0603-4191) and Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) (2011) Query expansion for language modeling using sentence similarities. In: The 2nd Information Retrieval Facility (IRF) Conference, 6th June 2011, Vienna, Austria.

Abstract
We propose a novel method of query expansion for Language Modeling (LM) in Information Retrieval (IR) based on the similarity of the query to sentences in the top-ranked documents from an initial retrieval run. In justification of our approach, we argue that the terms in the expanded query obtained by the proposed method roughly follow a Dirichlet distribution, which, being the conjugate prior of the multinomial distribution used in the LM retrieval model, helps the feedback step. IR experiments on the TREC ad-hoc retrieval test collections using sentence-based query expansion (SBQE) show a significant increase in Mean Average Precision (MAP) compared to baselines obtained with standard term-based query expansion using the LM selection score and the Relevance Model (RLM). The proposed approach increases the likelihood of generating the pseudo-relevant documents by adding, for each top-ranked pseudo-relevant document, the sentences with maximum term overlap with the query, thus making the query look more like these documents. A per-topic analysis shows that the new method hurts fewer queries than the baseline feedback methods and improves average precision (AP) over a broad range of queries, from easy to difficult in terms of initial retrieval AP. We also show that the new method adds a higher number of good feedback terms (the gold standard of good terms being the set of terms added by true relevance feedback). Additional experiments on the challenging search topics of the TREC-2004 Robust track show that the new method improves MAP by 5.7% without the external resources and query hardness prediction typically used for these topics.
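
The core idea described above lends itself to a short illustration. The following Python snippet is only a minimal sketch of sentence-based expansion, not the authors' published implementation: the tokeniser, the naive sentence splitter, the number of sentences taken per document, and the raw-count term weighting are all assumptions made here for clarity; in the paper the expanded query is used within a language-modeling retrieval framework rather than scored by raw counts.

# Illustrative sketch only (not the authors' implementation): for each
# pseudo-relevant document, pick the sentences with the largest term
# overlap with the query and add their terms to the expanded query.
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' boundaries."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def expand_query(query, pseudo_relevant_docs, sentences_per_doc=2):
    """Expand the query with terms from the sentences of each top-ranked
    (pseudo-relevant) document that overlap most with the original query."""
    query_terms = set(tokenize(query))
    expanded = Counter(tokenize(query))

    for doc in pseudo_relevant_docs:
        sentences = split_sentences(doc)
        # Rank this document's sentences by how many query terms they share.
        ranked = sorted(
            sentences,
            key=lambda s: len(query_terms & set(tokenize(s))),
            reverse=True,
        )
        # Add the terms of the best-overlapping sentences to the expanded query.
        for sentence in ranked[:sentences_per_doc]:
            expanded.update(tokenize(sentence))

    # Term -> weight (raw counts here, purely for illustration).
    return expanded


if __name__ == "__main__":
    docs = [
        "Query expansion adds terms to the query. Sentence similarity guides the choice.",
        "Language modeling ranks documents by query likelihood. Smoothing uses a Dirichlet prior.",
    ]
    print(expand_query("query expansion language modeling", docs))

Because whole sentences (rather than isolated terms) are added per pseudo-relevant document, the expanded query comes to resemble those documents, which is the intuition the abstract gives for the improved feedback step.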
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: sentence based query expansion; SBQE; Mean Average Precision; MAP; Relevance Model; RLM
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 16391
Deposited On: 29 Jun 2011 13:50 by Shane Harper. Last Modified: 25 Oct 2018 10:26
Documents

Full text available as:

Query_Expansion_for_Language_Modeling_using_Sentence_Similarities.pdf
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
306kB
