Index ordering by query-independent measures

Ferguson, Paul and Smeaton, Alan F. (2012) Index ordering by query-independent measures. Information Processing & Management, 48 (3). pp. 569-586. ISSN 0306-4573

Abstract

Conventional approaches to information retrieval search through all applicable entries in an inverted file for a collection in order to find the documents with the highest scores. For particularly large collections this can be extremely time-consuming. One solution is to search only a limited portion of the collection at query time in order to speed up retrieval, while also limiting the loss in retrieval effectiveness (in terms of accuracy of results). We achieve this by first identifying the most “important” documents within the collection and then sorting the documents within each inverted file list in order of this “importance”. This limits the amount of information to be searched at query time by eliminating documents of lesser importance, which makes the search more efficient while limiting the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, show significant savings in the number of postings examined, without significant loss of effectiveness, for several measures of importance used both in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
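
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `importance` dictionary, the fixed `max_postings` cutoff and the plain term-frequency scoring are all assumptions chosen to keep the example short. Each postings list is sorted by a query-independent importance score, and at query time only the head of each list is examined.

```python
from collections import defaultdict


def build_index(docs, importance):
    """Build an inverted index whose postings are ordered by importance.

    docs:       {doc_id: text}
    importance: {doc_id: float}, a hypothetical query-independent score
                (any static measure could be plugged in here).
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    # Most important documents first; ties broken by doc_id for determinism.
    for postings in index.values():
        postings.sort(key=lambda p: (-importance[p[0]], p[0]))
    return dict(index)


def search(index, query, max_postings, k=10):
    """Score only the first max_postings entries of each postings list.

    Returns (ranked, examined), where examined counts the postings read.
    The term-frequency sum is a stand-in for a real ranking function.
    """
    scores = defaultdict(float)
    examined = 0
    for term in query.lower().split():
        for doc_id, tf in index.get(term, [])[:max_postings]:
            scores[doc_id] += tf
            examined += 1
    ranked = sorted(scores.items(), key=lambda s: (-s[1], s[0]))[:k]
    return ranked, examined
```

Calling `search` with a small `max_postings` reads fewer postings than a full scan, at the risk of missing less important documents that would otherwise have scored highly, which is the efficiency/effectiveness tradeoff the abstract describes.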

Item Type:Article (Published)
Refereed:Yes
Additional Information:alan.smeaton@dcu.ie for further information
Uncontrolled Keywords:Text retrieval; Indexing; Efficiency/effectiveness tradeoffs; Query-independent search
Subjects:Computer Science > Information storage and retrieval systems
Computer Science > Information retrieval
Computer Science > Algorithms
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Initiatives and Centres > CLARITY: The Centre for Sensor Web Technologies
Publisher:Elsevier
Official URL:http://dx.doi.org/10.1016/j.ipm.2011.10.003
Copyright Information:© 2012 Elsevier
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:Science Foundation Ireland
ID Code:17137
Deposited On:16 Jul 2012 11:05 by Alan F. Smeaton. Last Modified 16 Jul 2012 11:05
