Gurrin, Cathal ORCID: 0000-0003-4395-7702 (2002) Evaluation of linkage-based web discovery systems. PhD thesis, Dublin City University.
Abstract
In recent years, the widespread use of the WWW has brought information retrieval systems into the homes o f many millions people. Today, we have access to many billions o f documents (web pages) and have (free-of-charge) access to powerful, fast and highly efficient search facilities over these documents provided by search engines such as Google. The "first generation" of web search engines addressed the engineering problems o f web spidering and efficient searching for large numbers o f both users and documents, but they did not innovate much in the approaches taken to searching.
Recently, however, linkage analysis has been incorporated into search engine ranking strategies. Anecdotally, linkage analysis appears to have improved retrieval effectiveness o f web search, yet there is little scientific evidence in support o f the claims for better quality retrieval, which is surprising. Participants in the three most recent TREC conferences (1999, 2000 and 2001) have been invited to perform benchmarking o f information retrieval systems on web data and have had the option o f using linkage information as part of their retrieval strategies. The general consensus from the experiments of these participants is that linkage information has not yet been successfully incorporated into conventional retrieval strategies.
In this thesis, we present our research into the field o f linkage-based retrieval of web documents. We illustrate that (moderate) improvements in retrieval performance is possible if the undedying test collection contains a higher link density than the test collections used in the three most recent TREC conferences. We examine the linkage structure o f live data from the WWW and coupled with our findings from crawling sections o f the WWW we present a list o f five requirements for a test collection which is to faithfully support experiments into linkage-based retrieval o f documents from the WWW. We also present some o f our own, new, vanants on linkage-based web retrieval and evaluate their performance in comparison to the approaches o f others.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | 2002 |
Refereed: | No |
Supervisor(s): | Smeaton, Alan F. |
Uncontrolled Keywords: | linkage-based web retrieval; search engines; retrieval strategies |
Subjects: | Computer Science > Information storage and retrieval systems Computer Science > Information retrieval |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License |
ID Code: | 17243 |
Deposited On: | 21 Aug 2012 14:53 by Fran Callaghan . Last Modified 02 Nov 2018 15:52 |
Documents
Full text available as:
Preview |
PDF
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
9MB |
Downloads
Downloads
Downloads per month over past year
Archive Staff Only: edit this record