Evaluation of linkage-based web discovery systems
Gurrin, Cathal (2002) Evaluation of linkage-based web discovery systems. PhD thesis, Dublin City University.
Full text available as:
In recent years, the widespread use of the WWW has brought information retrieval systems into the homes o f many millions people. Today, we have access to many billions o f documents (web pages) and have (free-of-charge) access to powerful, fast and highly efficient search facilities over these documents provided by search engines such as Google. The "first generation" of web search engines addressed the engineering problems o f web spidering and efficient searching for large numbers o f both users and documents, but they did not innovate much in the approaches taken to searching.
Recently, however, linkage analysis has been incorporated into search engine ranking strategies. Anecdotally, linkage analysis appears to have improved retrieval effectiveness o f web search, yet there is little scientific evidence in support o f the claims for better quality retrieval, which is surprising. Participants in the three most recent TREC conferences (1999, 2000 and 2001) have been invited to perform benchmarking o f information retrieval systems on web data and have had the option o f using linkage information as part of their retrieval strategies. The general consensus from the experiments of these participants is that linkage information has not yet been successfully incorporated into conventional retrieval strategies.
In this thesis, we present our research into the field o f linkage-based retrieval of web documents. We illustrate that (moderate) improvements in retrieval performance is possible if the undedying test collection contains a higher link density than the test collections used in the three most recent TREC conferences. We examine the linkage structure o f live data from the WWW and coupled with our findings from crawling sections o f the WWW we present a list o f five requirements for a test collection which is to faithfully support experiments into linkage-based retrieval o f documents from the WWW. We also present some o f our own, new, vanants on linkage-based web retrieval and evaluate their performance in comparison to the approaches o f others.
Archive Staff Only: edit this record