On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search
Blott, Stephen, Camous, Fabrice, Gurrin, Cathal (ORCID: 0000-0003-4395-7702) and Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365)
(2005)
On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search.
In: the Second CORIA (Conference en Recherche d'Informations et Applications), March 2005, Grenoble, France.
Databases of genomic documents contain substantial amounts of structured information in addition to the text of titles and abstracts. Unstructured information retrieval techniques fail to take advantage of this structured information. This paper describes a technique that improves upon traditional retrieval methods by partitioning the retrieval result set into two distinct clusters using the additional structural information. Our hypothesis is that the relevant documents are to be found in the tighter of the two clusters, as suggested by van Rijsbergen's cluster hypothesis. We present an experimental evaluation of these ideas using the relevance judgments of the TREC 2004 Genomics track and the CLUTO clustering software package.
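The idea in the abstract (split the result set into two clusters, then keep the tighter one) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses the CLUTO package and structural MeSH information, whereas the code below substitutes a plain two-way k-means over cosine similarity. The feature vectors, the function names (`two_way_cluster`, `tightness`, `tighter_cluster`) and the definition of tightness as mean similarity to the cluster centroid are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def two_way_cluster(vectors, iterations=20):
    """Split vectors into two clusters (labels 0 and 1).

    Assumption: a simple k-means variant with deterministic seeds (the first
    vector and the vector least similar to it). CLUTO, used in the paper,
    offers other partitioning methods that are not reproduced here.
    """
    seed_a = vectors[0]
    seed_b = min(vectors, key=lambda v: cosine(seed_a, v))
    centers = [seed_a, seed_b]
    labels = None
    for _ in range(iterations):
        # Assign each vector to the more similar centre (ties go to cluster 0).
        new_labels = [
            0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    return labels, centers


def tightness(vectors, labels, centers, k):
    """Hypothetical tightness measure: mean similarity of members to centre k."""
    members = [v for v, lab in zip(vectors, labels) if lab == k]
    if not members:
        return 0.0
    return sum(cosine(v, centers[k]) for v in members) / len(members)


def tighter_cluster(doc_ids, vectors):
    """Return the ids of the documents in the tighter of the two clusters."""
    labels, centers = two_way_cluster(vectors)
    best = max((0, 1), key=lambda k: tightness(vectors, labels, centers, k))
    return [doc for doc, lab in zip(doc_ids, labels) if lab == best]
```

Under this definition a single-document cluster trivially has tightness 1.0, so a practical version would presumably need a minimum cluster size or a different tightness measure.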
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Genomic information retrieval; clustering; ontology; tree similarity measure