Ontology-based document representation for biomedical information retrieval
Camous, Fabrice (2007) Ontology-based document representation for biomedical information retrieval. PhD thesis, Dublin City University.
In the current era of fast sequencing of entire genomes, more and more data is becoming available for analysis. This analysis, in turn, produces a growing number of scientific publications. Consequently, biologists spend a considerable part of their time searching the biomedical literature, both to avoid costly duplication of wet-lab experiments and to find inspiration for new hypotheses.
Unfortunately, the rapid growth of biological information in free-text form has been accompanied by a lack of standards for naming biological entities. As a result, different genes are referred to by the same name or acronym, and a single gene may be referred to by several different names. This ambiguity of free text is problematic, because the success of a search often depends on matching a query term with a term contained in the document representation.
Biomedical ontologies, when available, can help disambiguate the information expressed in free text: they provide unique terms to represent concepts and therefore counterbalance the occurrence of synonyms and polysemous terms in free text. They also encode the relationships between concepts, which can be used to understand and evaluate the semantic similarity between concepts.
The largest repository of biomedical research literature in the world, MEDLINE, is an entry point to biomedical information for most biologists (Hersh et al., 2004). The Medical Subject Headings (MeSH) is the controlled vocabulary used in MEDLINE to annotate the conceptual content of biomedical articles. The annotations include information about the importance of each MeSH concept in the article, and about its context. The MeSH ontology is organized in several hierarchies that indicate the level of specificity of the MeSH concepts. This hierarchical information can be used to derive semantic similarities between concepts.
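To illustrate how a position in a hierarchy can yield a similarity score, here is a minimal Python sketch. It assumes MeSH-style dotted tree numbers, where each dot-separated segment is one level deeper in the hierarchy. It scores two positions with the Wu-Palmer measure, 2 * depth(deepest shared ancestor) / (depth(a) + depth(b)). This well-known measure is used here only as an illustration and is not necessarily the one used in this thesis. The tree numbers in the example are hypothetical, and the helper names (`depth`, `common_prefix_depth`, `wu_palmer`, `concept_similarity`) are illustrative.

```python
from typing import Iterable


def depth(tree_number: str) -> int:
    """Number of levels in a dotted tree number, e.g. 'C04.557' -> 2."""
    return len(tree_number.split("."))


def common_prefix_depth(a: str, b: str) -> int:
    """Depth of the deepest shared ancestor: length of the common dotted prefix."""
    shared = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        shared += 1
    return shared


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer-style similarity: 2 * depth(shared ancestor) / (depth(a) + depth(b))."""
    return 2 * common_prefix_depth(a, b) / (depth(a) + depth(b))


def concept_similarity(trees_a: Iterable[str], trees_b: Iterable[str]) -> float:
    """A descriptor may sit at several tree positions; keep the best-scoring pair."""
    trees_b = list(trees_b)
    return max(wu_palmer(a, b) for a in trees_a for b in trees_b)


# Hypothetical tree numbers, chosen only to illustrate the arithmetic.
print(wu_palmer("C04.557.337", "C04.557.470"))  # siblings: → 0.6666666666666666
print(wu_palmer("A01.100", "C04.557"))          # different branches: → 0.0
```

Taking the maximum over tree positions is one simple choice for descriptors that appear in several hierarchies. Averaging over positions would be an equally plausible alternative.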
Our motivation is the improvement of MEDLINE search, which remains a central information access point for biologists despite the growing availability of full-text journal articles on the Web. In particular, we focus on the use of the MeSH ontology to represent and retrieve biomedical articles. Although MeSH is widely used by current MEDLINE search methods, we show that the information contained in MEDLINE MeSH annotations and in the MeSH hierarchies is often overlooked.
We hypothesize that MeSH-based document representation can improve MEDLINE information retrieval. Specifically, our hypothesis is that integrating information about concept relevance (from the MEDLINE annotations) and inter-concept similarities (from the MeSH hierarchies) will improve retrieval performance. We evaluate methods that use such information to discriminate and compare MeSH concepts, in the context of MEDLINE ad hoc document retrieval and binary document classification. Our evaluations use standard datasets and metrics recently used in the Genomics track of the 2005 Text Retrieval Conference (TREC) workshop.
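One way to combine concept relevance with inter-concept similarity in a single document score is shown in the minimal Python sketch below. It is a hypothetical illustration, not the method evaluated in the thesis. Its assumptions are as follows. Document headings are given as descriptor names, and a leading `*` marks a major topic, as in the MEDLINE display format. Subheadings are ignored. Major and minor topics receive the arbitrary weights 1.0 and 0.5. Each query concept is matched to its best relevance-weighted counterpart in the document. The similarity table `HYPOTHETICAL_SIM` and all function names are made up for the example.

```python
from typing import Callable, Dict, List

# Hypothetical relevance weights for major- vs minor-topic headings.
MAJOR_WEIGHT = 1.0
MINOR_WEIGHT = 0.5

# Made-up similarity values standing in for a hierarchy-based measure.
HYPOTHETICAL_SIM = {frozenset({"Breast Neoplasms", "Neoplasms"}): 0.8}


def annotation_weights(headings: List[str]) -> Dict[str, float]:
    """Map each descriptor to a relevance weight; '*' marks a major topic."""
    weights: Dict[str, float] = {}
    for heading in headings:
        name = heading.lstrip("*")
        w = MAJOR_WEIGHT if heading.startswith("*") else MINOR_WEIGHT
        weights[name] = max(weights.get(name, 0.0), w)  # keep the strongest mark
    return weights


def toy_sim(a: str, b: str) -> float:
    """Exact match scores 1.0; otherwise look up the hypothetical table."""
    if a == b:
        return 1.0
    return HYPOTHETICAL_SIM.get(frozenset({a, b}), 0.0)


def score(query: List[str], doc: Dict[str, float],
          sim: Callable[[str, str], float]) -> float:
    """Average, over query concepts, of the best weight * similarity in the document."""
    if not query or not doc:
        return 0.0
    total = sum(max(w * sim(q, d) for d, w in doc.items()) for q in query)
    return total / len(query)


doc = annotation_weights(["*Breast Neoplasms", "Humans", "Female"])
print(score(["Neoplasms"], doc, toy_sim))  # related major topic: 1.0 * 0.8
print(score(["Humans"], doc, toy_sim))     # exact minor topic: 0.5 * 1.0
```

In this sketch, a query concept that only partially matches a major topic ("Neoplasms" against "Breast Neoplasms") can still outscore an exact match on a minor topic. This is the kind of interaction between relevance and similarity that the hypothesis concerns.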