Ontology-based document representation for biomedical information retrieval

Camous, Fabrice (2007) Ontology-based document representation for biomedical information retrieval. PhD thesis, Dublin City University.

Abstract

In the current era of fast sequencing of entire genomes, more data is becoming available for analysis. This data analysis, in turn, leads to an increasing number of scientific publications. Consequently, biologists spend a considerable part of their time searching the biomedical literature. This avoids expensive duplication of experiments in wet labs and provides inspiration for new hypotheses. Unfortunately, the fast growth of biological information, in the form of free text, has led to a lack of standards in the naming of biological entities. As a result, different genes are referred to by the same name or acronym, and different names refer to the same gene. The ambiguity of free text is problematic, as the success of a search often relies on matching a query term with a term contained in the document representation. Biomedical ontologies, when available, can help disambiguate the information expressed in free text: they provide unique terms to represent concepts and therefore counterweight the occurrence of synonyms and polysemes in free text. They also contain information about the relationships between concepts. This information can be used to understand and evaluate semantic similarities between concepts. The largest repository of biomedical research literature in the world, MEDLINE, is an entry point to biomedical information for most biologists (Hersh et al., 2004). The Medical Subject Headings (MeSH) is the controlled vocabulary used in MEDLINE to annotate the conceptual content of biomedical articles. The annotations include information about the importance of MeSH concepts in the article, and their contexts. The MeSH ontology is organized in several hierarchies that indicate the level of specificity of the MeSH concepts. This hierarchical information can be used to generate semantic similarities between concepts.
Our motivation is the improvement of MEDLINE search, as it is still a central information access point for biologists in spite of the growing availability of full journal articles on the Web. In particular, we focus on the use of the MeSH ontology to represent and retrieve biomedical articles. Although MeSH is widely used by current MEDLINE search methods, we show that the information contained in MEDLINE MeSH annotations and the MeSH hierarchies is often overlooked. We hypothesize that MeSH-based document representation can improve MEDLINE information retrieval. Specifically, our hypothesis is that the integration of information about concept relevance (from the MEDLINE annotation) and interconcept similarities (from the MeSH hierarchies) will improve retrieval performance. We evaluate methods using such information to discriminate and compare MeSH concepts. Our methods are evaluated in the context of MEDLINE ad hoc document retrieval and binary document classification. Our evaluations use standard datasets and metrics recently used at the Genomics track of the 2005 Text Retrieval Conference workshop.
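
As a rough illustration of the two kinds of information the abstract names (concept relevance from the annotation, and interconcept similarity from the hierarchies), the sketch below scores a document against a single query concept. It is not the method developed in the thesis: the similarity formula is a generic shared-prefix measure over dot-separated hierarchy codes, the weights for major and minor annotations are arbitrary assumptions, and the codes in the usage lines are made up for illustration rather than taken from MeSH.

```python
from typing import List, Tuple

# Assumed relevance weights for annotated concepts (illustrative values only).
MAJOR_WEIGHT = 1.0
MINOR_WEIGHT = 0.5


def shared_depth(code_a: str, code_b: str) -> int:
    """Count the leading hierarchy levels that two dot-separated codes share."""
    depth = 0
    for part_a, part_b in zip(code_a.split("."), code_b.split(".")):
        if part_a != part_b:
            break
        depth += 1
    return depth


def hierarchy_similarity(code_a: str, code_b: str) -> float:
    """Generic path-based similarity in [0, 1]: 2 * shared depth / summed depths."""
    total = len(code_a.split(".")) + len(code_b.split("."))
    return 2 * shared_depth(code_a, code_b) / total


def document_score(query_code: str, annotations: List[Tuple[str, bool]]) -> float:
    """Best weighted similarity between the query concept and a document's
    annotated concepts; each annotation is (hierarchy code, is_major)."""
    best = 0.0
    for code, is_major in annotations:
        weight = MAJOR_WEIGHT if is_major else MINOR_WEIGHT
        best = max(best, weight * hierarchy_similarity(query_code, code))
    return best


# Hypothetical codes, not real MeSH tree numbers.
doc = [("A01.236.249", True), ("B02.100", False)]
score = document_score("A01.236.500", doc)
```

A document annotated only with the exact query concept as a minor topic would score 0.5 under these assumed weights, while a closely related major topic can score higher, which is the kind of trade-off between relevance and similarity the hypothesis refers to.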

Item Type: Thesis (PhD)
Date of Award: 2007
Refereed: No
Supervisor(s): Blott, Stephen and Smeaton, Alan F.
Uncontrolled Keywords: controlled vocabulary; Biomedical ontologies; concept relevance; interconcept similarities
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code: 16965
Deposited On: 10 May 2012 10:56 by Fran Callaghan. Last Modified 10 May 2012 10:56
