A semantic-based approach to information processing
Richardson, Ray (1994) A semantic-based approach to information processing. PhD thesis, Dublin City University.
The research reported in this thesis is centred on the development of a semantic-based approach to information processing. Traditional word-based pattern matching approaches to information processing suffer from both the richness and the ambiguity of natural language. Although the retrieval performance of traditional systems can be satisfactory in many situations, it is commonly held that the traditional approach has reached the peak of its potential and that any substantial improvements will be very difficult to achieve [Smea91]. Word-based pattern matching retrieval systems lack the semantic power necessary either to distinguish between different senses of homonyms or to identify the similar meanings of related terms. Our proposed semantic information processing system was designed to tackle these problems, among others (we also wanted to allow phrasal as well as single-word terms to describe concepts). Our prototype system comprises a WordNet-derived, domain-independent knowledge base (KB) and a concept-level semantic similarity estimator. The KB, which is rich in noun phrases, is used as a controlled vocabulary which effectively addresses many of the problems posed by ambiguities in natural language. Similarly, both proposals for the semantic similarity estimator tackle issues regarding the richness of natural language and, in particular, the multitude of ways of expressing the same concept.
A semantic-based document retrieval system is developed as a means of evaluating our approach. However, many other information processing applications are discussed, with particular attention directed towards the application of our approach to locating and relating information in a large-scale Federated Database System (FDBS). The document retrieval evaluation application operates by obtaining KB representations of both the documents and the queries, and by using the semantic similarity estimators as the comparison mechanism when determining the degree of relevance of a document to a query. The construction of KB representations for documents and queries is a completely automatic procedure which includes, among other steps, a sense disambiguation phase. The sense disambiguator developed for this research also represents a departure from existing approaches to sense disambiguation. In our approach, four separate disambiguation mechanisms each weight the different senses of ambiguous terms. This allows for more than one correct sense.
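The idea of weighting senses rather than selecting a single one can be illustrated with a minimal sketch. The abstract does not say what the four mechanisms are or how their outputs are combined, so everything below is an assumption made for illustration: the function name `combine_sense_weights`, the sense labels, the scores, and the choice to sum the scores and normalise them.

```python
from collections import defaultdict


def combine_sense_weights(mechanism_scores):
    """Sum the per-sense scores of several mechanisms and normalise them to 1.

    mechanism_scores is a list of dicts, one per disambiguation mechanism,
    mapping a sense label to that mechanism's score for the sense.
    Summing and normalising is an assumed combination rule, not the thesis's.
    """
    totals = defaultdict(float)
    for scores in mechanism_scores:
        for sense, score in scores.items():
            totals[sense] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {sense: value / grand_total for sense, value in totals.items()}


# Hypothetical scores from four mechanisms for the ambiguous term "bank".
scores = [
    {"bank/finance": 0.8, "bank/river": 0.2},
    {"bank/finance": 0.6, "bank/river": 0.4},
    {"bank/finance": 0.5, "bank/river": 0.5},
    {"bank/finance": 0.9, "bank/river": 0.1},
]
weights = combine_sense_weights(scores)
print(weights)
```

Every sense keeps a non-zero weight here, so a representation built from these weights can still reflect a less likely sense instead of discarding it.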
Our evaluation mechanism employs the Wall Street Journal text corpus and a set of TREC queries, along with their relevance assessments, in an overall document retrieval application. A traditional pattern matching tf*idf system is used as a baseline in our evaluation experiments. The results indicate firstly that our WordNet-derived KB is capable of being used as a controlled vocabulary, and secondly that our approaches to estimating semantic similarity operate well at their intended concept level. However, it is more difficult to arrive at conclusive interpretations of the results with regard to the application of our semantic-based systems to the complex task of document retrieval. A more complete evaluation is left as a topic for future research.
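For readers unfamiliar with the kind of word-based baseline mentioned above, the following is a minimal sketch of term-frequency times inverse-document-frequency weighting with cosine ranking. It is not the baseline system used in the thesis: the toy documents, the whitespace tokeniser, the unsmoothed logarithmic idf, and the function names are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf * idf} vector per document, plus the idf table."""
    n = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenised
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy collection; the real evaluation used the Wall Street Journal corpus.
docs = [
    "stock prices fell sharply",
    "the river bank flooded",
    "bank shares and stock prices rose",
]
vectors, idf = tfidf_vectors(docs)

# Weight the query terms by idf; terms unseen in the collection get zero.
query = {term: idf.get(term, 0.0) for term in "stock prices".split()}
ranking = sorted(
    range(len(docs)), key=lambda i: cosine(query, vectors[i]), reverse=True
)
print(ranking)
```

Matching here happens purely on word forms: the term "bank" receives the same weight in the second and third documents although it denotes different concepts, which is the kind of limitation the semantic-based approach is intended to address.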