Adapting and developing linguistic resources for question answering

Judge, John

Judge, John (2006) Adapting and developing linguistic resources for question answering. PhD thesis, Dublin City University.

Abstract
Metadata
Downloads
Documents

[+][-]

Abstract

As information retrieval becomes more focussed, so too must the techniques involved in the retrieval process. More precise responses to queries require more precise linguistic analysis of both the queries and the factual documents from which the information is being retrieved. In this thesis, I present research into using existing linguistic tools to analyse questions. These tools, as supplied, often underperform on question analysis. I present my work on adapting these tools, and creating new resources for use in developing new tools tailored to question analysis. My work has shown that in order to adapt the treebank- and f-structure annotation algorithmbased wide coverage LFG parsing resources of Cahill et al. (2004) to analyse questions from the ATIS corpus, only the c-structure parser needs to be retrained, the annotation algorithm remains unchanged. The retrained c-structure parser needs only a small amount of appropriate training data added to its training corpus to gain a significant improvement in both c-structure parsing and f-structure annotation. Given the improvements made with a relatively small amount of question data, I developed QuestionBank, a question treebank, to determine what further gains can be made using a larger amount of question data. My question treebank is a corpus of 4000 parse annotated questions. The questions were taken from a number of sources and the question treebank was “bootstrapped” in an incremental parsing, hand correction and retraining approach from raw data using existing probabilistic parsing resources. Experiments with QuestionBank show that it is an effective resource for training parsers to analyse questions with an improvement of over 10% on the baseline parsing results. In further experiments I show that a parser retrained with QuestionBank can also parse newspaper text (Penn-II Treebank Section 23) with state-of-the-art accuracy. Long distance dependencies (LDDs) are a vital part of question analysis in determining semantic roles and question focus. I have designed and implemented a novel method to recover WH-traces and coindexed antecedents in c-structure trees from parser output which uses the f-structure LDD resolution method of Cahill et al (2004) to resolve the dependencies and then “reverse engineers” the corresponding syntactic components in the c-structure tree.

Metadata

Item Type:	Thesis (PhD)
Date of Award:	2006
Refereed:	No
Supervisor(s):	van Genabith, Josef and Cahill, Aoife
Uncontrolled Keywords:	Question analysis
Subjects:	Computer Science > Computational linguistics Computer Science > Computer software Computer Science > Information retrieval
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License
ID Code:	17937
Deposited On:	24 Apr 2013 13:02 by INVALID USER. Last Modified 30 Jul 2019 10:23

Documents

Full text available as:

Preview

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
3MB

Downloads

Downloads per month over past year

Archive Staff Only: edit this record

DORAS | DCU Research Repository

Adapting and developing linguistic resources for question answering

Downloads