Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

QuestionBank: creating a corpus of parse-annotated questions

Judge, John, Cahill, Aoife orcid logoORCID: 0000-0002-3519-7726 and van Genabith, Josef (2006) QuestionBank: creating a corpus of parse-annotated questions. In: COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 17-21 July 2006, Sydney, Australia.

Abstract
This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are (i) using QuestionBank training data improves parser performance to 89.75% labelled bracketing f-score, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) shows that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the size of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long distance dependencies in questions from the ATIS corpus with high precision (96.82%) and low recall (39.38%). In summary, QuestionBank provides a useful new resource in parser-based QA research.
Metadata
Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Uncontrolled Keywords:Parse-annotated questions;
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher:Association for Computational Linguistics
Official URL:http://www.aclweb.org/anthology/P/P06/
Copyright Information:© 2006 Association for Computational Linguistics
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:Science Foundation Ireland, SFI 04/BR/CS0370, Irish Research Council for Science Engineering and Technology
ID Code:15271
Deposited On:10 Mar 2010 13:48 by DORAS Administrator . Last Modified 25 Jan 2019 11:51
Documents

Full text available as:

[thumbnail of judge_et_al_06.pdf]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
205kB
Downloads

Downloads

Downloads per month over past year

Archive Staff Only: edit this record