Judge, John, Cahill, Aoife ORCID: 0000-0002-3519-7726 and van Genabith, Josef (2006) QuestionBank: creating a corpus of parse-annotated questions. In: COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 17-21 July 2006, Sydney, Australia.
Abstract
This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are (i) using QuestionBank training data improves parser performance to 89.75% labelled bracketing f-score, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the size of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long distance dependencies in questions from the ATIS corpus with high precision (96.82%) and low recall (39.38%). In summary, QuestionBank provides a useful new resource in parser-based QA research.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Parse-annotated questions |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Publisher: | Association for Computational Linguistics |
Official URL: | http://www.aclweb.org/anthology/P/P06/ |
Copyright Information: | © 2006 Association for Computational Linguistics |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland, SFI 04/BR/CS0370, Irish Research Council for Science Engineering and Technology |
ID Code: | 15271 |
Deposited On: | 10 Mar 2010 13:48 by DORAS Administrator. Last Modified 25 Jan 2019 11:51 |
Documents
Full text available as:
PDF (205kB)