Extending the scope of out-of-domain: examining QA models in multiple subdomains

Lyu, Chenyang ORCID: 0009-0002-6733-5879, Foster, Jennifer ORCID: 0000-0002-7789-4853 and Graham, Yvette ORCID: 0000-0001-6741-4855 (2022) Extending the scope of out-of-domain: examining QA models in multiple subdomains. In: 3rd Workshop on Insights from Negative Results in NLP, Insights 2022, 26 May 2022, Dublin, Ireland. ISBN 978-195591740-7


Abstract

Past work investigating the out-of-domain performance of QA systems has mainly focused on general domains (e.g. the news domain or the Wikipedia domain), underestimating the importance of subdomains defined by the internal characteristics of QA datasets. In this paper, we extend the scope of “out-of-domain” by splitting QA examples into different subdomains according to their internal characteristics, including question type, text length, and answer position. We then examine the performance of QA systems trained on data from different subdomains. Experimental results show that the performance of QA systems can be significantly reduced when the training data and test data come from different subdomains. These results call into question the generalizability of current QA systems across multiple subdomains, suggesting the need to combat the bias introduced by the internal characteristics of QA datasets.
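The subdomain splitting described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the example format (a dict with `question`, `context`, and a character offset `answer_start`), the question-word list, the 100-token length threshold, and the midpoint rule for answer position are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict

# Assumed question-word inventory; the paper's actual categories may differ.
QUESTION_WORDS = ("what", "who", "when", "where", "which", "why", "how")


def question_type(question):
    """Return the first question word found in the question, else 'other'."""
    for token in question.lower().split():
        word = token.strip("?,.")
        if word in QUESTION_WORDS:
            return word
    return "other"


def length_bucket(context, threshold=100):
    """Bucket a context by whitespace token count (threshold is an assumption)."""
    return "short" if len(context.split()) < threshold else "long"


def position_bucket(context, answer_start):
    """Bucket by whether the answer starts before the context's character midpoint."""
    return "front" if answer_start < len(context) / 2 else "back"


def split_by(examples, key_fn):
    """Group examples into subdomains according to key_fn."""
    groups = defaultdict(list)
    for example in examples:
        groups[key_fn(example)].append(example)
    return dict(groups)


# Toy examples, invented for illustration only.
examples = [
    {
        "question": "Who wrote the report?",
        "context": "The report was written by Alice last year.",
        "answer_start": 26,
    },
    {
        "question": "When did the team move?",
        "context": "In 2020 the team moved to a new office in the city.",
        "answer_start": 3,
    },
]

by_type = split_by(examples, lambda ex: question_type(ex["question"]))
by_length = split_by(examples, lambda ex: length_bucket(ex["context"]))
by_position = split_by(
    examples, lambda ex: position_bucket(ex["context"], ex["answer_start"])
)
```

Under this setup, a cross-subdomain experiment would train a QA model on one group (for example `by_position["front"]`) and evaluate it on another (for example `by_position["back"]`), comparing the result against training and testing within the same group.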

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Workshop
Refereed:	Yes
Uncontrolled Keywords:	Internal characteristics; News domain; Performance; QA system; Question type; Splittings; Subdomain; Test data; Text length; Wikipedia
Subjects:	UNSPECIFIED
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in:	Insights 2022 - 3rd Workshop on Insights from Negative Results in NLP, Proceedings of the Workshop. Association for Computational Linguistics (ACL). ISBN 978-195591740-7
Publisher:	Association for Computational Linguistics (ACL)
Official URL:	https://www.scopus.com/inward/record.uri?partnerID...
Copyright Information:	© 2022 Association for Computational Linguistics
Funders:	Science Foundation Ireland, SFI Centre for Research Training in Machine Learning (18/CRT/6183).
ID Code:	29136
Deposited On:	18 Oct 2023 11:39 by Vidatum Academic. Last Modified 18 Oct 2023 11:39

Documents

Full text available as:

PDF (401kB)
Creative Commons: Attribution-Noncommercial-Share Alike 4.0

DORAS | DCU Research Repository