DORAS | DCU Research Repository

Investigating query expansion and coreference resolution in question answering on BERT

Bhattacharjee, Santanu, Haque, Rejwanul (ORCID: 0000-0003-1680-0099), Maillette de Buy Wenniger, Gideon and Way, Andy (ORCID: 0000-0001-5736-5930) (2020) Investigating query expansion and coreference resolution in question answering on BERT. In: 25th International Conference on Natural Language & Information Systems (NLDB 2020), 24 - 26 June 2020, Saarbrücken, Germany (Online). ISBN 978-3-030-51309-2

Abstract
The Bidirectional Encoder Representations from Transformers (BERT) model produces state-of-the-art results on many question answering (QA) datasets, including the Stanford Question Answering Dataset (SQuAD). This paper presents a query expansion (QE) method that identifies good terms in input questions, extracts synonyms for those terms from a widely used language resource, WordNet, and selects the most relevant synonyms from the extracted list. The paper also introduces a novel QE method that produces many alternative sequences for a given input question using same-language machine translation (MT). Furthermore, we use a coreference resolution (CR) technique to identify anaphors or cataphors in paragraphs and substitute them with the original referents. We found that the QA system with this simple CR technique significantly outperforms the BERT baseline in a QA task. We also found that our best-performing QA system is the one that applies these three preprocessing methods (two QE methods and one CR method) together to BERT, achieving an F1 score of 89.8 in a QA task. Further, we present a comparative analysis of the performance of the BERT QA models, taking a variety of criteria into account, and demonstrate our findings in the answer span prediction task.
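As a rough illustration of two of the preprocessing ideas named in the abstract, the sketch below shows synonym-based question expansion and referent substitution in Python. It is not the authors' implementation: the abstract does not say how good terms or relevant synonyms are selected, so `TOY_SYNONYMS` is a hypothetical hand-written table standing in for WordNet, `STOPWORDS` is an assumed filter, and the coreference links are assumed to be supplied by some external resolver. The same-language MT expansion is not sketched.

```python
# Hypothetical stand-in for a WordNet lookup (assumption, not from the paper).
TOY_SYNONYMS = {
    "wrote": ["authored"],
    "novel": ["book"],
}

# Assumed stopword filter; the paper's criterion for "good terms" is not given here.
STOPWORDS = {"who", "what", "the", "a", "an", "of", "is", "in"}


def expand_question(question, synonyms=TOY_SYNONYMS, max_per_term=2):
    """Return the original question plus variants with one term swapped for a synonym."""
    tokens = question.split()
    variants = [question]
    for i, token in enumerate(tokens):
        word = token.strip("?.,!").lower()
        if word in STOPWORDS:
            continue
        for syn in synonyms.get(word, [])[:max_per_term]:
            # Replace only the word part so trailing punctuation is kept.
            replaced = token.lower().replace(word, syn)
            variants.append(" ".join(tokens[:i] + [replaced] + tokens[i + 1:]))
    return variants


def substitute_referents(tokens, mention_to_referent):
    """Replace each mention (keyed by token index) with its referent text.

    `mention_to_referent` is assumed to come from an external coreference
    resolver; producing it is outside the scope of this sketch.
    """
    return [mention_to_referent.get(i, tok) for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    print(expand_question("Who wrote the novel?"))
    paragraph = "Marie Curie won the prize . She was born in Warsaw .".split()
    print(" ".join(substitute_referents(paragraph, {6: "Marie Curie"})))
```

In a pipeline of the kind the abstract describes, the expanded questions and the rewritten paragraph would then be passed to the BERT QA model in place of the raw inputs.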
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: query expansion; coreference resolution; question answering; information retrieval; machine translation; neural machine translation
Subjects: Computer Science > Computational linguistics
Computer Science > Computer engineering
Computer Science > Information retrieval
Computer Science > Machine learning
Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Natural Language Processing and Information Systems, NLDB 2020. Lecture Notes in Computer Science 12089. Springer. ISBN 978-3-030-51309-2
Publisher: Springer
Official URL: http://dx.doi.org/10.1007/978-3-030-51310-8_5
Copyright Information: © 2020 The Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), European Regional Development Fund, European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 713567, Science Foundation Ireland (SFI) Grant Number 13/RC/2077
ID Code: 24561
Deposited On: 25 Jun 2020 16:16 by Rejwanul Haque. Last Modified 25 Jun 2020 16:16
Documents

Full text available as:

NLDB_QA_Paper.pdf (PDF, 287kB)