DORAS | DCU Research Repository

Nearest neighbour based transformation functions for text classification: a case study with StackOverflow

Arora, Piyush (ORCID: 0000-0002-4261-2860), Ganguly, Debasis (ORCID: 0000-0003-0050-7138) and Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) (2016) Nearest neighbour based transformation functions for text classification: a case study with StackOverflow. In: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, 12-16 Sept 2016, Newark, Delaware, USA. ISBN 978-1-4503-4497-5

Abstract
The significant growth in the number of questions in question answering forums has led to increasing interest in text categorization methods for classifying newly posted questions as good (suitable) or bad (otherwise) for the forum. Standard text categorization approaches, e.g. multinomial Naive Bayes, are likely to be unsuitable for this classification task because of: i) the lack of sufficient informative content in the questions due to their relatively short length; and ii) considerable vocabulary overlap between the classes. To increase the robustness of this classification task, we propose to use the neighbourhood of existing questions which are similar to the newly asked question. Instead of learning the classification boundary from the questions alone, we transform each question vector into a different one in the feature space. We explore two different neighbourhood functions using: i) the discrete term space; and ii) the continuous vector space of real numbers obtained from vector embeddings of documents. Experiments conducted on StackOverflow data show that our approach of using this neighbourhood transformation can improve classification accuracy by up to about 8% compared with using unigram textual features alone.
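The neighbourhood idea described in the abstract can be illustrated with a minimal sketch in the discrete term space. The abstract does not specify the transformation functions, so everything here is an assumption made for illustration: the function names, the cosine similarity measure, the neighbourhood size `k`, and the interpolation weight `alpha` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Unigram term-frequency vector in the discrete term space."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def neighbourhood_transform(question, existing, k=2, alpha=0.5):
    """Hypothetical transformation (an assumption, not the paper's function):
    interpolate the question vector with the centroid of its k most
    similar existing questions."""
    q = bag_of_words(question)
    vectors = [bag_of_words(e) for e in existing]
    # Rank existing questions by similarity to the new question, keep the top k.
    neighbours = sorted(vectors, key=lambda v: cosine(q, v), reverse=True)[:k]
    # Start from the down-weighted original question vector.
    transformed = {t: alpha * w for t, w in q.items()}
    if not neighbours:
        return transformed
    # Add the down-weighted centroid of the neighbours.
    for v in neighbours:
        for t, w in v.items():
            transformed[t] = transformed.get(t, 0.0) + (1 - alpha) * w / len(neighbours)
    return transformed


existing_questions = [
    "how to sort a list in python",
    "sort a dictionary by value in python",
    "best pizza in town",
]
vector = neighbourhood_transform("python sort list", existing_questions)
print(sorted(vector.items()))
```

The transformed vector picks up terms such as "how" and "dictionary" from similar questions, which is one way a short question could gain informative content before being passed to a classifier. A continuous-space variant would replace `bag_of_words` with document embeddings, which this sketch does not attempt.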
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Neighbourhood based transformation; Document embedding; Question quality prediction
Subjects: UNSPECIFIED
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (ICTIR '16). ACM. ISBN 978-1-4503-4497-5
Publisher: ACM
Official URL: http://dx.doi.org/10.1145/2970398.2970426
Copyright Information: © 2016 ACM
Funders: Science Foundation Ireland (SFI) as a part of the ADAPT Centre at Dublin City University (Grant No: 12/CE/I2267 and 13/RC/2106).
ID Code: 22802
Deposited On: 30 Nov 2018 16:19 by Piyush Arora. Last Modified: 13 Mar 2019 13:55
Documents

Full text available as: p299-arora.pdf (PDF, 946kB)