Arora, Piyush ORCID: 0000-0002-4261-2860, Ganguly, Debasis ORCID: 0000-0003-0050-7138 and Jones, Gareth J.F. ORCID: 0000-0003-2923-8365 (2016) Nearest neighbour based transformation functions for text classification: a case study with StackOverflow. In: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, 12 - 16 Sept 2016, Newark, Delaware, USA. ISBN 978-1-4503-4497-5
Abstract
The significant growth in the number of questions in question answering forums has led to increasing interest in text categorization methods for classifying newly posted questions as good (suitable) or bad (otherwise) for the forum. Standard text categorization approaches, e.g. multinomial Naive Bayes, are likely to be unsuitable for this classification task because of: i) the lack of sufficient informative content in the questions due to their relatively short length; and ii) considerable vocabulary overlap between the classes. To increase the robustness of this classification task, we propose to use the neighbourhood of existing questions which are similar to the newly asked question. Instead of learning the classification boundary from the questions alone, we transform each question vector into a different one in the feature space. We explore two different neighbourhood functions, using: i) the discrete term space; and ii) the continuous vector space of real numbers obtained from vector embeddings of documents. Experiments conducted on StackOverflow data show that our approach of using this neighbourhood transformation can improve classification accuracy by up to about 8% as compared to using just unigram textual features.
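The abstract does not spell out the transformation function itself, so the following is only a minimal sketch of the general idea in the discrete term space, not the method from the paper. It assumes a new question's term-count vector is mixed with the centroid of its k most similar existing questions, ranked by cosine similarity. The function names, the mixing weight `alpha`, and the toy questions are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens as a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighbourhood_transform(query, corpus, k=2, alpha=0.5):
    """Mix `query` with the centroid of its k most similar corpus vectors.

    `alpha` (an assumed parameter) is the weight kept on the original vector;
    the remaining weight is spread evenly over the selected neighbours.
    """
    neighbours = sorted(corpus, key=lambda d: cosine(query, d), reverse=True)[:k]
    transformed = Counter()
    for term, weight in query.items():
        transformed[term] += alpha * weight
    for doc in neighbours:
        for term, weight in doc.items():
            transformed[term] += (1.0 - alpha) * weight / len(neighbours)
    return dict(transformed)


# Toy data, invented purely for illustration.
existing = [bag_of_words(t) for t in [
    "how to sort a list in python",
    "python list sort descending order",
    "best laptop for programming",
]]
new_question = bag_of_words("sort list python")
result = neighbourhood_transform(new_question, existing, k=2, alpha=0.5)
```

In this sketch the short query picks up terms such as "descending" from its two nearest neighbours, while the unrelated third question contributes nothing. A classifier would then be trained on the transformed vectors rather than on the raw unigram counts.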
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Neighbourhood based transformation; Document embedding; Question quality prediction |
Subjects: | UNSPECIFIED |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Published in: | Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (ICTIR '16). ACM. ISBN 978-1-4503-4497-5 |
Publisher: | ACM |
Official URL: | http://dx.doi.org/10.1145/2970398.2970426 |
Copyright Information: | © 2016 ACM |
Funders: | Science Foundation Ireland (SFI) as a part of the ADAPT Centre at Dublin City University (Grant No: 12/CE/I2267 and 13/RC/2106). |
ID Code: | 22802 |
Deposited On: | 30 Nov 2018 16:19 by Piyush Arora. Last Modified 13 Mar 2019 13:55 |