Categorising Corruption in the Vaccine Discourse: A General Taxonomy, Data Set, and Evaluation of LLMs for Classifying Corruption Dialogue in Social Media

dos Santos, Vitor Gaboardi; Santos, Guto Leoni; Egli, Antonia; Kahvazadeh, Estatira; Doolin, Bill; Endo, Patricia Takako; Lynn, Theo

dos Santos, Vitor Gaboardi, Santos, Guto Leoni ORCID: 0000-0002-0257-4214, Egli, Antonia ORCID: 0000-0002-0151-0884, Kahvazadeh, Estatira, Doolin, Bill, Endo, Patricia Takako ORCID: 0000-0002-9163-5583 and Lynn, Theo ORCID: 0000-0001-9284-7580 (2024) Categorising Corruption in the Vaccine Discourse: A General Taxonomy, Data Set, and Evaluation of LLMs for Classifying Corruption Dialogue in Social Media. In: International Conference on Advances in Social Networks Analysis and Mining. ISBN 978-3-031-78541-2

Abstract

Real or perceived corruption can have a damaging effect on health care services and outcomes. In particular, research suggests perceived corruption had a significant impact on COVID-19 vaccination. Given the role of social media in health communications, identifying and understanding perceived corruption related to vaccines and vaccination is critical to build societal cohesion and public trust in health institutions and strategies, manage and combat misinformation and disinformation, and design more effective policies, interventions, and communications strategies. There is a dearth of research on binary and multi-class classification of corruption dialogues in health or otherwise. We address this gap by introducing a general hierarchical corruption dialogue taxonomy (HCDT) and formulating binary and multi-class classification tasks based on the HCDT. We also create a vaccine-specific labelled dataset for each task, and fine-tune three large language models (BERT, RoBERTa, and BERTweet) based on these datasets. We evaluate the performance of these models in the binary and multi-class classification tasks. While all models performed similarly for the binary task, RoBERTa performed best for multi-class classification of corruption dialogue.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	Corruption, Large Language Models, BERT, Twitter, multi-class classification, vaccine, COVID-19
Subjects:	Computer Science > World Wide Web; Social Sciences > Globalization
DCU Faculties and Centres:	DCU Faculties and Schools > DCU Business School
Published in:	Social Networks Analysis and Mining. ASONAM 2024. Lecture Notes in Computer Science 15211. Springer, Cham. ISBN 978-3-031-78541-2
Publisher:	Springer, Cham
Official URL:	https://link.springer.com/chapter/10.1007/978-3-03...
Copyright Information:	Authors
ID Code:	32857
Deposited On:	02 Jul 2026 10:56 by Tam Nguyen. Last Modified: 02 Jul 2026 10:57

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution 4.0
764kB

DORAS | DCU Research Repository