Bacha, Soufiane, Ning, Huansheng, Belarbi, Mostefa, Sarwatt, Doreen Sebastian and Dhelim, Sahraoui
ORCID: 0000-0002-3620-1395
(2025) A novel double pruning method for imbalanced data using information entropy and Roulette wheel selection for breast cancer diagnosis. Knowledge-Based Systems, 330. p. 114403. ISSN 0950-7051
Abstract
Accurate illness diagnosis is vital for effective treatment and patient safety. Conventional machine learning models for cancer diagnosis are built on the assumption that medical data are balanced. However, class imbalance remains a crucial challenge that adversely affects classifier performance and reliability, while existing ensemble solutions are still prone to noisy data and tend to overlook class overlap near decision boundaries. This paper proposes RE-SMOTEBoost, a double-pruning version of the basic ensemble SMOTEBoost method, designed to overcome these drawbacks. First, the proposed method uses roulette wheel selection to focus synthetic sample generation on overlapping regions, so that the decision boundary is better captured. Second, it integrates an entropy filter to reduce noisy data and borderline cases, thereby improving the quality of the generated samples. Third, it introduces a double regularization penalty to control the proximity of synthetic samples to the decision boundary and prevent the creation of new overlapping samples. These enhancements yield higher-quality synthetic samples and a more balanced training dataset. Experimental findings demonstrate that the proposed method outperforms state-of-the-art methods, achieving a 3.22% improvement in accuracy and an 88.8% reduction in variance compared to the best-performing methods. Practically, the proposed model provides a robust solution for medical applications, handling data scarcity and imbalance arising from data collection difficulties and privacy constraints.
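The abstract names two building blocks, information entropy and roulette wheel selection, without giving their exact formulation. The following is a minimal sketch of how the two ideas might fit together, not the authors' implementation: it assumes that the Shannon entropy of the class labels among a minority sample's k nearest neighbours serves as its selection weight, so that samples in mixed (overlapping) neighbourhoods are drawn more often as seeds for oversampling. All function names, the toy data, and the choice of k are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def knn_labels(points, labels, index, k):
    """Labels of the k nearest neighbours of points[index] (Euclidean distance)."""
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: math.dist(points[index], points[i]))
    return [labels[i] for i in others[:k]]


def overlap_weights(points, labels, minority_label, k):
    """Assumed weighting: neighbourhood label entropy of each minority sample."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    weights = [shannon_entropy(knn_labels(points, labels, i, k))
               for i in minority_idx]
    return minority_idx, weights


def roulette_wheel_select(weights, n_draws, rng):
    """Draw n_draws positions with probability proportional to non-negative weights."""
    total = 0.0
    for w in weights:
        total += w
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    picks = []
    for _ in range(n_draws):
        spin = rng.random() * total  # where the wheel stops
        cumulative = 0.0
        for i, w in enumerate(weights):
            cumulative += w
            if w > 0 and spin <= cumulative:
                picks.append(i)
                break
    return picks


# Toy data (hypothetical): a pure minority cluster, plus two minority
# points that sit among majority points, i.e. an overlapping region.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # minority only
          (1.0, 1.0), (1.05, 1.0),                           # minority, overlapping
          (1.0, 1.05), (0.95, 1.0), (3.0, 3.0), (3.1, 3.0)]  # majority
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

minority_idx, weights = overlap_weights(points, labels, minority_label=1, k=3)
picks = roulette_wheel_select(weights, n_draws=20, rng=random.Random(0))
seeds = [minority_idx[p] for p in picks]
print(seeds)
```

In this sketch, minority samples whose neighbourhoods contain a single class receive zero weight and are never drawn, so the selected seeds come only from the overlapping region. A real pipeline would then interpolate synthetic samples from those seeds; that step, the entropy filter, and the regularization penalty described in the abstract are not reproduced here.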
Metadata
| Item Type: | Article (Published) |
|---|---|
| Refereed: | Yes |
| Uncontrolled Keywords: | Imbalanced data, Cancer data, Information entropy, Class overlapping |
| Subjects: | Computer Science > Artificial intelligence; Computer Science > Information technology; Computer Science > Machine learning |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
| Publisher: | Elsevier |
| Official URL: | https://www.sciencedirect.com/science/article/pii/... |
| Copyright Information: | Authors |
| ID Code: | 32443 |
| Deposited On: | 23 Mar 2026 09:41 by Sahraoui Dhelim. Last Modified 23 Mar 2026 09:41 |
Documents
Full text available as:
PDF (1MB)
- Archive staff only. This file is embargoed until 25 November 2027
- Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0