Multi-label imbalanced text handling using ensemble methodology with application to biomedical data classification

Polprasert, Chantri

doi:10.1007/s42044-025-00332-x

Source

Iran Journal of Computer Science

ISSN

2520-8438

Date Issued

2025-01-01

Author(s)

Ghosh, Subhajit

Gupta, Sanidhya

Bhattacharyya, Sourav

Das, Avik Kumar

Nandi, Apurba

Sarkar, Ardhendu

Samanta, Partha Sarathi

Polprasert, Chantri

DOI

10.1007/s42044-025-00332-x

Abstract

The surge in biomedical literature and clinical reports presents a formidable challenge for automated text analysis, particularly in multi-label classification tasks, where severe class imbalance and interdependent labels are common. To address these issues, we propose MITHEM (Multi-label Imbalance-aware Text Classification using Hybrid Ensemble Model), an ensemble framework that combines threshold-guided binning, SMOTE-based oversampling, and a set of diverse base classifiers (Support Vector Machines, Decision Trees, and Random Forests) within a meta-classification approach. Unlike traditional techniques, MITHEM not only improves the representation of minority classes but also learns correlations between labels to refine its decisions. We tested the framework on eight standard biomedical text datasets and observed notable improvements in macro F1-score, Hamming loss, and label coverage compared with strong baselines. The empirical results show that MITHEM outperforms other competitive methods across a variety of biomedical datasets, especially in imbalanced scenarios, with higher recall and precision.
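
Two ingredients named in the abstract are standard and can be illustrated without the paper's code: SMOTE-style oversampling and the multi-label metrics used to report results. The following is a minimal standard-library Python sketch under stated assumptions, not MITHEM's implementation. The function names smote_oversample, hamming_loss and macro_f1 are hypothetical. Features are assumed to be dense numeric vectors (for example, after a TF-IDF step), and the caller is assumed to pick the minority samples (for example, those carrying one rare label). The threshold-guided binning and the meta-classifier are not reproduced because the abstract does not specify them.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
LabelMatrix = Sequence[Sequence[int]]  # rows = documents, columns = labels (0/1)


def smote_oversample(minority: Sequence[Vector], n_new: int, k: int = 5,
                     seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples with SMOTE-style interpolation.

    Hypothetical helper: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours. Choosing which samples count as "minority" is left to the caller.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of (document, label) slots predicted wrongly; lower is better."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def macro_f1(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for rt, rp in zip(y_true, y_pred) if rt[j] == 1 and rp[j] == 1)
        fp = sum(1 for rt, rp in zip(y_true, y_pred) if rt[j] == 0 and rp[j] == 1)
        fn = sum(1 for rt, rp in zip(y_true, y_pred) if rt[j] == 1 and rp[j] == 0)
        denom = 2 * tp + fp + fn
        # A label that is never present and never predicted is scored 0.0 here
        # (one common convention).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote_oversample(minority, n_new=3, k=2, seed=42))

    y_true = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
    print(round(hamming_loss(y_true, y_pred), 3))  # 0.222 (2 of 9 slots wrong)
    print(round(macro_f1(y_true, y_pred), 3))      # 0.556 (mean of 1.0, 0.667, 0.0)
```

Hamming loss is an error rate, so an improvement means a lower value. Macro F1 gives every label equal weight, which makes it sensitive to how well rare labels are predicted. In a real pipeline, oversampling of this kind would be applied to the training data only, so that synthetic points never leak into evaluation.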

URI

http://repository.iitgn.ac.in/handle/IITG2025/33335

Keywords

Classification algorithm | Ensemble model | Imbalanced text | Machine learning | Multi-label learning