Title: Multi-label imbalanced text handling using ensemble methodology with application to biomedical data classification
Authors: Ghosh, Subhajit; Gupta, Sanidhya; Bhattacharyya, Sourav; Das, Avik Kumar; Nandi, Apurba; Sarkar, Ardhendu; Samanta, Partha Sarathi; Polprasert, Chantri
Type: Article
Dates: 2025-10-21 (accessioned), 2025-10-21 (available), 2025-01-01 (issued)
DOI: 10.1007/s42044-025-00332-x
Scopus ID: 2-s2.0-105017904225
ISSN: 2520-8446
Handle: http://repository.iitgn.ac.in/handle/IITG2025/33335
Keywords: Classification algorithm | Ensemble model | Imbalanced text | Machine learning | Multi-label learning

Abstract:
The surge in biomedical literature and clinical reports poses a formidable challenge for automated text analysis, particularly in multi-label classification tasks, where severe class imbalance and interdependent labels are common. To address these issues, we propose MITHEM (Multi-label Imbalance-aware Text Classification using Hybrid Ensemble Model), an ensemble framework that combines threshold-guided binning, SMOTE-based oversampling, and a set of diverse classifiers (Support Vector Machines, Decision Trees, and Random Forests) within a meta-classification approach. Unlike traditional techniques, MITHEM not only improves the representation of minority classes but also learns correlations between labels to refine its decisions. We tested the framework on eight standard biomedical text datasets and observed notable improvements in macro F1-score, Hamming loss, and label coverage compared with strong baselines. The empirical results show that MITHEM outperforms other competitive methods across a variety of biomedical datasets, especially in imbalanced scenarios, with higher recall and precision.
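Two of the evaluation metrics named in the abstract, macro F1-score and Hamming loss, have standard definitions over a binary label-indicator matrix (rows are documents, columns are labels). The sketch below computes both in plain Python as an illustration only; it is not the authors' evaluation code, and the function names and the zero-score convention for labels with no positives are assumptions.

```python
from typing import List

# Binary label-indicator matrix: rows = samples, columns = labels, entries in {0, 1}.
Matrix = List[List[int]]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (sample, label) entries predicted incorrectly; lower is better."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / (n_samples * n_labels)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1 scores, so rare labels count as much as common ones."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no positives in either matrix scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


y_true = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
print(round(hamming_loss(y_true, y_pred), 4))  # 0.1667 (2 wrong entries out of 12)
print(round(macro_f1(y_true, y_pred), 4))      # 0.7778 (mean of 1, 2/3, 2/3)
```

Because Hamming loss is an error rate, an "improvement" in it means a decrease, whereas macro F1 improves as it increases. Macro averaging is the usual choice under class imbalance because a poorly predicted rare label pulls the score down as much as a frequent one.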
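The abstract names SMOTE-based oversampling as one ingredient for improving minority-class representation. The core SMOTE step is to pick a minority sample, pick one of its k nearest minority-class neighbours, and create a synthetic point at a random position on the line segment between them. Below is a minimal plain-Python sketch of that interpolation step, using the common variant that draws seed samples at random. The function name `smote` and its parameters are hypothetical. The abstract does not specify how MITHEM applies oversampling across interdependent labels or how it interacts with threshold-guided binning, so this covers only the single-label case.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE sketch: interpolate between minority samples and their minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    # Precompute the k nearest minority neighbours (Euclidean distance) of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # random seed sample
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # random one of its k neighbours
        gap = rng.random()                        # uniform position in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

Since every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region the minority class already occupies. This is why SMOTE tends to generalise better than simply duplicating rare examples.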