Multi-label imbalanced text handling using ensemble methodology with application to biomedical data classification

Source
Iran Journal of Computer Science
ISSN
2520-8438
Date Issued
2025-01-01
Author(s)
Ghosh, Subhajit
Gupta, Sanidhya
Bhattacharyya, Sourav
Das, Avik Kumar
Nandi, Apurba
Sarkar, Ardhendu
Samanta, Partha Sarathi
Polprasert, Chantri
DOI
10.1007/s42044-025-00332-x
Abstract
The surge in biomedical literature and clinical reports presents a formidable challenge for automated text analysis, particularly in multi-label classification tasks, where severe class imbalance and interdependent labels are common. To address these issues, we propose MITHEM (Multi-label Imbalance-aware Text Classification using Hybrid Ensemble Model), an ensemble framework that combines threshold-guided binning, SMOTE-based oversampling, and a set of diverse classifiers (Support Vector Machines, Decision Trees, and Random Forests) within a meta-classification approach. Unlike traditional techniques, MITHEM not only improves the representation of minority classes but also learns correlations between labels to refine decision-making. We tested the framework on eight standard biomedical text datasets and observed notable improvements in macro F1-score, Hamming loss, and label coverage compared with strong baselines. The empirical results show that MITHEM outperforms competing methods on a variety of biomedical datasets, especially in imbalanced scenarios, achieving higher recall and precision.
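As an illustration of two of the metrics named in the abstract, the following minimal Python sketch computes Hamming loss and macro F1 for multi-label predictions from their standard definitions. It is not taken from the paper: the function names and the toy label matrices are hypothetical, chosen only to show why macro F1 is sensitive to minority labels while Hamming loss can look good on imbalanced data.

```python
from typing import List

Labels = List[List[int]]  # one row per document, one 0/1 column per label


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (document, label) cells where the prediction is wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: F1 is 0 when a label has no positives at all.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data (hypothetical): label 0 is frequent, labels 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
# A predictor that only ever outputs the majority label.
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]

print(round(hamming_loss(y_true, y_pred), 3))  # 0.167
print(round(macro_f1(y_true, y_pred), 3))      # 0.333
```

In this toy case only 2 of 12 cells are wrong, so Hamming loss is low, yet macro F1 is one third because both rare labels are missed entirely.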
URI
http://repository.iitgn.ac.in/handle/IITG2025/33335
Keywords
Classification algorithm | Ensemble model | Imbalanced text | Machine learning | Multi-label learning