An empirical study of machine learning robustness and scalability for imbalanced tabular clinical data in emergency and critical care
Published in Scientific Reports, 2026
Machine learning can support high-stakes decision-making in emergency and intensive care settings, but severe class imbalance in clinical data limits model reliability and biases predictions toward majority outcomes. We evaluate six model families, including classical methods (Decision Tree, Random Forest, XGBoost), deep learning approaches (TabNet), and tabular foundation models (TabICL, TabPFN v2.6), on MIMIC-IV-ED and eICU datasets across multiple clinical prediction tasks. Models are assessed using Macro F1-score, robustness to increasing imbalance, and computational efficiency. Results show dataset-dependent performance: TabPFN and TabICL perform strongly on MIMIC-IV-ED, while XGBoost leads on eICU. No single model dominates across all settings, but foundation models provide a favorable efficiency–performance trade-off and are increasingly competitive in imbalanced clinical prediction scenarios.
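As a rough illustration of why Macro F1-score suits imbalanced evaluation, the sketch below computes it on made-up toy labels (95 negatives, 5 positives) for a classifier that always predicts the majority class. The function `macro_f1` and the toy data are assumptions introduced for illustration only; they are not the paper's evaluation code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 95 majority-class cases, 5 minority-class cases.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {score:.3f}")
```

On this toy example, accuracy looks high because the majority class dominates, whereas Macro F1 averages the per-class scores equally and so is pulled down by the minority class that is never predicted.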
Recommended citation: Brima, Y., Atemkeng, M. An empirical study of machine learning robustness and scalability for imbalanced tabular clinical data in emergency and critical care. Sci Rep 16, 18004 (2026) https://doi.org/10.1038/s41598-026-56413-9
