Explainable Clinical Risk Prediction from EHR Tabular Data using Monotonic Constraints and Calibrated Probabilities

Danang Danang; Toni Wijanarko Adi Putra

doi:10.55606/jurrsendem.v2i1.9197

Authors

Danang Danang, Universitas Sains dan Teknologi Komputer
Toni Wijanarko Adi Putra, Universitas Sains dan Teknologi Komputer

Keywords:

Clinical Risk Prediction, Monotonic Constraints, Probability Calibration, SHAP, XGBoost

Abstract

Tabular-based clinical risk prediction models are extensively applied in medical decision support systems; however, two major challenges often reduce their reliability: predictions that contradict basic clinical logic and poorly calibrated probability outputs that weaken threshold-based decision making. This study investigates explainable binary risk prediction using the processed Cleveland subset of the UCI Heart Disease dataset as a public clinical benchmark. A lightweight and CPU-efficient pipeline is proposed by employing an XGBoost classifier integrated with monotonic constraints on clinically relevant features, followed by probability calibration through post-hoc methods, including Platt scaling, temperature scaling, and isotonic regression on a separate validation set. Model performance is assessed in terms of discrimination capability using AUROC, AUPRC, F1-score, sensitivity, and specificity, while probability reliability is evaluated using ECE and Brier score metrics. A monotonicity audit is also conducted through counterfactual feature sweeps to measure violation rates. In addition, the model is applied for risk stratification into low-, medium-, and high-risk categories with corresponding event-rate reporting. The findings demonstrate that isotonic regression improves probability reliability without degrading discrimination performance. Furthermore, the monotonicity audit reveals no observed violations for constrained features. Overall, the integration of monotonic constraints and probability calibration produces more decision-ready risk estimates for threshold-based clinical decision support while maintaining transparency through SHAP-based analysis.
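
The evaluation steps named in the abstract (Brier score, ECE, a counterfactual monotonicity sweep, and tiered event-rate reporting) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the 10 equal-width ECE bins, the tier cut-offs of 0.2 and 0.6, the helper names, and the `toy_risk` stand-in model are all assumptions made for illustration, and only the column names `age` and `trestbps` come from the Cleveland dataset.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: weighted mean of |event rate - mean predicted risk|."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # p == 1.0 is placed in the last bin rather than overflowing.
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        event_rate = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += len(members) / total * abs(event_rate - mean_prob)
    return ece


def monotonicity_violation_rate(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
    direction: int = 1,
    tol: float = 1e-9,
) -> float:
    """Sweep one feature over `grid` for every row, holding the other features
    fixed, and return the fraction of adjacent grid steps where the prediction
    moves against `direction` (+1 = should not decrease, -1 = should not increase)."""
    violations = 0
    steps = 0
    for row in rows:
        preds = [predict({**row, feature: value}) for value in grid]
        for before, after in zip(preds, preds[1:]):
            steps += 1
            if direction * (after - before) < -tol:
                violations += 1
    return violations / steps if steps else 0.0


def risk_tiers(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    low_cut: float = 0.2,   # assumed cut-off, not taken from the paper
    high_cut: float = 0.6,  # assumed cut-off, not taken from the paper
) -> Dict[str, Tuple[int, Optional[float]]]:
    """Return {tier: (count, observed event rate)}; the rate is None if a tier is empty."""
    tiers: Dict[str, List[int]] = {"low": [], "medium": [], "high": []}
    for y, p in zip(y_true, y_prob):
        name = "low" if p < low_cut else "medium" if p < high_cut else "high"
        tiers[name].append(y)
    return {
        name: (len(ys), sum(ys) / len(ys) if ys else None)
        for name, ys in tiers.items()
    }


def toy_risk(row: Row) -> float:
    """Hypothetical stand-in model: a logistic score that rises with age and
    resting blood pressure. It only exists to exercise the helpers above."""
    score = 0.05 * (row["age"] - 55.0) + 0.02 * (row["trestbps"] - 130.0)
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    patients = [
        {"age": 45.0, "trestbps": 120.0},
        {"age": 63.0, "trestbps": 145.0},
    ]
    outcomes = [0, 1]  # made-up labels for the demo only
    probs = [toy_risk(p) for p in patients]
    print("Brier:", brier_score(outcomes, probs))
    print("ECE:", expected_calibration_error(outcomes, probs))
    print("Violations:", monotonicity_violation_rate(toy_risk, patients, "age", range(30, 80, 5)))
    print("Tiers:", risk_tiers(outcomes, probs))
```

In a pipeline like the one described, `predict` would wrap the calibrated classifier instead of `toy_risk`. XGBoost accepts per-feature constraints through its `monotone_constraints` parameter, and scikit-learn provides `IsotonicRegression` for the isotonic calibration step; how the authors configured either is not stated on this page.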
