Objective To evaluate the use of machine learning algorithms for predicting and characterizing cardiac thrombosis in patients with valvular heart disease and atrial fibrillation. Methods Data were collected from patients with valvular disease and atrial fibrillation treated at West China Hospital of Sichuan University and its branches from 2016 to 2021. Of 2 515 patients who underwent valve surgery, 886 patients with valvular disease and atrial fibrillation were included: 545 (61.5%) males and 341 (38.5%) females, with a mean age of 55.62±9.26 years. Cardiac thrombosis was confirmed intraoperatively in 192 patients. Five supervised machine learning algorithms were used to predict thrombosis. Based on the patients' clinical data (33 features after feature screening), 10-fold nested cross-validation was used to evaluate the models by area under the curve, F1 score and Matthews correlation coefficient. Finally, the SHAP method was used to interpret the model, and the model's behavior was analyzed using one patient as an example. Results The random forest classifier had the best overall evaluation metrics, with an area under the receiver operating characteristic curve of 0.748±0.043 and an accuracy of 79.2%. Model interpretation showed that stroke volume, peak mitral E-wave velocity and tricuspid pressure gradient were among the important factors influencing the prediction. Conclusion The random forest model achieved the best predictive performance and is expected to serve clinicians as a decision-support tool for screening patients with valvular atrial fibrillation who are at high risk of embolism.
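The F1 score and Matthews correlation coefficient (MCC) named among the evaluation indicators above can both be derived from a binary confusion matrix. The following is a minimal sketch using only the Python standard library; the labels and predictions are made up for illustration and are not the study's data, and the function names are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def matthews_corrcoef(tp, fp, fn, tn):
    """MCC in [-1, 1]; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical example: 1 = thrombosis, 0 = no thrombosis (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
accuracy = (tp + tn) / len(y_true)

print(f"F1={f1:.3f} MCC={mcc:.3f} accuracy={accuracy:.1%}")
# F1=0.667 MCC=0.524 accuracy=80.0%
```

With an imbalanced outcome such as the one reported here (192 of 886 patients with thrombosis), accuracy alone can look favorable even for a weak classifier, which is one common reason to report F1 and MCC alongside it.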
Objective To develop a machine learning (ML) model to predict the risk of death in intensive care unit (ICU) patients with chronic obstructive pulmonary disease (COPD), explain the factors related to the risk of death in COPD patients, and address the "black box" problem of ML models. Methods A total of 8088 patients with severe COPD were selected from the eICU Collaborative Research Database (eICU-CRD). Data from the first 24 hours of each ICU stay were extracted and randomly divided, with 70% used for model training and 30% for model validation. LASSO regression was used for predictor variable selection to avoid overfitting. Five ML models were employed to predict in-hospital mortality. The prediction performance of the ML models was compared with that of alternative models using the area under the curve (AUC), and the SHAP (SHapley Additive exPlanations) method was used to explain the random forest (RF) model. Results The RF model performed best among the five ML models and the APACHE IVa scoring system, with an AUC of 0.830 (95%CI 0.806 - 0.855). The SHAP method identified the top 10 predictors by importance ranking, and the minimum non-invasive systolic blood pressure was recognized as the most significant predictor variable. Conclusion Using an ML model to capture risk factors and the SHAP method to interpret the prediction outcome allows early prediction of patients' risk of death, which helps clinicians make accurate treatment plans and allocate medical resources rationally.
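SHAP attributions are based on Shapley values, which average a feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values for a tiny hypothetical risk function; the feature names, the baseline and every contribution are invented for illustration, and the study itself applied SHAP to a trained random forest, not to a function like this.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values.

    value_fn maps a frozenset of "present" feature names to a model output.
    Cost grows as 2**len(features), so this suits only small toy cases.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

def toy_risk(present):
    """Hypothetical mortality risk: additive terms plus one interaction."""
    risk = 0.10  # assumed baseline risk
    if "low_sbp" in present:
        risk += 0.20
    if "age" in present:
        risk += 0.10
    if "lactate" in present:
        risk += 0.05
    if "low_sbp" in present and "age" in present:
        risk += 0.04  # interaction, split equally between the two features
    return risk

features = ["low_sbp", "age", "lactate"]
phi = shapley_values(toy_risk, features)
for name in sorted(phi, key=phi.get, reverse=True):
    print(f"{name}: {phi[name]:.3f}")
```

The attributions sum to the difference between the full-model output and the baseline (0.49 - 0.10 = 0.39 here), which is the additivity property that gives SHAP its name.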
Objective To develop and compare the predictive performance of five machine learning models for adverse postoperative outcomes in cardiac surgery patients, and to identify key decision factors through SHAP interpretability analysis. Methods Perioperative data comprising 88 variables (including demographic information and preoperative, intraoperative, and postoperative indicators) were retrospectively collected from adult cardiac surgery patients at the First Affiliated Hospital of Xinjiang Medical University in 2023. Adverse postoperative outcomes were defined as the occurrence of acute kidney injury and/or in-hospital mortality during the postoperative hospitalization period following cardiac surgery. Patients were divided into an adverse outcome group and a favorable outcome group according to whether an adverse postoperative outcome occurred. After feature variables were screened using Least Absolute Shrinkage and Selection Operator (LASSO) regression, five machine learning models were constructed: Extreme Gradient Boosting (XGBoost), Random Forest (RF), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LightGBM), and Generalized Linear Model (GLM). The dataset was split into a training set (n=447) and a test set (n=192) using stratified sampling. Model performance was evaluated using Receiver Operating Characteristic (ROC) curves, Decision Curve Analysis (DCA), and F1 score. The SHAP method was applied to analyze feature contributions. Results A total of 639 patients were included, comprising 395 males and 244 females, with a median age of 62 (55, 69) years. The adverse outcome group consisted of 191 patients and the favorable outcome group of 448 patients, giving an adverse postoperative outcome incidence of 29.9%. Univariate analysis showed no significant differences between the two groups for any variables (P>0.05). 
Using LASSO regression, 16 feature variables were selected (including cardiopulmonary bypass support time, blood glucose on postoperative day 3, creatine kinase-MB isoenzyme, systemic inflammatory response index, etc.), and five machine learning models (GLM, RF, GBM, LightGBM, XGBoost) were constructed. The XGBoost model showed the best predictive performance on both the training and test sets, with AUC values of 0.761 [95%CI (0.719, 0.800)] and 0.759 [95%CI (0.692, 0.818)], respectively. It also significantly outperformed the other models in precision, positive predictive value (PPV), and balanced accuracy. Decision curve analysis further confirmed its clinical utility across various risk thresholds. SHAP analysis indicated that variables such as cardiopulmonary bypass support time, blood glucose on postoperative day 3, creatine kinase-MB isoenzyme, and inflammatory markers (SIRI, NLR, CAR) contributed strongly to the prediction. Conclusion The XGBoost model effectively predicts adverse postoperative outcomes in cardiac surgery patients. Clinically, attention should be focused on cardiopulmonary bypass support time, postoperative blood glucose control, and monitoring of inflammatory levels to improve patient prognosis.
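The AUC values reported above can be read as the probability that a randomly chosen patient with an adverse outcome receives a higher predicted score than a randomly chosen patient without one. A minimal sketch of that pairwise definition follows, using only the Python standard library; the scores are hypothetical and the function name is illustrative, not taken from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks: 1 = adverse outcome, 0 = favorable outcome.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.4, 0.3, 0.1]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")  # AUC = 0.792
```

This pairwise form is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; rank-based formulas give the same value more efficiently on larger data.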
Objective To construct a prediction model of hospitalization expenses for ischemic heart disease, reveal the key factors affecting hospitalization expenses, and analyze the interactions between variables. Methods Data on patients were extracted from the Sichuan medical insurance comprehensive service platform for January 2020 to December 2021. The data were divided into a training set and a test set at a ratio of 7:3. Six machine learning models were constructed and trained with ten-fold cross-validation, and were explained using SHAP theory. Results The XGBoost model had the best performance among these models, with an R² of 0.60, an RMSE of 9 969.71 yuan, and an MAE of 5 242.90 yuan in the test set. SHAP results showed that the five variables with the greatest impact on hospitalization expenses were surgery, length of stay, hospital grade, disease type and DRG. Hospitalization expenses were higher when grade 3 or 4 procedures were performed, the length of stay was prolonged, the hospitalization was in a tertiary hospital, the disease type was acute myocardial infarction, and payment was non-DRG. As the length of stay increased, hospitalization expenses rose faster when the patient had grade 4 surgery and was in a tertiary hospital. In addition, DRG payment reduced the length of hospital stay and the hospitalization expenses of patients with different disease types. Conclusion The interpretable XGBoost model constructed in this study has good predictive performance for the hospitalization expenses of patients with ischemic heart disease. Combined with SHAP theory, it can effectively identify the key factors affecting hospitalization expenses and analyze their interactions.
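The three regression metrics reported for the test set (R², RMSE and MAE) can be computed directly from observed and predicted expenses. The following is a minimal sketch with made-up values in yuan; the numbers and the function name are assumptions for illustration and do not come from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return r2, rmse, mae

# Hypothetical hospitalization expenses in yuan (illustrative only).
observed = [10000, 20000, 30000, 40000]
predicted = [12000, 18000, 33000, 37000]

r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.2f} MAE={mae:.2f}")
# R2=0.948 RMSE=2549.51 MAE=2500.00
```

RMSE penalizes large errors more heavily than MAE, so an RMSE well above the MAE, as in the results reported above, is consistent with a minority of admissions whose expenses are predicted poorly.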