West China Medical Publishers
Author search: "ZHENG chuan" (1 result)
  • Construction and evaluation of a 'disease-syndrome combination' prediction model for pulmonary nodules based on oral microbiomics

    Objective To construct a "disease-syndrome combination" mathematical representation model for pulmonary nodules from oral microbiome data, using a multimodal data algorithm framework centered on dynamic systems theory; to compare predictive models built under different algorithmic frameworks; and to validate the efficacy of the optimal model in predicting the presence of pulmonary nodules.
    Methods A novel multimodal data algorithm framework centered on dynamic systems theory, termed VAEGANTF (Variational Auto Encoder-Generative Adversarial Network-Transformer), was proposed. Based on a multi-dimensional integrated dataset of "clinical features-syndrome elements-microorganisms", all subjects were divided into a training set (70%) for model construction and a testing set (30%) for efficacy testing. Group membership (healthy individual or patient with pulmonary nodules) was the dependent variable. Candidate markers, including clinical features, lesion location, disease nature, and microbial genera, were checked for multicollinearity, and the independent variables were then screened by variable importance ranking. Missing values were imputed and the data were standardized. Eight machine learning algorithms were used to construct pulmonary nodule risk prediction models: random forest, least absolute shrinkage and selection operator (LASSO) regression, support vector machine, multilayer perceptron, eXtreme gradient boosting (XGBoost), VAE-ViT (Vision Transformer), GAN-ViT, and VAEGANTF. K-fold cross-validation was used for model parameter tuning and optimization. The efficacy of the eight models was evaluated using confusion matrices and receiver operating characteristic (ROC) curves, and the optimal model was selected. Finally, the optimal model was assessed with goodness-of-fit testing and decision curve analysis (DCA).
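The split, imputation, and standardization steps described in the Methods can be illustrated roughly as follows. This is a minimal sketch, not the authors' code: the toy data, the function names (`stratified_split`, `fit_preprocessor`, `transform`), the use of median imputation, the stratification by group, and the choice to learn imputation and scaling statistics from the training set only are all assumptions made for illustration. The abstract does not say which imputation method was used.

```python
import random
import statistics

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices 70/30 within each class (stratification is an assumption)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)

def fit_preprocessor(rows):
    """Learn per-column median, mean, and SD from the training rows only."""
    params = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        median = statistics.median(observed)
        filled = [median if v is None else v for v in col]
        mean = statistics.fmean(filled)
        sd = statistics.pstdev(filled) or 1.0  # guard against constant columns
        params.append((median, mean, sd))
    return params

def transform(rows, params):
    """Impute missing values with the training median, then z-score."""
    return [
        [((med if v is None else v) - mean) / sd
         for v, (med, mean, sd) in zip(row, params)]
        for row in rows
    ]

# Hypothetical toy data: 20 subjects, 2 features, some values missing (None).
labels = [0] * 10 + [1] * 10
rows = [[float(i), None if i % 5 == 0 else float(i % 3)] for i in range(20)]

train_idx, test_idx = stratified_split(labels)
params = fit_preprocessor([rows[i] for i in train_idx])
X_train = transform([rows[i] for i in train_idx], params)
X_test = transform([rows[i] for i in test_idx], params)
```

Fitting the statistics on the training rows and reusing them for the test rows keeps information from the held-out 30% out of model construction, which is one common way to arrange these steps.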
    Results There were no statistically significant differences between the two groups in demographic characteristics such as age and sex. The 312 subjects were randomly divided into training and testing sets (7:3), and prediction models were constructed using the eight machine learning algorithms. After exclusion of potential problems such as multicollinearity, a total of 301 markers spanning clinical features, syndrome elements, and microbial genera were included for model construction. The area under the curve (AUC) values of the random forest, LASSO regression, support vector machine, multilayer perceptron, and VAE-ViT models did not reach 0.85, indicating poor efficacy. The AUC values of the XGBoost, GAN-ViT, and VAEGANTF models were all above 0.85, with the VAEGANTF model the highest (AUC = 0.923). Goodness-of-fit testing indicated good calibration of the VAEGANTF model, and decision curve analysis showed a high degree of clinical benefit. The nomogram results showed that age, sex, heart, lung, Qi deficiency, blood stasis, dampness, and the genera Porphyromonas, Granulicatella, Neisseria, Haemophilus, and Actinobacillus could be used as predictors.
    Conclusion The "disease-syndrome combination" risk prediction model for pulmonary nodules based on the VAEGANTF algorithm framework, which incorporates multi-dimensional "clinical features-syndrome elements-microorganisms" data, outperformed the other machine learning algorithms tested and may serve as a reference for early non-invasive diagnosis of pulmonary nodules.
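The evaluation measures named in the abstract (confusion matrix, ROC AUC, and the net benefit plotted in decision curve analysis) can be sketched as below. The labels and scores are hypothetical toy values, not study data, and the function names are illustrative. The sketch does not reproduce the reported AUC of 0.923. Net benefit uses the standard definition TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn), calling scores >= threshold positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def net_benefit(y_true, scores, threshold):
    """Net benefit of acting on the model at one threshold probability."""
    tp, fp, _, _ = confusion_counts(y_true, scores, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Hypothetical predictions for 4 patients with nodules (1) and 4 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(y_true, scores)                 # 15 of 16 pairs ranked correctly: 0.9375
counts = confusion_counts(y_true, scores, 0.5)  # (tp, fp, tn, fn) = (3, 1, 3, 1)
nb_model = net_benefit(y_true, scores, 0.5)   # 3/8 - 1/8 * 1 = 0.25
```

A decision curve repeats the net-benefit calculation over a range of thresholds and compares the model against the "treat all" and "treat none" strategies. In this toy example, treating everyone at a threshold of 0.5 gives a net benefit of 0, so the model's 0.25 would count as a gain.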
