• 1. Research Center of Clinical Medicine, Affiliated Hospital of Nantong University, Nantong, 226001, Jiangsu, P. R. China;
  • 2. Department of Thoracic Surgery, Peking University Third Hospital, Beijing, 100191, P. R. China;
  • 3. Department of Pathology, Affiliated Hospital of Nantong University, Nantong, 226001, Jiangsu, P. R. China;
  • 4. Department of Cardiothoracic Surgery, Affiliated Hospital of Nantong University, Nantong, 226001, Jiangsu, P. R. China;
LIU Yifei, Email: ntdxliuyifei@sina.com; SHI Jiahai, Email: sjh@ntu.edu.cn

Objective To evaluate the accuracy of machine learning algorithms based on SHOX2 and RASSF1A methylation levels in predicting the pathological type of early-stage lung adenocarcinoma.

Methods A retrospective analysis was conducted on formalin-fixed paraffin-embedded (FFPE) specimens from patients who underwent lung tumor resection at the Affiliated Hospital of Nantong University from January 2021 to January 2023. The methylation levels of SHOX2 and RASSF1A in the FFPE specimens were measured by methylation-specific PCR (MS-PCR) using the LungMe kit. With the methylation levels of SHOX2 and RASSF1A as predictive variables, several machine learning algorithms (logistic regression, XGBoost, random forest, and naive Bayes) were used to predict the pathological type of lung adenocarcinoma, and a web server was built for clinical use.

Results A total of 272 patients were included. Based on the pathological classification of the tumors, patients were divided into three groups: benign tumor/adenocarcinoma in situ (BT/AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IA). The average ages of patients in the BT/AIS, MIA, and IA groups were 57.97, 61.31, and 63.84 years, respectively; the proportions of female patients were 55.38%, 61.11%, and 61.36%, respectively. Among the prediction models built on SHOX2 and RASSF1A methylation levels, the random forest and XGBoost models performed well for each pathological type. The C-statistics of the random forest model for the BT/AIS, MIA, and IA groups were 0.70, 0.71, and 0.78, respectively. The C-statistics of the XGBoost model for the BT/AIS, MIA, and IA groups were 0.70, 0.75, and 0.77, respectively. The naive Bayes model showed predictive ability only in the IA group, with a C-statistic of 0.73. The logistic regression model performed worst, showing no predictive ability for any group. In decision curve analysis, the random forest model showed a higher net benefit in predicting the BT/AIS and MIA pathological types, indicating its potential value in clinical application. Finally, a website for predicting the pathological type of early-stage lung adenocarcinoma was developed based on the random forest model.

Conclusion Machine learning algorithms based on SHOX2 and RASSF1A methylation levels have high accuracy in predicting the pathological type of early-stage lung adenocarcinoma. The prediction website makes the model more convenient to apply clinically and supports clinicians' decisions on lung tumor pathological typing.
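
The abstract reports a C-statistic per pathological group and a net benefit from decision curve analysis, but does not spell out how either is computed. The following is a minimal sketch of the standard definitions, assuming each group is scored one-vs-rest (for example, IA versus not IA) and that the model outputs a probability for that group. The function names and the toy data are illustrative and are not taken from the study.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """One-vs-rest C-statistic (area under the ROC curve).

    Probability that a randomly chosen positive case gets a higher score
    than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(scores: Sequence[float], labels: Sequence[int],
                threshold: float) -> float:
    """Net benefit at threshold probability pt, as used in decision curves:
    TP/n - FP/n * pt / (1 - pt), treating score >= pt as a positive call.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data: predicted probability of one group and true membership.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(f"C-statistic: {c_statistic(toy_scores, toy_labels):.3f}")
    print(f"Net benefit at pt=0.5: {net_benefit(toy_scores, toy_labels, 0.5):.3f}")
```

A C-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.70 to 0.78 should be read. A decision curve plots net benefit across a range of thresholds and compares each model against the "treat all" and "treat none" strategies.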