O6-carboxymethyl guanine (O6-CMG) is a highly mutagenic alkylation product of DNA that causes gastrointestinal cancer. Existing studies localized it using a mutant Mycobacterium smegmatis porin A (MspA) nanopore assisted by Phi29 DNA polymerase. Machine learning has recently been widely used to analyze nanopore sequencing data, but it typically requires a large amount of labeled data. Producing these labels places an extra burden on researchers and greatly limits the practicality of such methods. Accordingly, this paper proposes a nano-Unsupervised-Deep-Learning method (nano-UDL), based on an unsupervised clustering algorithm, to identify methylation events in nanopore data automatically. Specifically, nano-UDL first uses a deep AutoEncoder to extract features from the nanopore dataset and then applies the MeanShift clustering algorithm to group the data. In addition, nano-UDL learns features suited to clustering by jointly optimizing the clustering loss and the reconstruction loss. Experimental results demonstrate that nano-UDL achieves relatively high recognition accuracy on the O6-CMG dataset and correctly identifies all sequence segments containing O6-CMG. To further verify the robustness of nano-UDL, a hyperparameter sensitivity analysis and ablation experiments were carried out. Using machine learning to analyze nanopore data can effectively reduce the cost of manual data analysis, which is significant for many biological studies, including genome sequencing.
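To illustrate the clustering stage only, the following is a minimal sketch of flat-kernel mean-shift clustering on one-dimensional features. It is not the nano-UDL implementation: the AutoEncoder and the joint clustering/reconstruction loss are omitted, the function name `mean_shift_1d`, the bandwidth, and the synthetic "latent feature" values are all assumptions made for illustration, and real nanopore features would be higher-dimensional.

```python
import random


def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on 1-D features; returns (centers, labels)."""
    modes = []
    for p in points:
        x = p
        for _ in range(max_iter):
            # Average all points inside the window centered on x
            window = [q for q in points if abs(q - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    # Merge converged modes that lie within one bandwidth of each other
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) < bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical 1-D latent features standing in for encoder output:
# two groups of events with bounded noise (values are invented).
rng = random.Random(0)
features = [0.2 + rng.uniform(-0.05, 0.05) for _ in range(50)]
features += [0.8 + rng.uniform(-0.05, 0.05) for _ in range(50)]

centers, labels = mean_shift_1d(features, bandwidth=0.2)
print(len(centers), "clusters found")
```

Mean shift does not need the number of clusters in advance, which fits an unsupervised setting; the bandwidth is the main hyperparameter, and the number of clusters found depends on it.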
Objective To evaluate the basic performance and clinical application value of nanopore sequencing, in order to provide new ideas for the rapid detection of pathogens in clinical practice. Methods From December 2021 to May 2022, blood samples were collected from inpatients with suspected bloodstream infection at Renmin Hospital of Wuhan University. The pathogens in each sample were identified in parallel by the nanopore sequencing platform and by blood culture, and the basic performance and clinical utility of nanopore sequencing were evaluated. Results A total of 251 patients were included. Nanopore sequencing detected pathogens in 119 patients (47.4%), a higher rate than blood culture, which detected pathogens in 23 patients (9.2%) (χ²=79.167, P<0.001). Agreement between the two methods was poor (kappa=0.052, P=0.175). Nanopore sequencing had a certain missed detection rate. In terms of pathogen types, nanopore sequencing detected 47 bacteria and 15 fungi. Conclusion Compared with blood culture, nanopore sequencing has a higher detection rate and detects more types of pathogens. This technology has clear advantages in the rapid diagnosis of bloodstream infection pathogens.
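The paired comparison reported above can be sketched as follows. The abstract gives only the marginal totals (119 and 23 positives out of 251) and does not state which chi-square test was used or the full 2×2 table. The sketch assumes a McNemar test with continuity correction, and the four cell counts are hypothetical values chosen to match the reported marginals; they are not data from the study. The function names are likewise illustrative.

```python
def mcnemar_corrected(b, c):
    """McNemar chi-square with continuity correction.

    b and c are the discordant pair counts (positive by one method only).
    """
    return (abs(b - c) - 1) ** 2 / (b + c)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both positive, b: method 1 only, c: method 2 only, d: both negative.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


# ASSUMPTION: hypothetical cell counts, not reported in the abstract.
# They are consistent with the reported totals: 119 nanopore-positive,
# 23 culture-positive, 251 patients.
both_pos, nano_only, culture_only, both_neg = 14, 105, 9, 123

total = both_pos + nano_only + culture_only + both_neg
nano_rate = (both_pos + nano_only) / total
culture_rate = (both_pos + culture_only) / total

chi2 = mcnemar_corrected(nano_only, culture_only)
kappa = cohen_kappa(both_pos, nano_only, culture_only, both_neg)

print(f"nanopore {nano_rate:.1%}, culture {culture_rate:.1%}")
print(f"McNemar chi2 = {chi2:.3f}, kappa = {kappa:.3f}")
```

Under these assumed counts the sketch gives χ² ≈ 79.167 and kappa ≈ 0.052, matching the reported figures. This shows only that such a table is compatible with the abstract, not that it is the table the authors observed. A kappa this close to zero means the two methods agree little beyond chance, even though overall positivity differs strongly.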