Analysis of multi-parameter flow cytometry data is complicated and time-consuming, and it requires well-trained professionals. To overcome this limitation, this paper proposes a method for processing multi-parameter flow cytometry data based on kernel principal component analysis (KPCA). The dimensionality of the data is reduced by a nonlinear transform, and the resulting feature variables are then clustered automatically with an improved K-means algorithm. Experimental data from peripheral blood lymphocytes were processed with both a principal component analysis (PCA)-based method and the KPCA-based method, and the influence of different feature parameter selections was then explored. The results indicate that KPCA can be successfully applied to multi-parameter flow cytometry data analysis for efficient and accurate cell clustering, which can improve the efficiency of flow cytometry in clinical diagnostic analysis.
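The nonlinear dimensionality reduction step can be illustrated with a minimal sketch of KPCA. The abstract does not name the kernel, its parameters, or the details of the improved K-means step, so everything here is an assumption made for illustration: a Gaussian (RBF) kernel, the hypothetical parameter `gamma`, the hypothetical function names, and power iteration as the eigen-solver. The sketch stops at the first kernel principal component and does not reproduce the paper's clustering step.

```python
import math

def rbf_kernel_matrix(points, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2); the RBF kernel is an assumed choice."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K

def center_kernel(K):
    """Double-center K so the implicit feature vectors have zero mean."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]  # equals the column means (K is symmetric)
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def leading_eigenpair(M, iterations=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(M)
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic, generic start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm  # for unit v and PSD M, ||M v|| converges to the top eigenvalue
    return eigenvalue, v

def first_kpca_scores(points, gamma=0.5):
    """Coordinates of each training point along the first kernel principal component."""
    Kc = center_kernel(rbf_kernel_matrix(points, gamma))
    eigenvalue, v = leading_eigenpair(Kc)
    # Projection of training point i is sqrt(eigenvalue) * v[i] for a unit eigenvector v.
    return [math.sqrt(eigenvalue) * x for x in v]

# Toy two-parameter "events" forming two groups (made-up values, not real cytometry data).
events = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
scores = first_kpca_scores(events)
```

On this toy input the two groups receive scores of opposite sign, so a one-dimensional clustering step could separate them. The sign of an eigenvector is arbitrary, so only the relative signs are meaningful.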
The traditional approach to clustering multi-parameter flow cytometry data relies mainly on professional software, in which gates are set manually to select the target cells for analysis. This process is complex and requires specialist expertise. To address this, the paper proposes a clustering algorithm for multi-parameter flow data based on t-distributed stochastic neighbor embedding (t-SNE). In this algorithm, the Euclidean distances between sample data points in the high-dimensional space are converted into conditional probabilities that represent similarity, and the data are then reduced to a low-dimensional space. Stained human peripheral blood cells were measured by flow cytometry, and the processed data were exported as the experimental sample data. The t-SNE algorithm is compared with the kernel principal component analysis (KPCA) dimensionality reduction algorithm, and the low-dimensional components obtained by each method are classified using the K-means algorithm. The results show that the t-SNE algorithm has a good clustering effect on cell populations with asymmetric and trailing distributions, and the clustering accuracy can reach 92.55%, which may be helpful for automatic analysis of multi-color, multi-parameter flow data.
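The conversion of Euclidean distances into conditional probabilities can be sketched as follows. In the standard t-SNE formulation, the similarity of point j to point i is p(j|i) = exp(-||x_i - x_j||^2 / (2 sigma_i^2)) divided by the sum of the same expression over all k different from i. As a simplification, this sketch assumes a single fixed bandwidth `sigma` for every point, whereas t-SNE normally tunes sigma_i per point to match a target perplexity. The function name and the toy data are hypothetical.

```python
import math

def conditional_probabilities(points, sigma=1.0):
    """Return P with P[i][j] = p(j|i), using one shared Gaussian bandwidth (an assumption)."""
    n = len(points)
    P = []
    for i in range(n):
        sq_dists = [sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                    for j in range(n)]
        # Subtract the smallest neighbor distance before exponentiating so that
        # at least one weight equals 1.0 and the row sum cannot underflow to zero.
        nearest = min(d for j, d in enumerate(sq_dists) if j != i)
        weights = [0.0 if j == i else math.exp(-(d - nearest) / (2.0 * sigma ** 2))
                   for j, d in enumerate(sq_dists)]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P

# Toy three-parameter "events" (made-up values, not real cytometry data).
events = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.1), (2.0, 2.0, 2.0), (2.1, 1.9, 2.0)]
P = conditional_probabilities(events, sigma=1.0)
```

Each row of `P` sums to 1, and a nearby event receives a higher probability than a distant one. The matrix is generally not symmetric, because each row is normalized separately.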