Cancer gene expression data are high-dimensional and have small sample sizes, so dimensionality reduction of the data is necessary. Traditional linear dimensionality reduction approaches cannot capture the nonlinear relationships between data points and therefore give poor dimensionality reduction results. In this study, a multiple-weights locally linear embedding (LLE) algorithm with an improved distance is introduced to perform dimensionality reduction. The algorithm uses an improved distance to determine the neighbors of each data point, introduces multiple sets of linearly independent local weight vectors for each neighbor, and obtains the low-dimensional embedding of the high-dimensional data by minimizing the reconstruction error. Experimental results showed that the multiple-weights LLE algorithm with improved distance performed well in reducing the dimensionality of cancer gene expression data.
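The neighbor-selection and local-weight steps described above can be illustrated with a minimal sketch of standard LLE. This is not the multiple-weights variant of the study: it assumes plain Euclidean distance in place of the improved distance, computes a single weight vector per point, and omits the final embedding step. The function names and the regularization constant `reg` are hypothetical choices made for illustration.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance (a stand-in for the study's improved distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_indices(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    ranked = sorted((euclidean(points[i], p), j)
                    for j, p in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lle_weights(points, i, k, reg=1e-3):
    """Weights that best reconstruct points[i] from its k nearest neighbors.

    Minimizes the squared reconstruction error subject to the weights
    summing to one, as in standard single-weight LLE.
    """
    nbrs = knn_indices(points, i, k)
    diffs = [[a - b for a, b in zip(points[j], points[i])] for j in nbrs]
    gram = [[sum(u * v for u, v in zip(d1, d2)) for d2 in diffs] for d1 in diffs]
    trace = sum(gram[r][r] for r in range(k))
    eps = reg * trace if trace > 0 else reg  # regularize a possibly singular Gram matrix
    for r in range(k):
        gram[r][r] += eps
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return nbrs, [v / total for v in w]


# Toy data: four points on a line in 2-D (hypothetical, not gene expression data).
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
nbrs, weights = lle_weights(points, 1, 2)
print(nbrs, weights)
```

In full LLE, the weight vectors for all points are then assembled into a sparse matrix whose bottom eigenvectors give the low-dimensional coordinates; that step would normally rely on a numerical linear algebra library.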
Under the minimum free energy model, it is very important to predict the RNA secondary structure accurately and efficiently from the suboptimal foldings. Applying clustering techniques to the suboptimal structures can effectively improve prediction accuracy. In this paper, an improved k-medoids clustering method is proposed that uses the RBP score and an incremental candidate-medoid-set matrix to achieve better accuracy. The algorithm optimizes the initial medoids by gradually expanding the candidate medoid set. The results indicated that this algorithm obtains a higher CH value and significantly shortens the time needed to cluster RNA folding structures.
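As a rough illustration of clustering suboptimal foldings, the following sketch runs a basic alternating k-medoids procedure on dot-bracket structures. It rests on several assumptions that are not from the paper: the dissimilarity is the simple base-pair distance (size of the symmetric difference of the base-pair sets) rather than the RBP score, the initial medoids are naively taken as the first k structures rather than chosen from an incrementally expanded candidate set, and the six toy structures are invented for the example.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def bp_distance(a, b):
    """Base-pair distance: number of pairs present in exactly one structure.

    This is a stand-in dissimilarity, not the RBP score used in the paper.
    """
    return len(base_pairs(a) ^ base_pairs(b))


def assign(d, medoids):
    """Label each item with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(items, k, dist, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix."""
    n = len(items)
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive initialization (assumption): first k items
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total dissimilarity within its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)


# Hypothetical suboptimal foldings of one 12-nt sequence.
structures = [
    "((((....))))",
    "(((......)))",
    ".(((....))).",
    "((..))((..))",
    "((..)).(...)",
    "(...).((..))",
]
medoids, labels = k_medoids(structures, 2, bp_distance)
print(medoids, labels)
```

Because the result of this kind of procedure depends on the starting medoids, a better initialization strategy, such as the candidate-set expansion described above, can change both the quality of the clusters and the number of iterations needed.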