Protein lysine β-hydroxybutyrylation (Kbhb) is a newly discovered post-translational modification associated with a wide range of biological processes. Identifying Kbhb sites is critical to better understanding its mechanism of action. However, biochemical experimental methods for probing Kbhb sites are costly and time-consuming. Therefore, a feature embedding learning method based on the Transformer encoder was proposed to predict Kbhb sites. In this method, amino acid residues were mapped into numerical vectors according to their amino acid type and position through a learnable feature embedding. The Transformer encoder was then used to extract discriminative features, and a bidirectional long short-term memory network (BiLSTM) was used to capture the correlations between different features. A benchmark dataset was constructed, and a Kbhb site predictor, AutoTF-Kbhb, was implemented based on the proposed method. Experimental results showed that the proposed feature embedding learning method could extract effective features. AutoTF-Kbhb achieved an area under the curve (AUC) of 0.87 and a Matthews correlation coefficient (MCC) of 0.37 on the independent test set, significantly outperforming the other methods compared. Therefore, AutoTF-Kbhb can be used as an auxiliary tool for identifying Kbhb sites.
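The input stage described above, in which each residue is mapped to indices for its amino acid type and its position before a learnable embedding is applied, can be illustrated with a minimal sketch. It is not the AutoTF-Kbhb implementation: the function names, the flank size, the padding symbol `X`, and the index scheme are all assumptions made for illustration, and the embedding tables themselves (which would be learned during training) are omitted.

```python
# Hypothetical sketch: names, flank size, padding symbol and index scheme
# are illustrative assumptions, not taken from the AutoTF-Kbhb paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed symbol for positions beyond the sequence ends

# Index 0 is reserved for padding; residues get indices 1..20.
TOKEN_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
TOKEN_INDEX[PAD] = 0


def lysine_windows(sequence, flank=3):
    """Yield (site_position, window) for every lysine (K) in the sequence.

    Each window holds `flank` residues on both sides of the lysine,
    padded with PAD where the window runs past either end.
    """
    padded = PAD * flank + sequence + PAD * flank
    for pos, residue in enumerate(sequence):
        if residue == "K":
            yield pos, padded[pos:pos + 2 * flank + 1]


def encode_window(window):
    """Return (token_ids, position_ids) for an embedding lookup.

    Unknown symbols fall back to the padding index 0.
    """
    token_ids = [TOKEN_INDEX.get(aa, 0) for aa in window]
    position_ids = list(range(len(window)))
    return token_ids, position_ids


if __name__ == "__main__":
    for site, window in lysine_windows("MKAK", flank=2):
        print(site, window, encode_window(window))
```

In a full model, the two index lists would each select rows from a trainable embedding table, and the summed vectors would be passed to the Transformer encoder and then the BiLSTM.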
Sleep staging is the basis for solving sleep problems. The classification accuracy of sleep staging models based on single-channel electroencephalogram (EEG) data and features has an upper limit. To address this problem, this paper proposed an automatic sleep staging model that combines a deep convolutional neural network (DCNN) with a bidirectional long short-term memory network (BiLSTM). The model used the DCNN to automatically learn the time-frequency domain features of EEG signals and the BiLSTM to extract the temporal features between the data, fully exploiting the feature information contained in the data to improve the accuracy of automatic sleep staging. In addition, noise reduction techniques and adaptive synthetic sampling were used to reduce the impact of signal noise and imbalanced datasets on model performance. Experiments were conducted on the Sleep-European Data Format Database Expanded and the Shanghai Mental Health Center Sleep Database, achieving overall accuracies of 86.9% and 88.9%, respectively. In all experiments, the proposed model outperformed the basic network model, further demonstrating its validity. The model can provide a reference for building a home sleep monitoring system based on single-channel EEG signals.
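The adaptive synthetic sampling step mentioned above can be illustrated with a small, self-contained sketch of the general ADASYN idea: minority samples surrounded by more majority-class neighbours receive more synthetic copies. This is a toy version on low-dimensional points, not the paper's pipeline. The function name `adasyn`, the parameters `k`, `beta`, and `seed`, and the choice to interpolate towards the nearest minority neighbours are assumptions made for illustration.

```python
import math
import random


def adasyn(minority, majority, k=3, beta=1.0, seed=0):
    """Return synthetic minority samples (toy ADASYN-style sketch).

    minority, majority: lists of equal-length numeric tuples.
    beta: desired balance level (1.0 aims for fully balanced classes).
    """
    rng = random.Random(seed)
    total = round((len(majority) - len(minority)) * beta)  # samples to create

    # Pool of (point, is_majority, minority_index) for neighbour search.
    pool = [(p, False, i) for i, p in enumerate(minority)]
    pool += [(p, True, -1) for p in majority]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for i, x in enumerate(minority):
        others = [q for q in pool if q[2] != i]
        nearest = sorted(others, key=lambda q: math.dist(x, q[0]))[:k]
        ratios.append(sum(1 for q in nearest if q[1]) / k)

    norm = sum(ratios)
    if total <= 0 or norm == 0:
        return []  # already balanced, or no minority point is hard to learn

    synthetic = []
    for i, (x, r) in enumerate(zip(minority, ratios)):
        count = round(r / norm * total)  # harder points get more samples
        mates = [m for j, m in enumerate(minority) if j != i]
        mates = sorted(mates, key=lambda m: math.dist(x, m))[:k]
        if not mates:
            continue  # a single minority sample cannot be interpolated
        for _ in range(count):
            mate = rng.choice(mates)
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(
                tuple(a + lam * (b - a) for a, b in zip(x, mate))
            )
    return synthetic


if __name__ == "__main__":
    rare = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    common = [(0.5, 0.5), (1.0, 1.0), (5.0, 5.0),
              (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
    print(adasyn(rare, common))
```

Because each synthetic point lies on the segment between two existing minority samples, the oversampled class stays inside the region the real samples already occupy. In practice, a maintained library implementation would normally be preferred over a hand-written one.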