Speech is an important high-level cognitive behavior of humans, and its production is closely tied to brain activity. Overt speech and speech imagery activate partially overlapping brain areas, which makes speech imagery a promising new paradigm for brain-computer interaction. Brain-computer interfaces (BCIs) based on speech imagery can be evoked spontaneously, require no subject training, and are friendly to subjects, so they have attracted considerable attention. However, this interaction technology is still immature in the design of experimental paradigms and the selection of imagery materials, and many issues urgently need to be addressed. In response to these problems, this article first describes the neural mechanisms of speech imagery. It then reviews previous speech imagery BCI studies and systematically analyzes the mainstream methods and core techniques of experimental paradigms, imagery materials, and data processing. Finally, it discusses the key problems and main challenges that restrict the development of this type of BCI, and offers a perspective on the future development and applications of speech imagery BCI systems.
Speech imagery is an emerging brain-computer interface (BCI) paradigm with the potential to provide effective communication for individuals with speech impairments. This study designed a Chinese speech imagery paradigm using three clinically relevant words ("Help me", "Sit up", and "Turn over") and collected electroencephalography (EEG) data from 15 healthy subjects. Based on these data, a Channel Attention Multi-Scale Convolutional Neural Network (CAM-Net) decoding algorithm was proposed, which combines multi-scale temporal convolutions with asymmetric spatial convolutions to extract multidimensional EEG features, and incorporates a channel attention mechanism together with a bidirectional long short-term memory (BiLSTM) network to perform channel weighting and capture temporal dependencies. Experimental results showed that CAM-Net achieved a classification accuracy of 48.54% on the three-class task, outperforming baseline models such as EEGNet and Deep ConvNet, and reached a highest accuracy of 64.17% in the binary classification between "Sit up" and "Turn over". This work provides a promising approach for future Chinese speech imagery BCI research and applications.
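The channel-weighting step in the abstract can be illustrated with a minimal NumPy sketch of a squeeze-and-excitation-style channel attention applied to an EEG segment: each channel is summarized by global average pooling, passed through a small bottleneck MLP, and the resulting sigmoid weights rescale the channels. The shapes, reduction ratio, and weight initialization below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(eeg, w1, w2):
    """Rescale EEG channels by learned attention weights.

    eeg: (channels, time) segment; w1: (channels, hidden); w2: (hidden, channels).
    """
    squeeze = eeg.mean(axis=1)                            # (channels,) global average pool
    excite = sigmoid(np.maximum(squeeze @ w1, 0.0) @ w2)  # (channels,) weights in (0, 1)
    return eeg * excite[:, None]                          # reweight each channel

# Hypothetical sizes: a 32-channel, 500-sample EEG segment, bottleneck of 8.
rng = np.random.default_rng(0)
C, T, H = 32, 500, 8
eeg = rng.standard_normal((C, T))
w1 = rng.standard_normal((C, H)) * 0.1
w2 = rng.standard_normal((H, C)) * 0.1
out = channel_attention(eeg, w1, w2)
print(out.shape)  # (32, 500)
```

In a full model such as the one described, the weights `w1` and `w2` would be trained end to end with the convolutional and recurrent layers, so informative channels receive weights near 1 and noisy channels are suppressed.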