1. School of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, P. R. China
ZHANG Li, Email: zhangli1996@163.com

In response to the growing number of patients with depression, this paper proposes an artificial intelligence method that identifies depression from voice signals, with the aim of improving the efficiency of diagnosis and treatment. First, the pre-trained model wav2vec 2.0 is fine-tuned to encode and contextualize speech, yielding high-quality voice features. The model is then applied to the publicly available Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset. For the binary task of depression recognition, the method achieves a precision of 93.96%, a recall of 94.87%, an F1 score of 94.41%, and an overall classification accuracy of 96.48%. For the four-class task of evaluating depression severity, the precision for every class is above 92.59%, the recall is above 92.89%, and the F1 score is above 93.12%, with an overall classification accuracy of 94.80%. These findings indicate that the proposed method effectively improves classification accuracy when data are limited and performs well in both depression identification and severity evaluation. In the future, the method has the potential to serve as a supportive tool for depression diagnosis.
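The abstract reports precision, recall, F1 score, and overall accuracy for both tasks but does not spell out how they are computed. The sketch below assumes the standard one-vs-rest definitions for each class, which is what per-class figures in the four-class task usually mean; the paper may use a different averaging scheme. The helper names `per_class_metrics`, `accuracy`, and `f1_from` are hypothetical, and the toy labels are invented for illustration, not taken from DAIC-WOZ.

```python
from typing import Dict, List, Sequence


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[Dict[str, float]]:
    """Precision, recall, and F1 for each class label 0..num_classes-1.

    Each class is scored one-vs-rest: samples of that class are positives,
    all other samples are negatives.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    results = []
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted as c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted as c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # true c samples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results.append(
            {"precision": precision, "recall": recall, "f1": f1_from(precision, recall)}
        )
    return results


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy binary labels (hypothetical encoding: 1 = depressed, 0 = not depressed).
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    for label, m in enumerate(per_class_metrics(y_true, y_pred, num_classes=2)):
        print(label, {k: round(v, 3) for k, v in m.items()})
    print("accuracy:", round(accuracy(y_true, y_pred), 3))
    # Recompute F1 from the (rounded) precision and recall reported for the binary task.
    print("F1 from reported P/R:", round(f1_from(0.9396, 0.9487), 4))
```

As a consistency check, `f1_from(0.9396, 0.9487)` evaluates to about 0.9441, so the 94.41% F1 score reported for the binary task is the harmonic mean of the reported precision and recall. Note that overall accuracy is computed separately from these per-class figures; it is not derived from them.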

Citation: HUANG Xiangsheng, LIAO Yilong, ZHANG Wenjing, ZHANG Li. A research on depression recognition based on voice pre-training model. Journal of Biomedical Engineering, 2024, 41(1): 9-16. doi: 10.7507/1001-5515.202304008
