U-Net has achieved remarkable results in medical image segmentation. In recent years, many researchers have studied the network and extended its structure, for example by improving the encoder and decoder or the skip connections. Focusing on structural optimizations of U-Net and the associated medical image segmentation techniques, this paper proceeds as follows. First, it describes the application of U-Net in medical image segmentation. Then, it summarizes seven improvement mechanisms for U-Net: the dense connection, residual connection, multi-scale, ensemble, dilated, attention, and Transformer mechanisms. Finally, it discusses ideas and methods for improving the U-Net structure, with the aim of providing a reference for future research and supporting the further development of U-Net.
The skin is the largest organ of the human body, and many visceral diseases manifest directly on the skin, so accurate segmentation of skin lesion images is of great clinical significance. To address the complex colors, blurred boundaries, and uneven scale information of such images, a skin lesion image segmentation method based on dense atrous spatial pyramid pooling (DenseASPP) and an attention mechanism is proposed. The method builds on the U-shaped network (U-Net). First, the encoder is redesigned, replacing plain stacked convolutions with a large number of residual connections so that key features are retained effectively even as the network depth increases. Second, channel attention is fused with spatial attention, and residual connections are added so that the network can adaptively learn channel and spatial features of the images. Finally, the DenseASPP module is introduced and redesigned to enlarge the receptive field and obtain multi-scale feature information. The proposed algorithm obtains satisfactory results on the official public dataset of the International Skin Imaging Collaboration (ISIC 2016): the mean Intersection over Union (mIoU), sensitivity (SE), precision (PC), accuracy (ACC), and Dice coefficient (Dice) are 0.9018, 0.9459, 0.9487, 0.9681, and 0.9473, respectively. The experimental results demonstrate that the proposed method improves the segmentation of skin lesion images and is expected to support professional dermatologists as an auxiliary diagnostic tool.
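The five reported metrics can all be derived from the pixel-wise confusion counts of a binary mask. The following is a minimal sketch, not the authors' evaluation code. It assumes flat sequences of 0/1 labels and assumes that mIoU is the mean of the foreground and background IoU, a definition the abstract does not state. The function names `confusion_counts` and `segmentation_metrics` are hypothetical.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two flat sequences of 0/1 labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, target):
    """Return mIoU, SE, PC, ACC and Dice for a binary segmentation."""
    tp, fp, fn, tn = confusion_counts(pred, target)

    def safe_div(num, den):
        # Avoid division by zero for empty classes.
        return num / den if den else 0.0

    iou_fg = safe_div(tp, tp + fp + fn)  # lesion class
    iou_bg = safe_div(tn, tn + fp + fn)  # background class
    return {
        # Assumption: mIoU averages the two class-wise IoU values.
        "mIoU": (iou_fg + iou_bg) / 2,
        "SE": safe_div(tp, tp + fn),      # sensitivity (recall)
        "PC": safe_div(tp, tp + fp),      # precision
        "ACC": safe_div(tp + tn, tp + fp + fn + tn),
        "Dice": safe_div(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    prediction = [1, 1, 0, 0, 1, 0, 0, 0]
    ground_truth = [1, 0, 0, 0, 1, 1, 0, 0]
    for name, value in segmentation_metrics(prediction, ground_truth).items():
        print(f"{name}: {value:.4f}")
```

In practice the masks would be flattened 2-D arrays, and the metrics would typically be averaged over all test images.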
PET/CT imaging, which combines positron emission tomography (PET) and computed tomography (CT), is currently the most advanced imaging examination method and is mainly used for tumor screening, differential diagnosis of benign and malignant tumors, and staging and grading. This paper proposes a breast cancer lesion segmentation method based on bimodal PET/CT images and designs a dual-path U-Net framework consisting of three modules: an encoder module, a feature fusion module, and a decoder module. The encoder module uses conventional convolution to extract features from each single-modality image; the feature fusion module adopts collaborative-learning feature fusion and uses a Transformer to extract global features of the fused image; and the decoder module mainly uses a multi-layer perceptron to perform lesion segmentation. The algorithm is evaluated on real clinical PET/CT data. The experimental results show that the accuracy, recall and accuracy of breast cancer lesion segmentation are 95.67%, 97.58% and 96.16%, respectively, which are better than those of the baseline algorithm. These results support the soundness of the single- and bimodal feature extraction approach combining convolution and Transformer used in this work, and provide a reference for feature extraction in tasks such as multimodal medical image segmentation or classification.
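The idea of fusing per-modality features and then extracting global context with self-attention can be illustrated with a small sketch. This is not the paper's fusion module: it assumes that fusion is a plain channel-wise concatenation of PET and CT feature vectors at each location, and it uses single-head scaled dot-product self-attention with identity query, key, and value projections (no learned weights). The names `fuse_tokens` and `self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_tokens(pet_tokens, ct_tokens):
    """Concatenate PET and CT feature vectors location by location.

    Assumption: concatenation stands in for the paper's
    collaborative-learning fusion, which the abstract does not detail.
    """
    return [pet + ct for pet, ct in zip(pet_tokens, ct_tokens)]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Identity projections are used for queries, keys and values, so every
    output token is a convex combination of all input tokens.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, tokens))
            for j in range(dim)
        ])
    return outputs


if __name__ == "__main__":
    pet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 locations, 2 channels
    ct = [[0.5], [0.2], [0.9]]                  # 3 locations, 1 channel
    fused = fuse_tokens(pet, ct)
    for token in self_attention(fused):
        print([round(x, 3) for x in token])
```

Because each output token attends to every location, the result mixes information across the whole fused feature map, which is the sense in which a Transformer layer captures global features.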
To address the loss of information at a single scale during sampling and the large parameter counts of U-Net and its variants in medical image segmentation, this paper proposes a multi-scale medical image segmentation method based on pixel encoding and spatial attention. First, the input strategy of the Transformer structure is redesigned and a pixel encoding module is introduced, enabling the model to extract global semantic information from multi-scale image features and obtain richer feature information. In addition, deformable convolutions are incorporated into the Transformer module to accelerate convergence and improve the module's performance. Second, a spatial attention module with residual connections is introduced so that the model can focus on the foreground information of the fused feature maps. Finally, guided by ablation experiments, the network is made lightweight, which improves segmentation accuracy and accelerates model convergence. The proposed algorithm achieves satisfactory results on the Synapse dataset, an official public dataset for multi-organ segmentation provided by the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), with Dice similarity coefficient (DSC) and 95% Hausdorff distance (HD95) scores of 77.65 and 18.34, respectively. The experimental results demonstrate that the proposed algorithm can enhance multi-organ segmentation performance, potentially filling a gap in multi-scale medical image segmentation algorithms and assisting professional physicians in diagnosis.
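The two reported metrics measure different things: DSC quantifies region overlap, while HD95 quantifies boundary disagreement and is less sensitive to isolated outliers than the maximum Hausdorff distance. The sketch below is a simplified illustration, not the evaluation code used in the paper. It assumes each segmentation is given as a list of 2-D pixel coordinates, and it assumes HD95 is the larger of the two directed 95th-percentile nearest-neighbor distances (definitions vary between toolkits). The function names are hypothetical.

```python
import math


def dice(points_a, points_b):
    """Dice similarity coefficient between two sets of pixel coordinates."""
    a, b = set(points_a), set(points_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def percentile(values, q):
    """Linearly interpolated q-th percentile (0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    low, high = math.floor(pos), math.ceil(pos)
    frac = pos - low
    return ordered[low] * (1 - frac) + ordered[high] * frac


def directed_distances(source, destination):
    """Distance from each source point to its nearest destination point."""
    return [min(math.dist(p, q) for q in destination) for p in source]


def hd95(points_a, points_b):
    """95th-percentile Hausdorff distance between two non-empty point sets.

    Assumption: the symmetric value is the maximum of the two directed
    95th percentiles. Brute force, O(len(a) * len(b)).
    """
    forward = directed_distances(points_a, points_b)
    backward = directed_distances(points_b, points_a)
    return max(percentile(forward, 95), percentile(backward, 95))


if __name__ == "__main__":
    prediction = [(0, 0), (0, 1), (1, 0), (1, 1)]
    reference = [(0, 1), (1, 1), (0, 2), (1, 2)]  # shifted by one column
    print("DSC :", dice(prediction, reference))
    print("HD95:", hd95(prediction, reference))
```

Evaluation pipelines usually compute HD95 on surface voxels only and scale distances by the voxel spacing; this sketch uses all supplied points and unit spacing for brevity.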