The PET/CT imaging technology combining positron emission tomography (PET) and computed tomography (CT) is the most advanced imaging examination method currently, and is mainly used for tumor screening, differential diagnosis of benign and malignant tumors, staging and grading. This paper proposes a method for breast cancer lesion segmentation based on PET/CT bimodal images, and designs a dual-path U-Net framework, which mainly includes three modules: encoder module, feature fusion module and decoder module. Among them, the encoder module uses traditional convolution for feature extraction of single mode image; The feature fusion module adopts collaborative learning feature fusion technology and uses Transformer to extract the global features of the fusion image; The decoder module mainly uses multi-layer perceptron to achieve lesion segmentation. This experiment uses actual clinical PET/CT data to evaluate the effectiveness of the algorithm. The experimental results show that the accuracy, recall and accuracy of breast cancer lesion segmentation are 95.67%, 97.58% and 96.16%, respectively, which are better than the baseline algorithm. Therefore, it proves the rationality of the single and bimodal feature extraction method combining convolution and Transformer in the experimental design of this article, and provides reference for feature extraction methods for tasks such as multimodal medical image segmentation or classification.