Deep learning has recently achieved impressive results in medical image analysis. However, such methods usually require large-scale annotated data, and medical images are expensive to annotate, so learning efficiently from limited annotated data remains a challenge. Transfer learning and self-supervised learning are the two approaches most commonly used to address this problem, but both have been little studied on multimodal medical images. This study therefore proposes a contrastive learning method for multimodal medical images. The method treats images of different modalities from the same patient as positive samples, which effectively increases the number of positive pairs during training and helps the model fully learn the similarities and differences of lesions across modalities, thereby improving the model's understanding of medical images and its diagnostic accuracy. Because commonly used data augmentation methods are not suitable for multimodal images, this paper also proposes a domain adaptive denormalization method, which transforms source-domain images with the help of statistical information from the target domain. The method was validated on two multimodal medical image classification tasks: in the microvascular invasion recognition task, it achieved an accuracy of (74.79 ± 0.74)% and an F1 score of (78.37 ± 1.94)%, an improvement over other conventional learning methods; in the brain tumor pathology grading task, the method also achieved significant improvements. The results show that the method performs well on multimodal medical images and can serve as a reference solution for pre-training on multimodal medical images.
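The domain adaptive denormalization described above can be read as a channel-wise statistic transfer: normalize the source image with its own mean and standard deviation, then re-scale it with the target domain's statistics. The sketch below is one plausible NumPy implementation of that idea, assuming a channel-first image layout; the function name and the exact formulation are assumptions, not the paper's code.

```python
import numpy as np

def domain_adaptive_denormalization(source, target, eps=1e-8):
    """Sketch of a statistic-transfer augmentation (assumed formulation):
    whiten each source channel with its own mean/std, then re-color it
    with the corresponding target channel's mean/std.
    source, target: float arrays of shape (channels, H, W)."""
    out = np.empty_like(source, dtype=np.float64)
    for c in range(source.shape[0]):
        src_mu, src_sigma = source[c].mean(), source[c].std()
        tgt_mu, tgt_sigma = target[c].mean(), target[c].std()
        # normalize with source statistics, denormalize with target statistics
        out[c] = (source[c] - src_mu) / (src_sigma + eps) * tgt_sigma + tgt_mu
    return out
```

After the transform, each channel of the output matches the target channel's first- and second-order statistics while preserving the source image's spatial structure, which is the property that makes the transformed image usable as an augmented view in contrastive training.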
Computed tomography (CT) imaging is a vital tool for the diagnosis and assessment of lung adenocarcinoma, and using CT images to predict the recurrence-free survival (RFS) of lung adenocarcinoma patients after surgery is of paramount importance in tailoring postoperative treatment plans. Addressing the challenging task of accurate RFS prediction from CT images, this paper introduces an approach based on self-supervised pre-training and multi-task learning. We employed a self-supervised learning strategy known as "image transformation to image restoration" to pre-train a 3D-UNet network on publicly available lung CT datasets to extract generic visual features from lung images. Subsequently, we enhanced the network's feature extraction capability through multi-task learning on segmentation and classification tasks, guiding the network to extract image features relevant to RFS. Additionally, we designed a multi-scale feature aggregation module to comprehensively combine multi-scale image features, and finally predicted the RFS risk score for lung adenocarcinoma with a feed-forward neural network. The predictive performance of the proposed method was assessed by ten-fold cross-validation. The results showed that the concordance index (C-index) for predicting RFS and the area under the curve (AUC) for predicting whether recurrence occurs within three years reached 0.691 ± 0.076 and 0.707 ± 0.082, respectively, and the predictive performance was superior to that of existing methods. This study confirms that the proposed method has potential for RFS prediction in lung adenocarcinoma patients and is expected to provide a reliable basis for developing individualized treatment plans.
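The C-index reported above measures how often the model ranks patient risk correctly: among all comparable patient pairs (where the patient with the shorter observed time actually had a recurrence), it counts the fraction in which that patient also received the higher predicted risk score. A minimal sketch of Harrell's C-index is below; the function name and list-based inputs are illustrative, not the paper's implementation.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index sketch. A pair (i, j) is comparable if patient i
    had an event (events[i] == 1) and a shorter observed time than j.
    The pair is concordant if i also has the higher predicted risk;
    tied risk scores count as half concordant."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.691 ± 0.076 in context.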