- 1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China;
- 2. West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, P. R. China;
This review systematically analyzes recent research progress in multimodal fusion techniques for medical image classification, focusing on the principal fusion strategies and their effectiveness in classification tasks. The studies reviewed indicate that multimodal fusion methods significantly improve classification performance and show promise for clinical decision support. However, challenges remain, including insufficient dataset sharing, limited use of the text modality, and inadequate integration of fusion strategies with medical domain knowledge. Future efforts should focus on developing large-scale public datasets and optimizing deep fusion strategies for image and text modalities to promote broader application in clinical scenarios.
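The fusion strategies surveyed in the review broadly divide into feature-level (early/intermediate) fusion, which combines modality embeddings before a shared classifier, and decision-level (late) fusion, which combines the outputs of independent unimodal classifiers. A minimal NumPy sketch of this distinction — a generic illustration, not the method of any cited study; `mri_feat`, `pet_feat`, and the probability vectors are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embeddings for one patient,
# e.g. outputs of separate MRI and PET encoders.
mri_feat = rng.normal(size=64)
pet_feat = rng.normal(size=64)

def early_fusion(feats):
    """Feature-level fusion: concatenate modality embeddings
    into one vector that a shared classifier would consume."""
    return np.concatenate(feats)

def late_fusion(probs):
    """Decision-level fusion: average the class-probability
    outputs of independent unimodal classifiers."""
    p = np.mean(probs, axis=0)
    return p / p.sum()  # renormalize for safety

fused = early_fusion([mri_feat, pet_feat])
assert fused.shape == (128,)  # 64 MRI dims + 64 PET dims

# Hypothetical softmax outputs of two unimodal 3-class classifiers.
p_mri = np.array([0.7, 0.2, 0.1])
p_pet = np.array([0.5, 0.3, 0.2])
p = late_fusion([p_mri, p_pet])
# p is [0.6, 0.25, 0.15]; both modalities agree on class 0.
```

In practice the choice matters: early fusion lets a model learn cross-modal interactions but requires paired data at training time, while late fusion tolerates independently trained unimodal models at the cost of ignoring such interactions.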
Copyright © the editorial department of CHINESE JOURNAL OF BASES AND CLINICS IN GENERAL SURGERY of West China Medical Publisher. All rights reserved
1. Ramachandran D, Taylor GW. Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Process Mag, 2017, 34(6): 96-108.
2. Liu M, Cheng D, Wang K, et al. Multi-modality cascaded convolutional neural networks for Alzheimer’s disease diagnosis. Neuroinformatics, 2018, 16(3-4): 295-308.
3. Zhang J, He X, Qing L, et al. BPGAN: Brain PET synthesis from MRI using generative adversarial network for multi-modal Alzheimer’s disease diagnosis. Comput Methods Programs Biomed, 2022, 217: 106676. doi: 10.1016/j.cmpb.2022.106676.
4. Li Y, Li H, Zhou S. Causal PETS: Causality-informed PET synthesis from multi-modal data. Salt Lake City: Medical Imaging with Deep Learning (MIDL), 2025.
5. Yoo TK, Kim SH, Kim M, et al. DeepPDT-Net: predicting the outcome of photodynamic therapy for chronic central serous chorioretinopathy using two-stage multimodal transfer learning. Sci Rep, 2022, 12(1): 18689. doi: 10.1038/s41598-022-22984-6.
6. Huang X, Sun J, Gupta K, et al. Detecting glaucoma from multi-modal data using probabilistic deep learning. Front Med (Lausanne), 2022, 9: 923096. doi: 10.3389/fmed.2022.923096.
7. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data, 2019, 6(1): 1-48.
8. Maćkiewicz A, Ratajczak W. Principal components analysis (PCA). Comput Geosci, 1993, 19(3): 303-342.
9. Wang Y, Yao H, Zhao S. Auto-encoder based dimensionality reduction. Neurocomputing, 2016, 184: 232-242.
10. He M, Han K, Zhang Y, et al. Hierarchical-order multimodal interaction fusion network for grading gliomas. Phys Med Biol, 2021, 66(21). doi: 10.1088/1361-6560/ac30a1.
11. Wang Z, Wu Z, Agarwal D, et al. MedCLIP: Contrastive learning from unpaired medical images and text. Proc Conf Empir Methods Nat Lang Process, 2022, 2022: 3876-3887.
12. Liu R, Huang ZA, Hu Y, et al. Attention-like multimodality fusion with data augmentation for diagnosis of mental disorders using MRI. IEEE Trans Neural Netw Learn Syst, 2024, 35(6): 7627-7641.
13. Suk HI, Lee SW, Shen D, et al. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. Neuroimage, 2014, 101: 569-582.
14. Liu R, Huang ZA, Hu Y, et al. Attention-like multimodality fusion with data augmentation for diagnosis of mental disorders using MRI. IEEE Trans Neural Netw Learn Syst, 2022, 33(5): 1234-1245.
15. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM, 2017, 60(6): 84-90.
16. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 1-9. doi: 10.1109/CVPR.2015.7298594.
17. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV: IEEE, 2016: 770-778. doi: 10.1109/CVPR.2016.90.
18. Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI: IEEE, 2017: 4700-4708. doi: 10.1109/CVPR.2017.243.
19. Gao SH, Cheng MM, Zhao K, et al. Res2Net: A new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell, 2021, 43(2): 652-662.
20. Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, QC: IEEE, 2021: 10012-10022. doi: 10.1109/ICCV48922.2021.00986.
21. Yan R, Zhang F, Rao X, et al. Richer fusion network for breast cancer classification based on multimodal data. BMC Med Inform Decis Mak, 2021, 21(Suppl 1): 134. doi: 10.1186/s12911-020-01340-6.
22. Li Y, Daho MEH, Conze PH, et al. Multimodal information fusion for glaucoma and diabetic retinopathy classification // Lian C, Cao X, Rekik I, et al. eds. Ophthalmic Medical Image Analysis, OMIA 2022. Lecture Notes in Computer Science, vol 13576. Cham: Springer, 2022: 53-62. doi: 10.1007/978-3-031-16525-2_6.
23. Daho MEH, Li Y, Zeghlache R, et al. Improved automatic diabetic retinopathy severity classification using deep multimodal fusion of UWF-CFP and OCTA images. Ophthalmic Medical Image Analysis. Cham: Springer Nature Switzerland, 2023: 11-20. doi: 10.1007/978-3-031-44013-7_2.
24. Hu Q, Whitney HM, Giger ML. A deep learning methodology for improved breast cancer diagnosis using multiparametric MRI. Sci Rep, 2020, 10(1): 10536. doi: 10.1038/s41598-020-67441-4.
25. Dalmiş MU, Gubern-Mérida A, Vreemann S, et al. Artificial intelligence-based classification of breast lesions imaged with a multiparametric breast MRI protocol with ultrafast DCE-MRI, T2, and DWI. Invest Radiol, 2019, 54(6): 325-332.
26. Bhatnagar G, Wu QMJ, Liu Z. Directive contrast based multimodal medical image fusion in NSCT domain. IEEE Trans Multimedia, 2013, 15(5): 1014-1024.
27. He C, Liu Q, Li H, et al. Multimodal medical image fusion based on IHS and PCA. Procedia Eng, 2010, 7: 280-285.
28. Bashir R, Junejo R, Qadri NN, et al. SWT and PCA image fusion methods for multi-modal imagery. Multimedia Tools Appl, 2019, 78: 1235-1263.
29. Bhat S, Koundal D. Multi-focus image fusion techniques: a survey. Artif Intell Rev, 2021, 54: 5735-5787.
30. Sharma AM, Dogra A, Goyal B, et al. From pyramids to state-of-the-art: a study and comprehensive comparison of visible-infrared image fusion techniques. IET Image Process, 2020, 14(9): 1671-1689.
31. Khan SU, Alharbi M, Shah S, et al. Medical image fusion for multiple diseases features enhancement. Int J Imaging Syst Technol, 2024, 34(6): e23197. doi: 10.1002/ima.23197.
32. Wright J, Ma Y, Mairal J, et al. Sparse representation for computer vision and pattern recognition. Proc IEEE, 2010, 98(6): 1031-1044.
33. Ma X, Wang Z, Hu S. Multi-focus image fusion based on multi-scale sparse representation. J Vis Commun Image Represent, 2021, 81: 103328. doi: 10.1016/j.jvcir.2021.103328.
34. Tang X, Xu X, Han Z, et al. Elaboration of a multimodal MRI-based radiomics signature for the preoperative prediction of the histological subtype in patients with non-small-cell lung cancer. Biomed Eng Online, 2020, 19(1): 5. doi: 10.1186/s12938-019-0744-0.
35. Quellec G, Lamard M, Cazuguel G, et al. Case retrieval in medical databases by fusing heterogeneous information. IEEE Trans Med Imaging, 2011, 30(1): 108-118.
36. Xu Y. Deep learning in multimodal medical image analysis // Chen H, Mirisaee SH, Shahriar H, et al. eds. Health Information Science. HIS 2019. Lecture Notes in Computer Science, vol 11837. Cham: Springer International Publishing, 2019: 190-200. doi: 10.1007/978-3-030-32962-4_15.
37. Aldoj N, Lukas S, Dewey M, et al. Semi-automatic classification of prostate cancer on multi-parametric MR imaging using a multi-channel 3D convolutional neural network. Eur Radiol, 2020, 30(2): 1243-1253.
38. Lin W, Lin W, Chen G, et al. Bidirectional mapping of brain MRI and PET with 3D reversible GAN for the diagnosis of Alzheimer’s disease. Front Neurosci, 2021, 15: 646013. doi: 10.3389/fnins.2021.646013.
39. Zong W, Lee JK, Liu C, et al. A deep dive into understanding tumor foci classification using multiparametric MRI based on convolutional neural network. Med Phys, 2020, 47(9): 4077-4086.
40. Zhou Z, Adrada BE, Candelaria RP, et al. Prediction of pathologic complete response to neoadjuvant systemic therapy in triple negative breast cancer using deep learning on multiparametric MRI. Sci Rep, 2023, 13(1): 1171. doi: 10.1038/s41598-023-27518-2.
41. Kong Z, Zhang M, Zhu W, et al. Multi-modal data Alzheimer’s disease detection based on 3D convolution. Biomed Signal Process Control, 2022, 75: 103565. doi: 10.1016/j.bspc.2022.103565.
42. Song J, Zheng J, Li P, et al. An effective multimodal image fusion method using MRI and PET for Alzheimer’s disease diagnosis. Front Digit Health, 2021, 3: 637386. doi: 10.3389/fdgth.2021.637386.
43. Li F, Tran L, Thung KH, et al. A robust deep model for improved classification of AD/MCI patients. IEEE J Biomed Health Inform, 2015, 19(5): 1610-1616.
44. Yang X, Liu C, Wang Z, et al. Co-trained convolutional neural networks for automated detection of prostate cancer in multi-parametric MRI. Med Image Anal, 2017, 42: 212-227.
45. Xiong J, Li F, Song D, et al. Multimodal machine learning using visual fields and peripapillary circular OCT scans in detection of glaucomatous optic neuropathy. Ophthalmology, 2022, 129(2): 171-180.
46. Cheng D, Liu M. CNNs based multi-modality classification for AD diagnosis. 2017 10th International Congress on Image and Signal Processing, Biomedical Engineering and Informatics (CISP-BMEI). Shanghai: IEEE, 2017: 1-5. doi: 10.1109/CISP-BMEI.2017.830219848.
47. Zhou T, Thung KH, Zhu X, et al. Effective feature learning and fusion of multimodality data using stage-wise deep neural network for dementia diagnosis. Hum Brain Mapp, 2019, 40(3): 1001-1016.
48. Shi J, Zheng X, Li Y, et al. Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer’s disease. IEEE J Biomed Health Inform, 2018, 22(1): 173-183.
49. Xiang Z, Zhuo Q, Zhao C, et al. Self-supervised multi-modal fusion network for multi-modal thyroid ultrasound image diagnosis. Comput Biol Med, 2022, 150: 106164. doi: 10.1016/j.compbiomed.2022.106164.
50. Rahaman MA, Garg Y, Iraj A, et al. Two-dimensional attentive fusion for multi-modal learning of neuroimaging and genomics data. 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP). Xi’an: IEEE, 2022: 1-6. doi: 10.1109/MLSP55214.2022.9948981.
51. Zhou P, Jiang S, Yu L, et al. Use of a sparse-response deep belief network and extreme learning machine to discriminate Alzheimer’s disease, mild cognitive impairment, and normal controls based on amyloid PET/MRI images. Front Med (Lausanne), 2021, 7: 621204. doi: 10.3389/fmed.2020.621204.
52. Zhang T, Shi M. Multi-modal neuroimaging feature fusion for diagnosis of Alzheimer’s disease. J Neurosci Methods, 2020, 341: 108795. doi: 10.1016/j.jneumeth.2020.108795.
53. Gao X, Shi F, Shen D, et al. Task-induced pyramid and attention GAN for multimodal brain image imputation and classification in Alzheimer’s disease. IEEE J Biomed Health Inform, 2022, 26(1): 36-43.
54. Dai Y, Gao Y, Liu F. TransMed: Transformers advance multi-modal medical image classification. Diagnostics (Basel), 2021, 11(8): 1384. doi: 10.3390/diagnostics11081384.
55. Qiu L, Zhao L, Hou R, et al. Hierarchical multimodal fusion framework based on noisy label learning and attention mechanism for cancer classification with pathology and genomic features. Comput Med Imaging Graph, 2023, 104: 102176. doi: 10.1016/j.compmedimag.2022.102176.
56. Liu L, Liu S, Zhang L, et al. Cascaded multi-modal mixing transformers for Alzheimer’s disease classification with incomplete data. Neuroimage, 2023, 277: 120267. doi: 10.1016/j.neuroimage.2023.120267.
57. Iqbal A, Sharif M. BTS-ST: Swin transformer network for segmentation and classification of multimodality breast cancer images. Knowl Based Syst, 2023, 267: 110393. doi: 10.1016/j.knosys.2023.110393.
58. Bouzarjomehri N, Barzegar M, Rostami H, et al. Multi-modal classification of breast cancer lesions in digital mammography and contrast-enhanced spectral mammography images. Comput Biol Med, 2024, 183: 109266. doi: 10.1016/j.compbiomed.2024.109266.
59. Moon WK, Lee YW, Ke HH, et al. Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks. Comput Methods Programs Biomed, 2020, 190: 105361. doi: 10.1016/j.cmpb.2020.105361.
60. Guo S, Wang L, Chen Q, et al. Multimodal MRI image decision fusion-based network for glioma classification. Front Oncol, 2022, 12: 819673. doi: 10.3389/fonc.2022.819673.
61. Kwon I, Wang SG, Shin SC, et al. Diagnosis of early glottic cancer using laryngeal image and voice based on ensemble learning of convolutional neural network classifiers. J Voice, 2025, 39(1): 245-257.
62. Abdolmaleki S, Abadeh MS. Brain MR image classification for ADHD diagnosis using deep neural networks. 2020 International Conference on Machine Vision and Image Processing (MVIP). Qom, Iran: IEEE, 2020: 1-5. doi: 10.1109/MVIP49855.2020.9116881.
63. Chen RJ, Lu MY, Wang J, et al. Pathomic fusion: An integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans Med Imaging, 2022, 41(4): 757-770.
64. Wang T, Chen H, Chen Z, et al. Prediction model of early recurrence of multimodal hepatocellular carcinoma with tensor fusion. Phys Med Biol, 2024, 69(12). doi: 10.1088/1361-6560/ad4f45.
65. An Y, Zhang H, Sheng Y, et al. Multimodal attention-based fusion networks for diagnosis prediction. 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Houston, TX: IEEE, 2021: 809-816. doi: 10.1109/BIBM52615.2021.9669536.
66. Hu P, Huang YA, Mei J, et al. Learning from low-rank multimodal representations for predicting disease-drug associations. BMC Med Inform Decis Mak, 2021, 21(Suppl 1): 308. doi: 10.1186/s12911-021-01648-x.
67. Wu X, Shi Y, Wang M, et al. CAMR: cross-aligned multimodal representation learning for cancer survival prediction. Bioinformatics, 2023, 39(1): btad025. doi: 10.1093/bioinformatics/btad025.
68. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst, 2017, 30: 5998-6008.
69. Cahan N, Klang E, Marom EM, et al. Multimodal fusion models for pulmonary embolism mortality prediction. Sci Rep, 2023, 13(1): 7544. doi: 10.1038/s41598-023-34303-8.
70. Li TZ, Still JM, Xu K, et al. Longitudinal multimodal transformer integrating imaging and latent clinical signatures from routine EHRs for pulmonary nodule classification. Med Image Comput Comput Assist Interv, 2023, 14221: 649-659.
71. Tang C, Wei M, Sun J, et al. CsAGP: Detecting Alzheimer’s disease from multimodal images via dual-transformer with cross-attention and graph pooling. J King Saud Univ Comput Inf Sci, 2023, 35(7): 101618. doi: 10.1016/j.jksuci.2023.101618.
72. He X, Wang Y, Zhao S, et al. Co-attention fusion network for multimodal skin cancer diagnosis. Pattern Recognit, 2023, 133: 108990. doi: 10.1016/j.patcog.2022.108990.
73. Zou X, Tang C, Zheng X, et al. DPNet: Dynamic poly-attention network for trustworthy multi-modal classification. Proceedings of the 31st ACM International Conference on Multimedia (MM’23). New York, NY: ACM, 2023: 3550-3559. doi: 10.1145/3581783.3612574.
74. Zhang Y, Jiang H, Miura Y, et al. Contrastive learning of medical visual representations from paired images and text. In: Proceedings of the 7th Machine Learning for Healthcare Conference (MLHC 2022). PMLR, 2022: 2-25.
75. Wang M, Fan S, Li Y, et al. Missing-modality enabled multi-modal fusion architecture for medical data. J Biomed Inform, 2025, 164: 104796. doi: 10.1016/j.jbi.2025.104796.
76. Adnan M, Kalra S, Cresswell JC, et al. Federated learning and differential privacy for medical image analysis. Sci Rep, 2022, 12(1): 1953. doi: 10.1038/s41598-022-05539-7.
77. Xiang T, Zeng H, Chen B, et al. BMIF: Privacy-preserving blockchain-based medical image fusion. ACM Trans Multimed Comput Commun Appl, 2023, 19(1s): 1-23.
78. Ma J, He Y, Li F, et al. Segment anything in medical images. Nat Commun, 2024, 15(1): 654. doi: 10.1038/s41467-024-44824-z.
79. Ouyang L, Wu J, Xu J, et al. Training language models to follow instructions with human feedback. Adv Neural Inf Process Syst, 2022, 35: 27730-27744.
80. Chakraborty C, Bhattacharya M, Pal S, et al. Prompt engineering-enabled LLM or MLLM and instigative bioinformatics pave the way to identify and characterize the significant SARS-CoV-2 antibody escape mutations. Int J Biol Macromol, 2025, 287: 138547. doi: 10.1016/j.ijbiomac.2024.138547.
81. Dettmers T, Pagnoni A, Holtzman A, et al. QLoRA: Efficient finetuning of quantized LLMs. Adv Neural Inf Process Syst, 2023, 36: 10088-10115.
82. Fan DP, Ji GP, Zhou T, et al. PraNet: Parallel reverse attention network for polyp segmentation // Martel AL, Abolmaesumi P, Stoyanov D, et al. eds. Medical Image Computing and Computer Assisted Intervention—MICCAI 2020. Lecture Notes in Computer Science, vol 12266. Cham: Springer, 2020: 263-273. doi: 10.1007/978-3-030-59725-2_26.