• 1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, P. R. China;
  • 2. Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, P. R. China;
  • 3. Medical College of Shanghai University, Shanghai University, Shanghai 200444, P. R. China;
  • 4. Shanghai Universal Medical Imaging Diagnostic Center, Shanghai 200233, P. R. China;
  • 5. Gastroenterology, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiaotong University, Shanghai 200233, P. R. China;
  • 6. Faculty of Science, Yamaguchi University, Yamaguchi-Ken 753-8511, Japan;
  • 7. College of Information Science and Engineering, Ritsumeikan University, Shiga-Ken 525-8577, Japan;
WU Xing, Email: xingwu@shu.edu.cn

Retrieving the keyframes most relevant to a text query from labeled small bowel videos can locate pathological regions efficiently and accurately. However, training directly on raw video data is extremely slow, while learning visual representations from image-text datasets leads to computational inconsistency. To address this challenge, a small bowel video keyframe retrieval method based on multi-modal contrastive learning (KRCL) is proposed. The framework makes full use of the textual information in video category labels to learn video features closely related to text, and models temporal information within a pretrained image-text model. It transfers knowledge learned by image-text multi-modal models to the video domain, enabling interaction among medical video, image, and text data. Experimental results on the Hyper-Kvasir gastrointestinal disease detection dataset and the Microsoft Research video-to-text (MSR-VTT) retrieval dataset demonstrate the effectiveness and robustness of KRCL, which achieves state-of-the-art performance on nearly all evaluation metrics.
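The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that a text query and each video frame have already been mapped into a shared embedding space, and it ranks frames by cosine similarity to the text. The function names (`cosine`, `retrieve_keyframes`) and the toy embeddings are hypothetical and are not taken from the paper; this is not the KRCL implementation, which also models temporal information and is trained contrastively.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_keyframes(text_emb, frame_embs, k=1):
    """Return the indices of the k frames most similar to the text embedding."""
    scores = [cosine(text_emb, f) for f in frame_embs]
    order = sorted(range(len(frame_embs)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Toy embeddings, made up for illustration only (assumption, not paper data).
text_emb = [1.0, 0.0]
frame_embs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
top = retrieve_keyframes(text_emb, frame_embs, k=2)
print(top)
```

In practice the embeddings would come from the pretrained image-text encoders mentioned above rather than from hand-written vectors.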

Copyright © the editorial department of Journal of Biomedical Engineering of West China Medical Publisher. All rights reserved