MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education

Access to diverse, well-annotated medical images and interactive learning tools is fundamental for training practitioners in medicine and related fields, helping them improve their diagnostic skills and their understanding of anatomical structures. Medical atlases are valuable but often impractical because of their size and lack of interactivity, whereas online image searches may return mislabeled or incomplete material. To address this, we propose MIRAGE, a multimodal medical text and image retrieval and generation system that lets users find and generate clinically relevant images from trustworthy sources. MIRAGE maps both text and images to a shared latent space, which enables semantically meaningful queries. The system is based on a fine-tuned medical version of CLIP (MedICaT-ROCO), trained on the ROCO dataset, which was obtained from PubMed Central. Given a prompt, MIRAGE retrieves images, generates synthetic ones through a medical diffusion model (Prompt2MedImage), and returns enriched descriptions from a large language model (Dolly-v2-3b). It also supports a dual search option that enables visual comparison of different medical conditions. A key advantage of the system is that it relies entirely on publicly available pretrained models, which supports reproducibility and accessibility. Our goal is to provide a free, transparent, and easy-to-use didactic tool for medical students, especially those without programming skills. Its interface enables interactive and personalized visual learning through medical image retrieval and generation. The system is accessible to medical students worldwide without requiring local computational resources or technical expertise, and is currently deployed on Kaggle: http://www-vpu.eps.uam.es/mirage
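To make the shared-latent-space idea concrete, the following is a minimal sketch of prompt-based retrieval and dual search by cosine similarity. It is an illustration under stated assumptions, not the MIRAGE implementation: the 3-dimensional vectors, the file names, and the helpers `cosine`, `retrieve`, and `dual_search` are all hypothetical, and in a real system the embeddings would come from the CLIP text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, image_index, k=2):
    """Return the names of the k images most similar to the query embedding."""
    ranked = sorted(
        image_index,
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def dual_search(query_a, query_b, image_index, k=2):
    """Run two retrievals so their results can be compared side by side."""
    return retrieve(query_a, image_index, k), retrieve(query_b, image_index, k)


# Hypothetical toy embeddings standing in for encoder outputs (assumption).
IMAGE_INDEX = [
    ("chest_xray_pneumonia.png", [0.9, 0.1, 0.0]),
    ("chest_xray_normal.png", [0.7, 0.0, 0.3]),
    ("brain_mri_tumor.png", [0.0, 0.9, 0.2]),
    ("knee_mri.png", [0.1, 0.6, 0.7]),
]

# Hypothetical text-prompt embeddings in the same space (assumption).
PNEUMONIA_QUERY = [1.0, 0.0, 0.0]
TUMOR_QUERY = [0.0, 1.0, 0.0]

if __name__ == "__main__":
    left, right = dual_search(PNEUMONIA_QUERY, TUMOR_QUERY, IMAGE_INDEX)
    print("Query A:", left)
    print("Query B:", right)
```

Because text and images live in the same space, one similarity function serves both single-prompt retrieval and the side-by-side comparison. Only the query vectors change.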

Main Contributions

Unified multimodal system. Combines medical image retrieval, text-based description, concept-level comparison and synthetic image generation in a single pipeline.

Accessible and reproducible design. Built entirely from publicly available pretrained models and deployed on Kaggle, making it usable without programming skills or local resources.

Semantic consistency validation. Evaluated embedding-space alignment and demonstrated coherent retrieval and generation in realistic educational scenarios.
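This page does not state which metric was used to evaluate embedding-space alignment. As one common possibility, the sketch below computes Recall@K for text-to-image retrieval over paired captions and images. The function names and the 2-dimensional toy vectors are hypothetical and serve only to illustrate the idea.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(text_vecs, image_vecs, k):
    """Fraction of captions whose paired image (same index) ranks in the top k."""
    hits = 0
    for i, text_vec in enumerate(text_vecs):
        # Rank all image indices by similarity to this caption, best first.
        order = sorted(
            range(len(image_vecs)),
            key=lambda j: cosine(text_vec, image_vecs[j]),
            reverse=True,
        )
        if i in order[:k]:
            hits += 1
    return hits / len(text_vecs)


# Hypothetical paired embeddings: TEXTS[i] describes IMAGES[i] (assumption).
TEXTS = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
IMAGES = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.2]]

if __name__ == "__main__":
    for k in (1, 2):
        print(f"Recall@{k} = {recall_at_k(TEXTS, IMAGES, k):.3f}")
```

A higher Recall@K means that captions and their matching images sit closer together in the shared space, which is the property retrieval depends on.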

Link to the publication: - in press -

Reference:

@inproceedings{diaz25,
title={MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education},
author={Díaz Benito, Miguel and Diana-Albelda, Cecilia and García-Martín, Álvaro and Bescos, Jesus and Escudero Viñolo, Marcos and SanMiguel, Juan Carlos},
booktitle={International Workshop on Applications of Medical AI},
pages={},
year={2025},
organization={Springer}
}