Text-only training or language-free training for multimodal tasks (image/audio/video captioning, retrieval, text2image)
awesome image-captioning zero-shot video-captioning text2image audio-captioning composed-image-retrieval text-only-supervision text-only-training language-free-training
Updated Oct 15, 2024