Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP)
This repo contains the source code of our ECCV 2022 paper MS-CLIP:
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
2022 European Conference on Computer Vision (ECCV 2022)
By Haoxuan You*, Luowei Zhou*, Bin Xiao*, Noel Codella*, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan.
We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. Specifically, we question how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that position the proportion of shared parameters along a spectrum. Under the studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Additionally, we find that lightweight modality-specific parallel modules further improve performance.
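For intuition, below is a minimal, hypothetical PyTorch-style sketch of the modality-sharing idea (not the repo's actual architecture): a single transformer block whose attention and MLP weights are shared by image and text tokens, with small per-modality LayerNorms standing in for the lightweight modality-specific components. All names here are illustrative.

```python
# Illustrative sketch only: shared attention/MLP weights across modalities,
# with modality-specific LayerNorms as the lightweight per-modality parameters.
import torch
import torch.nn as nn


class SharedBlock(nn.Module):
    """One transformer block reused for both image and text tokens."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # shared
        self.mlp = nn.Sequential(                                         # shared
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Lightweight modality-specific parameters: one LayerNorm pair per modality.
        self.norm1 = nn.ModuleDict({m: nn.LayerNorm(dim) for m in ("image", "text")})
        self.norm2 = nn.ModuleDict({m: nn.LayerNorm(dim) for m in ("image", "text")})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        h = self.norm1[modality](x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2[modality](x))
        return x


if __name__ == "__main__":
    block = SharedBlock()
    img_tokens = torch.randn(2, 50, 512)   # e.g. ViT patch tokens
    txt_tokens = torch.randn(2, 77, 512)   # e.g. word tokens
    print(block(img_tokens, "image").shape, block(txt_tokens, "text").shape)
```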
- [07/20/2022] Released pretrained model and zero-shot evaluation on ImageNet-1k.
| Model | Training Set | Top-1 on IN-1K (%) | LP* on 24 datasets (avg. %) | Download |
| --- | --- | --- | --- | --- |
| MS-CLIP-S (ViT-B/32) | YFCC-22M | 36.7 | 68.5 | ckpt/config |
| MS-CLIP-S (ViT-B/16) | YFCC-22M | 39.0 | 70.4 | ckpt/config |
| MS-CLIP-S (ViT-B/32) | LAION-20M | 40.2 | 73.3 | ckpt/config |
*LP: Linear Probing
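As a rough illustration of the linear-probing protocol behind the LP column (this is not the evaluation code used to produce the numbers above), the sketch below freezes a pretrained image encoder, extracts features, and fits a linear classifier on top. `encode_image` and the data loaders are hypothetical placeholders.

```python
# Hedged sketch of linear probing: frozen features + linear classifier.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression


@torch.no_grad()
def extract_features(encode_image, loader, device="cuda"):
    feats, labels = [], []
    for images, targets in loader:
        f = encode_image(images.to(device))      # frozen encoder forward pass
        feats.append(f.cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)


def linear_probe(encode_image, train_loader, test_loader):
    x_tr, y_tr = extract_features(encode_image, train_loader)
    x_te, y_te = extract_features(encode_image, test_loader)
    clf = LogisticRegression(max_iter=1000, C=3.16)  # regularization is typically swept per dataset
    clf.fit(x_tr, y_tr)
    return clf.score(x_te, y_te)                     # top-1 accuracy on the test split
```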
Please follow INSTALL.md for installation.
Please follow DATA.md for data preparation.
Download the checkpoints from the links in the table above and put the weights under `./OUTPUT_MODEL/`.
To evaluate a pre-trained MS-CLIP-S on ImageNet Zero-shot Classification, run:
`CUDA_VISIBLE_DEVICES=0 python tools/eval_zeroshot.py --model <config-file>`

where `<config-file>` is the config YAML under `experiments/model/`, e.g. `experiments/model/b32-laion-msclips.yaml`.
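For reference, the sketch below shows the generic CLIP-style zero-shot classification recipe that this kind of evaluation follows; it is not `tools/eval_zeroshot.py` itself, and `encode_image`, `encode_text`, and `tokenize` are hypothetical stand-ins for the model's API.

```python
# Hedged sketch of CLIP-style zero-shot classification: embed one text prompt
# per class, embed each image, and pick the class with the highest cosine similarity.
import torch


@torch.no_grad()
def zeroshot_classify(encode_image, encode_text, tokenize, images, class_names):
    prompts = [f"a photo of a {name}." for name in class_names]
    text_feat = encode_text(tokenize(prompts))                    # (num_classes, dim)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    image_feat = encode_image(images)                             # (batch, dim)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)

    logits = image_feat @ text_feat.t()                           # cosine similarities
    return logits.argmax(dim=-1)                                  # predicted class index per image
```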
If you have any questions, please contact Haoxuan You or Luowei Zhou.