Prompt, Generate, then Cache

Official implementation of 'Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners'.

The paper has been accepted by CVPR 2023 🔥.

News

Please check our latest work 'Point-NN, Parameter is Not All You Need' with code, accepted by CVPR 2023 🔥, which conducts 3D understanding without any parameters or training.
CaFo cascaded with ChatGPT and Stable Diffusion on the Caltech-101 dataset has been released 📌.
The code of CaFo has been released.
The CaFo model is developed based on Tip-Adapter, accepted by ECCV 2022 and open-sourced.

Introduction

We propose CaFo, a Cascade of Foundation models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning, including CLIP, DINO, DALL-E, and GPT-3. Specifically, CaFo works by 'Prompt, Generate, then Cache'. We leverage GPT-3 to prompt CLIP with rich linguistic semantics, and generate synthetic images via DALL-E to expand the few-shot training data. Then, we introduce a learnable cache model to adaptively blend the predictions from CLIP and DINO. Through this collaboration, CaFo can fully unleash the potential of different pre-training methods and unify them to achieve state-of-the-art few-shot classification.
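As a rough illustration of the 'Cache' step, the sketch below follows the key-value cache idea of Tip-Adapter, on which CaFo is built: a test feature is compared with cached few-shot features (keys), the affinities are sharpened by beta, and the weighted sum of one-hot labels (values) is added to the zero-shot logits with weight alpha. The function names, the toy 2-D features and the fixed blending rule are assumptions made for illustration. The learnable cache and the adaptive CLIP/DINO blending in this repository are more involved.

```python
import math


def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def cache_logits(feature, keys, values, beta):
    """Tip-Adapter-style cache logits: sum_i exp(-beta * (1 - f . k_i)) * v_i.

    feature and keys are assumed to be L2-normalised, values are one-hot labels.
    """
    num_classes = len(values[0])
    logits = [0.0] * num_classes
    for key, one_hot in zip(keys, values):
        affinity = math.exp(-beta * (1.0 - dot(feature, key)))
        for c in range(num_classes):
            logits[c] += affinity * one_hot[c]
    return logits


def blend(zero_shot_logits, cache, alpha):
    """Add the cache prediction to the zero-shot prediction with weight alpha."""
    return [z + alpha * c for z, c in zip(zero_shot_logits, cache)]


# Toy example (hypothetical numbers): 2 classes, one cached sample per class.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
feature = [0.6, 0.8]      # closer to the second key
zero_shot = [0.5, 0.5]    # an undecided zero-shot prediction

cache = cache_logits(feature, keys, values, beta=5.0)
final = blend(zero_shot, cache, alpha=1.0)
predicted = max(range(len(final)), key=final.__getitem__)
print(predicted)
```

In this toy case the zero-shot logits are tied, so the cache term alone decides the prediction. Larger beta makes the cache rely more strongly on the nearest cached samples.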

Requirements

Installation

Create a conda environment and install dependencies:

git clone https://github.com/ZrrSkywalker/CaFo.git
cd CaFo

conda create -n cafo python=3.7
conda activate cafo

pip install -r requirements.txt

# Install the matching versions of torch and torchvision
conda install pytorch torchvision cudatoolkit

Dataset

Please follow DATASET.md to download the official ImageNet and the other 10 datasets.

Foundation Models

The pre-trained weights of CLIP will be downloaded automatically when the code is run.
The prompts produced by GPT-3 have been stored at gpt_file/.
Please download DINO's pre-trained ResNet-50 from here, and put it under dino/.
Please download DALL-E's generated images from here, and organize them alongside the official datasets as follows:

$DATA/
|–– imagenet/
|–– caltech-101/
|–– oxford_pets/
|–– ...
|–– dalle_imagenet/
|–– dalle_caltech-101/
|–– dalle_oxford_pets/
|–– ...
|–– sd_caltech-101/

For the Caltech-101 dataset, we also provide Stable Diffusion's images from here, and ChatGPT's prompts in gpt_file/.
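Before launching a run, it can help to check that the data root matches the layout above. The following is a minimal sketch, not part of the repository: the helper names expected_dirs and missing_dirs are hypothetical, and the assumption that only Caltech-101 has an sd_ folder is taken from the tree shown above.

```python
import tempfile
from pathlib import Path


def expected_dirs(datasets, with_sd=("caltech-101",)):
    """List the folders expected under $DATA for the given dataset names."""
    names = []
    for name in datasets:
        names.append(name)              # official dataset
        names.append(f"dalle_{name}")   # DALL-E generated images
    # Stable Diffusion images are assumed to exist only for Caltech-101.
    names.extend(f"sd_{name}" for name in with_sd if name in datasets)
    return names


def missing_dirs(data_root, datasets):
    """Return the expected folders that are not present under data_root."""
    root = Path(data_root)
    return [d for d in expected_dirs(datasets) if not (root / d).is_dir()]


# Demo on a throwaway directory standing in for $DATA.
with tempfile.TemporaryDirectory() as tmp:
    for d in ["imagenet", "dalle_imagenet", "caltech-101"]:
        (Path(tmp) / d).mkdir()
    missing = missing_dirs(tmp, ["imagenet", "caltech-101"])

print(missing)
```

For a real setup, call missing_dirs with the actual data root and the dataset folder names used in DATASET.md.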

Get Started

Configs

The running configurations for each [dataset] with [k] shots can be modified in configs/[dataset]/[k]shot.yaml, including the visual encoders and hyperparameters. We have provided the configurations for reproducing the results in the paper. You can edit search_scale, search_step, init_beta and init_alpha for fine-grained tuning and better results.
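How search_scale and search_step are consumed is not spelled out here. One plausible reading, borrowed from Tip-Adapter-style hyperparameter search, is a grid over beta and alpha where search_scale gives the upper bounds and search_step the number of grid points. The sketch below works under that assumption. The helper names, the 0.1 starting value and the toy scoring function are illustrative, not the repository's code.

```python
def make_grid(scale, steps, start=0.1):
    """Evenly spaced candidates in [start, scale), `steps` of them."""
    return [start + i * (scale - start) / steps for i in range(steps)]


def search(score_fn, search_scale, search_step):
    """Exhaustive search; index 0 is assumed to be beta, index 1 alpha."""
    best = (float("-inf"), None, None)
    for beta in make_grid(search_scale[0], search_step[0]):
        for alpha in make_grid(search_scale[1], search_step[1]):
            acc = score_fn(alpha, beta)
            if acc > best[0]:
                best = (acc, alpha, beta)
    return best


def toy_score(alpha, beta):
    """Stand-in for validation accuracy, peaking at alpha=1, beta=3."""
    return -((alpha - 1.0) ** 2 + (beta - 3.0) ** 2)


best_acc, best_alpha, best_beta = search(
    toy_score, search_scale=[7, 3], search_step=[200, 20]
)
print(round(best_alpha, 3), round(best_beta, 3))
```

Under this reading, a larger search_step gives a finer grid at the cost of more evaluations, while search_scale bounds how far the search can move from the initial values.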

Note that load_cache and load_pre_feat default to False for the first run, which stores the cache model and val/test features in configs/dataset/. For later runs, they can be set to True for faster hyperparameter tuning.
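The build-once, load-later behaviour described above can be pictured with a small sketch. This is an assumption about the general pattern, not the repository's implementation: load_or_build, the pickle format and the file name are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(path, build_fn, load_existing):
    """Return a stored object if allowed and present, else build and store it."""
    path = Path(path)
    if load_existing and path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = build_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def build_keys():
    """Stand-in for the expensive feature extraction of the cache keys."""
    calls.append(1)
    return [[0.6, 0.8]]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "caltech101" / "keys_16shots.pkl"
    # First run: flag is False, so the keys are computed and written to disk.
    first = load_or_build(cache_file, build_keys, load_existing=False)
    # Later run: flag is True, so the stored keys are read back instead.
    second = load_or_build(cache_file, build_keys, load_existing=True)

print(len(calls))
```

This sketch falls back to rebuilding when the flag is True but no file exists. Whether the repository does the same is not stated here, so keep the flags at False until a first run has completed.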

For the Caltech101 dataset, the configs for Stable Diffusion's images and ChatGPT's prompts are in configs/sd_caltech101 and configs/chat_caltech101, respectively.

Running

For 16-shot ImageNet dataset:

CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml

For the other 10 datasets (replace dataset with the dataset's config folder):

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/dataset/16shot.yaml
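To run several shot settings in a row, the two commands above can be generated programmatically. The sketch below only builds and prints the command lines. The helper name build_command, the folder name caltech101 and the shot values (1, 2, 4, 8, 16) are assumptions, so check them against the files actually present in configs/.

```python
import shlex


def build_command(dataset, shots, gpu=0):
    """Compose the launch command for one dataset and shot count."""
    # ImageNet uses its own entry point; the other datasets share main.py.
    script = "main_imagenet.py" if dataset == "imagenet" else "main.py"
    config = f"configs/{dataset}/{shots}shot.yaml"
    return (
        f"CUDA_VISIBLE_DEVICES={gpu} python {script} "
        f"--config {shlex.quote(config)}"
    )


# Assumed shot settings; adjust to the configs that exist in the repository.
commands = [build_command("caltech101", k) for k in (1, 2, 4, 8, 16)]
for line in commands:
    print(line)
```

The printed lines can be pasted into a shell or written to a script, which keeps the sweep reproducible without changing the training code.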

Numerical Results

We provide CaFo's numerical results on 11 datasets from 1 to 16 shots in exp_Cafo.log. The results for Tip-Adapter and Tip-Adapter-F are in exp_Tip.log.

Acknowledgement

This repo benefits from Tip-Adapter, CLIP, DINO, DALL-E and CuPL. Thanks for their wonderful work.

Citation

@article{zhang2023prompt,
  title={Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners},
  author={Renrui Zhang and Xiangfei Hu and Bohao Li and Siyuan Huang and Hanqiu Deng and Hongsheng Li and Yu Qiao and Peng Gao},
  journal={arXiv preprint arXiv:2303.02151},
  year={2023}
}

Contributors

Renrui Zhang, Xiangfei Hu, Bohao Li

Contact

If you have any questions about this project, please feel free to contact zhangrenrui@pjlab.org.cn and sjtuhxf@sjtu.edu.cn.
