Prompt learning, as a parameter-efficient fine-tuning paradigm, has emerged as a trend in adapting large pre-trained vision-language models (VLMs) to downstream tasks. However, most existing methods, such as CoCoOp and KgCoOp, require converting the category names of a specific task into textual descriptions as text prompt inputs, so the computational cost of the text encoder grows in direct proportion to the number of categories in the downstream task. To address this challenge, we propose a novel Computationally Efficient Prompt Learning (CEPL) method, which delivers remarkable performance improvements while significantly reducing computation cost. Our CEPL involves the following two key points. 1) Boosted computation efficiency: we propose a textual prompt decoupling module (TPD) that decouples category names from text prompts by learning an image-conditioned text prompt, rather than directly embedding the complete category names. 2) Enhanced tuning effectiveness: we introduce a semantic alignment adaptation module (SAA) that fine-tunes the original image features by optimizing task-specific and task-agnostic losses, so that image features are not only aligned with semantic-level text but also adaptable to downstream tasks. Extensive experiments demonstrate that our CEPL achieves superior classification performance with extremely low computational overhead. In particular, CEPL reduces GFLOPs by 95% compared to the state-of-the-art KgCoOp and yields an average accuracy improvement of 7.57% on 16-shot classification across 11 datasets.
We present Computationally Efficient Prompt Learning (CEPL), a novel approach that not only achieves remarkable improvements in performance across various tasks but also significantly reduces computational costs. This dual focus on efficiency and effectiveness makes CEPL a compelling solution for adapting large vision-language models to diverse downstream applications.
16-shot Classification Task on the 11 Datasets
Dataset | CEPL Acc. (%) | Log/CEPL |
---|---|---|
ImageNet | 73.47 | Link |
Caltech101 | 97.04 | Link |
OxfordPets | 93.81 | Link |
StanfordCars | 89.27 | Link |
Flowers102 | 98.71 | Link |
Food101 | 86.61 | Link |
FGVCAircraft | 62.38 | Link |
SUN397 | 76.25 | Link |
DTD | 75.85 | Link |
EuroSAT | 94.08 | Link |
UCF101 | 86.90 | Link |
Average | 84.94 | |
This code is built on top of the awesome toolbox Dassl.pytorch, so you need to install the `dassl` environment first. Simply follow the instructions described here to install `dassl` as well as PyTorch. After that, run `pip install -r requirements.txt` under `CEPL_Code/` to install a few more packages required by CLIP (this should be done when `dassl` is activated). You are now ready to begin.
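For reference, the installation flow could look like the following; this assumes Dassl.pytorch was installed into a conda environment named `dassl`, so adjust the environment name to your own setup.

```bash
# Assumes Dassl.pytorch and PyTorch are already installed in a conda env named "dassl"
conda activate dassl
cd CEPL_Code/
pip install -r requirements.txt   # installs the extra packages required by CLIP
```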
This section corresponds to the experiments described in Section 3.1, Few-shot classification.

Configure `CEPL_Code/exp/cross_modal_engine/config/default.py` to specify the paths for the text features and the dataset. Then, run `CEPL_Code/get_text_feature.sh` to generate and save the text features. Update the text feature path in `CEPL_Code/get_linear_head_weight.py` with the generated text feature path.
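As a rough illustration, the feature-generation step might be run as below. This sketch assumes both scripts read all paths from the configured files and take no command-line arguments, and that `get_linear_head_weight.py` is executed afterwards to produce the head weights; check the scripts in your checkout for the exact usage.

```bash
# Assumed workflow; adjust if the scripts in your checkout expect explicit arguments
cd CEPL_Code/
bash get_text_feature.sh            # generate and save the text features
# edit get_linear_head_weight.py so it points at the saved text features, then:
python get_linear_head_weight.py    # assumed to build the linear head weights from the text features
```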
Use `CEPL_Code/bash.sh` to train and evaluate the model on all classes. The script accepts three input arguments: `DATASET`, `EPOCH`, and `SEED` (an example invocation is sketched below).

- `DATASET`: Name of the dataset (e.g., `imagenet`, `caltech101`). Valid names are the file names listed in `CEPL_Code/configs/datasets/`.
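For example, a single run could be launched as follows; the positional order `DATASET EPOCH SEED` is an assumption based on the argument list above, so verify it against the script itself.

```bash
# Assumed positional order: DATASET EPOCH SEED
bash CEPL_Code/bash.sh caltech101 50 1
```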
```bash
# seed=1
bash scripts/efficient_prompts/xd_train.sh stanford_cars 50 1
bash scripts/efficient_prompts/xd_test.sh stanford_cars 50 1
```
After executing the above commands, the output will be organized as follows:
```
output
|–– CEPL/
|   |–– stanford_cars/
|   |   |–– vit_b16_cepl_16shots/
|   |   |   |–– seed1/
```

```
output
|–– evaluation/
|   |–– stanford_cars/
|   |   |–– vit_b16_cepl_16shots/
|   |   |   |–– seed1/
```
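To inspect a finished run, the per-seed directories above typically contain the training and evaluation logs; for instance (the file name `log.txt` follows the usual Dassl convention and is assumed here):

```bash
# Assumed Dassl-style log file name; adjust if your output differs
cat output/evaluation/stanford_cars/vit_b16_cepl_16shots/seed1/log.txt
```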
This repository is released under the Apache 2.0 license as found in the LICENSE file.
We would like to thank the authors of KgCoOp and CoOp, on whose code this codebase is built.