
CLIP on low-resource vision

Overview

This project addresses the long-tailed distribution problem in vision-language models, specifically focusing on improving CLIP (Contrastive Language-Image Pre-training) performance in low-resource learning scenarios.

The research explores various adaptation techniques to enhance model performance on datasets with imbalanced class distributions.

This project was developed for the Trends & Applications of Computer Vision course of the MSc in Artificial Intelligence Systems at the University of Trento.

Key Features

Adaptation Techniques

  1. Low-Rank Adaptation (LoRA) (sketched below)

    • Injects trainable low-rank matrices into the transformer layers
    • Drastically reduces the number of trainable parameters
    • Preserves inference speed and computational efficiency
  2. Bias-terms Fine-tuning (BitFit) (sketched below)

    • Updates only the model's bias terms
    • Surfaces knowledge already present in the pretrained model
    • Keeps all other parameters frozen
  3. Meta-Adapter (sketched below)

    • Enables online adaptation from only a few examples
    • Uses cross-attention to align support and query features
    • Follows a meta-learning approach
  4. Label-Preserving & Label-Breaking Data Augmentation (sketched below)

    • Generates synthetic training images with Stable Diffusion
    • Produces both label-preserving and label-breaking variants
    • Adds visual diversity while controlling semantic integrity
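
Minimal, illustrative sketches of the four techniques follow. All are PyTorch, and all names, shapes, and hyperparameters are assumptions made for exposition; none of them is the repository's actual implementation.

LoRA sketch: a pretrained linear layer is frozen and augmented with a trainable low-rank residual, so only the small factor matrices A and B (a tiny fraction of the original parameters) receive gradients.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

In practice such wrappers would be applied to the attention projection layers inside CLIP's transformer blocks, leaving the rest of the model untouched.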
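
BitFit sketch: only bias vectors are left trainable, which amounts to a one-loop change on any nn.Module.

import torch.nn as nn

def apply_bitfit(model: nn.Module) -> None:
    # Train only the bias terms; freeze every other parameter.
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")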
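
Meta-Adapter sketch: a residual cross-attention block in which per-class (text) features attend over the few-shot support image features. The Meta-Adapter's exact architecture may differ; this only conveys the feature-alignment idea.

import torch
import torch.nn as nn

class MetaAdapter(nn.Module):
    """Refines per-class text features with cross-attention over support features."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats: torch.Tensor, support_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (num_classes, 1, dim) queries;
        # support_feats: (num_classes, shots, dim) keys/values from the few-shot images.
        attended, _ = self.attn(text_feats, support_feats, support_feats)
        # The residual connection preserves the original zero-shot text features.
        return self.norm(text_feats + attended)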
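
Augmentation sketch, using the diffusers library; the checkpoint and prompt templates are assumptions (EuroSAT-style class names for illustration), not the repository's actual generation pipeline.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

true_label, other_label = "forest", "highway"
# Label-preserving: same class semantics, new visual appearance.
preserving = pipe(f"a satellite photo of a {true_label}").images[0]
# Label-breaking: the prompt deliberately departs from the true class,
# producing images that can drive a separate breaking loss.
breaking = pipe(f"a satellite photo of a {other_label}").images[0]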

Experimental Datasets

  1. EuroSAT
  2. Circuits-diagrams

Installation

# Clone the repository
git clone https://github.com/OmarFacchini/LowResourcesFewShot-CLIP.git
cd LowResourcesFewShot-CLIP

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Usage

# Example training command
python main.py --dataset circuits \
               --root_path data/circuits/ \
               --shots 16 \
               --enable_lora \
               --enable_BitFit \
               --enable_MetaAdapter \
               --enable_breaking_loss

Key Arguments

  • --dataset: Dataset to use (e.g., 'eurosat', 'circuits')
  • --root_path: Path to the dataset root directory
  • --shots: Number of few-shot examples per class
  • --backbone: CLIP model backbone (default: 'ViT-B/16')
  • --enable_lora: Enable Low-Rank Adaptation (LoRA)
  • --enable_BitFit: Enable Bias-terms Fine-tuning (BitFit)
  • --enable_MetaAdapter: Enable the Meta-Adapter
  • --enable_breaking_loss: Enable the label-breaking loss
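
A hedged sketch of how these flags could be declared with argparse; main.py's actual parser may differ, and any default not documented above is illustrative.

import argparse

parser = argparse.ArgumentParser(description="Few-shot CLIP adaptation")
parser.add_argument("--dataset", type=str, choices=["eurosat", "circuits"])
parser.add_argument("--root_path", type=str, help="Path to the dataset root")
parser.add_argument("--shots", type=int, default=16, help="Few-shot examples per class")
parser.add_argument("--backbone", type=str, default="ViT-B/16", help="CLIP backbone")
parser.add_argument("--enable_lora", action="store_true", help="Enable LoRA")
parser.add_argument("--enable_BitFit", action="store_true", help="Enable BitFit")
parser.add_argument("--enable_MetaAdapter", action="store_true", help="Enable the Meta-Adapter")
parser.add_argument("--enable_breaking_loss", action="store_true", help="Enable the label-breaking loss")
args = parser.parse_args()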

Citation

If you use this work in your research, please cite:

@techreport{Lorenzi2024LowResourceVision,
  title={CLIP on Low-Resource Vision},
  author={Lorenzi, Alessandro and Cazzola, Luca and Facchini, Omar},
  year={2024},
  institution={University of Trento}
}

Authors

  • Alessandro Lorenzi
  • Luca Cazzola
  • Omar Facchini
