C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing

Official repository of C-SAW, a vision-language model (VLM)-based approach for unknown-class and domain generalization in remote sensing using self-supervised learning.

ICVGIP 2023 (Best Paper Award)


Abstract

[Teaser figure]

We focus on domain and class generalization problems in analyzing optical remote sensing images, using the large-scale pre-trained vision-language model (VLM), CLIP. While contrastively trained VLMs show impressive zero-shot generalization performance, their effectiveness is limited when dealing with diverse domains during training and testing. Existing prompt learning techniques overlook the importance of incorporating domain and content information into the prompts, which results in a drop in performance while dealing with such multi-domain data. To address these challenges, we propose a solution that ensures domain-invariant prompt learning while enhancing the expressiveness of visual features. We observe that CLIP’s vision encoder struggles to identify contextual image information, particularly when image patches are jumbled up. This issue is especially severe in optical remote sensing images, where land-cover classes exhibit well-defined contextual appearances. To this end, we introduce C-SAW, a method that complements CLIP with a self-supervised loss in the visual space and a novel prompt learning technique that emphasizes both visual domain and content-specific features. We keep the CLIP backbone frozen and introduce a small set of projectors for both the CLIP encoders to train C-SAW contrastively. Experimental results demonstrate the superiority of C-SAW across multiple remote sensing benchmarks and different generalization tasks.
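
As a concrete illustration of the self-supervision idea sketched above, the snippet below shows one way a patch-jumbling consistency loss could be attached to a frozen image encoder, with only a small projector being trained. This is a minimal sketch under assumed shapes, not C-SAW's exact objective; jumble_patches, SSLProjector, and ssl_consistency_loss are hypothetical names.

import torch
import torch.nn as nn
import torch.nn.functional as F

def jumble_patches(images: torch.Tensor, patch: int = 32) -> torch.Tensor:
    """Randomly permute non-overlapping patches of each image (B, C, H, W).
    Assumes H and W are divisible by `patch`."""
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    # Split into patches: (B, C, H, W) -> (B, gh*gw, C, patch, patch)
    patches = images.reshape(b, c, gh, patch, gw, patch)
    patches = patches.permute(0, 2, 4, 1, 3, 5).reshape(b, gh * gw, c, patch, patch)
    # Shuffle the patch order independently for every image in the batch.
    perm = torch.argsort(torch.rand(b, gh * gw, device=images.device), dim=1)
    patches = patches[torch.arange(b, device=images.device).unsqueeze(1), perm]
    # Reassemble the shuffled patches back into full images.
    patches = patches.reshape(b, gh, gw, c, patch, patch).permute(0, 3, 1, 4, 2, 5)
    return patches.reshape(b, c, h, w)

class SSLProjector(nn.Module):
    """Small trainable head on top of the frozen image-encoder features."""
    def __init__(self, dim: int = 512, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)

def ssl_consistency_loss(image_encoder, projector, images):
    """Pull projected features of the jumbled view toward the clean view."""
    with torch.no_grad():  # the CLIP backbone stays frozen; no gradients flow here
        clean = image_encoder(images)
        jumbled = image_encoder(jumble_patches(images))
    clean_z = F.normalize(projector(clean), dim=-1)
    jumbled_z = F.normalize(projector(jumbled), dim=-1)
    # Cosine-consistency loss: zero when both views project to the same direction.
    return (1.0 - (clean_z * jumbled_z).sum(dim=-1)).mean()

In practice such a term would be added to the usual CLIP-style contrastive loss, and only the projector (and prompt) parameters would be optimized.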

Architecture

[Architecture diagram]

C-SAW utilizes CLIP’s frozen visual and text encoder backbones. The visual attentive token generator (GVAT) generates M visual attentive tokens from intermediate-layer (IL) features computed on images from the source domains S. These visual attentive tokens, together with the context and class tokens, form the text embeddings, yielding the visual attentive text prompting (VATP) approach.
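
A rough sketch of this prompt composition is given below, with assumed shapes, token counts, and hypothetical module names (GVATSketch, build_vatp_prompts); the released implementation lives in the models folder.

import torch
import torch.nn as nn

class GVATSketch(nn.Module):
    """Map pooled intermediate-layer (IL) image features to M visual attentive tokens."""
    def __init__(self, num_layers: int = 3, feat_dim: int = 768, ctx_dim: int = 512, m_tokens: int = 4):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(feat_dim, ctx_dim) for _ in range(num_layers)])
        self.to_tokens = nn.Linear(num_layers * ctx_dim, m_tokens * ctx_dim)
        self.m, self.dim = m_tokens, ctx_dim

    def forward(self, inter_feats):
        # inter_feats: list of (B, feat_dim) pooled features, one per intermediate layer.
        fused = torch.cat([p(f) for p, f in zip(self.proj, inter_feats)], dim=-1)
        return self.to_tokens(fused).view(-1, self.m, self.dim)  # (B, M, ctx_dim)

def build_vatp_prompts(visual_tokens, context_tokens, class_tokens):
    """Concatenate visual attentive, learnable context, and class-name token embeddings.

    visual_tokens:  (B, M, dim)          from GVATSketch
    context_tokens: (n_ctx, dim)         learnable context vectors
    class_tokens:   (n_cls, n_tok, dim)  embedded class-name tokens
    returns:        (B, n_cls, M + n_ctx + n_tok, dim)
    """
    b, n_cls = visual_tokens.shape[0], class_tokens.shape[0]
    ctx = context_tokens.unsqueeze(0).expand(b, -1, -1)         # (B, n_ctx, dim)
    vis_ctx = torch.cat([visual_tokens, ctx], dim=1)            # (B, M + n_ctx, dim)
    vis_ctx = vis_ctx.unsqueeze(1).expand(-1, n_cls, -1, -1)    # (B, n_cls, M + n_ctx, dim)
    cls = class_tokens.unsqueeze(0).expand(b, -1, -1, -1)       # (B, n_cls, n_tok, dim)
    return torch.cat([vis_ctx, cls], dim=2)

The resulting per-image, per-class prompts would then be passed through CLIP’s frozen text encoder in the usual prompt-learning fashion.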

Datasets

Released Datasets (Version-2):

  • For Domain Generalization:

Code

  • The files folder contains the dataloader files for each dataset.
  • The models folder contains the code of our model.
  • Clone the Dassl repository inside this repo for the evaluation metrics.
  • The scripts folder holds the training and testing scripts for each generalization task, for example:
$ cd scripts
$ bash base2new_train.sh patternnet 1
$ bash base2new_test.sh patternnet 1
$ bash crossdataset_train.sh patternnet 1
$ bash crossdataset_test.sh rsicd 1
$ bash domaingen_train.sh patternnetv2 1
$ bash domaingen_test.sh rsicdv2 1

Results

Base-to-New Class Generalization

[Results figure: base-to-new class generalization]

Cross Dataset Generalization

[Results figure: cross-dataset generalization]

Domain Generalization

[Results figure: domain generalization]

Bibtex

Please cite our work if you use it. Thanks.

@inproceedings{bhattacharya2023c,
  title={C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing},
  author={Bhattacharya, Avigyan and Singha, Mainak and Jha, Ankit and Banerjee, Biplab},
  booktitle={Proceedings of the Fourteenth Indian Conference on Computer Vision, Graphics and Image Processing},
  pages={1--10},
  year={2023}
}

@inproceedings{singha2023applenet,
  title={APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization Using CLIP},
  author={Singha, Mainak and Jha, Ankit and Solanki, Bhupendra and Bose, Shirsha and Banerjee, Biplab},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

Acknowledgements

Our code is mainly based on CoOp and APPLeNet.
