[ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding

DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding

Ting Liu, Xuyang Liu, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu

✨ Overview

In this paper, we apply parameter-efficient transfer learning (PETL) to transfer pre-trained vision-language knowledge to visual grounding (VG) efficiently. Specifically, we propose DARA, a novel PETL method comprising Domain-aware Adapters (DA Adapters) and Relation-aware Adapters (RA Adapters) for VG. DA Adapters first refine intra-modality representations to be more fine-grained for the VG domain. RA Adapters then share weights to bridge the relation between the two modalities, improving spatial reasoning. Empirical results on widely used benchmarks show that DARA achieves the best accuracy while updating far fewer parameters than full fine-tuning and other PETL methods. Notably, with only 2.13% tunable backbone parameters, DARA improves average accuracy by 0.81% across the three benchmarks compared to the baseline model. Note that the number of tunable parameters is lower than reported in the paper as a result of optimization.
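
To give a rough sense of why sharing adapter weights reduces the number of tunable parameters, the sketch below counts parameters for a generic down-projection/up-projection bottleneck adapter. It is not taken from this codebase: the adapter layout, the function names, and the dimensions (`hidden_dim=256`, `bottleneck_dim=64`) are illustrative assumptions, and the actual DA/RA Adapter design may differ.

```python
def adapter_params(hidden_dim: int, bottleneck_dim: int) -> int:
    """Parameter count of a generic bottleneck adapter (assumed layout):
    a down-projection and an up-projection, each with a bias vector."""
    down = hidden_dim * bottleneck_dim + bottleneck_dim
    up = bottleneck_dim * hidden_dim + hidden_dim
    return down + up


def two_branch_params(hidden_dim: int, bottleneck_dim: int, shared: bool) -> int:
    """Tunable parameters when two modality branches each use an adapter.
    With shared=True, both branches reuse one set of weights."""
    per_adapter = adapter_params(hidden_dim, bottleneck_dim)
    return per_adapter if shared else 2 * per_adapter


# Hypothetical dimensions, chosen only for illustration.
separate = two_branch_params(256, 64, shared=False)
shared = two_branch_params(256, 64, shared=True)
print(f"separate: {separate}, shared: {shared}")
```

Under these assumed dimensions, sharing halves the adapter parameters for the pair of branches. The real savings depend on the adapter architecture used in the code.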

👉 Installation

  1. Clone this repository.

    git clone https://github.com/liuting20/DARA.git
    
  2. Prepare for the running environment.

    conda env create -f environment.yaml
    pip install -r requirements.txt
    

👉 Getting Started

Please refer to GETTING_STARGTED.md to learn how to prepare the datasets and pretrained checkpoints.

👉 Training and Evaluation

  1. Training

    CUDA_VISIBLE_DEVICES=0 python -u train.py --batch_size 64 --lr_bert 0.00001 --aug_crop --aug_scale --aug_translate --backbone resnet50 --detr_model ./checkpoints/detr-r50-referit.pth --bert_enc_num 12 --detr_enc_num 6 --dataset unc --max_query_len 20 --output_dir outputs/referit_r50 --epochs 90 --lr_drop 60
    

    We recommend setting --max_query_len 40 for RefCOCOg, and --max_query_len 20 for other datasets.

    We recommend setting --epochs 180 (and --lr_drop 120 accordingly) for RefCOCO+, and --epochs 90 (and --lr_drop 60 accordingly) for other datasets.

  2. Evaluation

    CUDA_VISIBLE_DEVICES=0 python -u eval.py --batch_size 64 --num_workers 4 --bert_enc_num 12 --detr_enc_num 6 --backbone resnet50 --dataset unc --max_query_len 20 --eval_set testA --eval_model ./outputs/referit_r50/best_checkpoint.pth --output_dir ./outputs/referit_r50
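
The per-dataset recommendations given in the training step can be collected in a small helper that emits the matching flags. This is only a convenience sketch and not part of the repository: the function name and the dictionary keys are hypothetical, and the keys are descriptive dataset labels rather than values accepted by `--dataset`.

```python
from typing import Dict, List

# Defaults recommended in the text for most datasets.
DEFAULTS: Dict[str, int] = {"max_query_len": 20, "epochs": 90, "lr_drop": 60}

# Overrides recommended in the text. Keys are descriptive labels
# (an assumption for this sketch), not --dataset flag values.
OVERRIDES: Dict[str, Dict[str, int]] = {
    "RefCOCOg": {"max_query_len": 40},
    "RefCOCO+": {"epochs": 180, "lr_drop": 120},
}


def recommended_flags(dataset_label: str) -> List[str]:
    """Return the recommended train.py flags for a dataset label."""
    settings = {**DEFAULTS, **OVERRIDES.get(dataset_label, {})}
    flags: List[str] = []
    for name, value in settings.items():
        flags += [f"--{name}", str(value)]
    return flags


print(" ".join(recommended_flags("RefCOCO+")))
```

The printed flags can be pasted into the training command in place of the corresponding values shown there.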
    

👍 Acknowledgements

This codebase is partially based on TransVG.

📌 Citation

Please consider citing our paper in your publications if our findings help your research.

@article{liu2024dara,
  title={DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding},
  author={Liu, Ting and Liu, Xuyang and Huang, Siteng and Chen, Honggang and Yin, Quanjun and Qin, Long and Wang, Donglin and Hu, Yue},
  journal={arXiv preprint arXiv:2405.06217},
  year={2024}
}

📧 Contact

For any questions about our paper or code, please contact Ting Liu or Xuyang Liu.
