Official evaluation toolkit of VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
Create a new conda environment and install the necessary dependencies:
git clone https://github.com/fitzpchao/RSEvalKit
cd RSEvalKit
conda create -n rseval
conda activate rseval
pip install -r requirements.txt
- Please refer to the evaluation data description and download the vhm_eval dataset.
- Prepare the datasets following the file structure below:
{dataset_base}/
    # image dirs
    abspos_c1f4_dota-test_mc/
        image0.jpg
        image1.jpg
        ...
    abspos_dota-test_mc/
        ...
    # json files
    abspos_c1f4_dota-test_mc.json
    abspos_dota-test_mc.json
    ...
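Before launching an evaluation, it can help to confirm that the dataset directory matches this layout. The following is a minimal sketch, not part of the toolkit: the helper name `check_dataset_layout` is hypothetical, and it assumes that every JSON file sits directly under the dataset base next to an image directory with the same name (minus the `.json` extension), as the tree above suggests.

```python
from pathlib import Path


def check_dataset_layout(dataset_base):
    """Return the names of JSON files that have no same-named image directory.

    Hypothetical helper, not part of RSEvalKit. Assumes each task's JSON file
    lives directly under dataset_base beside a directory sharing its stem.
    """
    base = Path(dataset_base)
    missing = []
    for json_file in sorted(base.glob("*.json")):
        # e.g. abspos_dota-test_mc.json should pair with abspos_dota-test_mc/
        if not (base / json_file.stem).is_dir():
            missing.append(json_file.name)
    return missing
```

An empty list means every JSON file found has a matching image directory. The check does not inspect the contents of the JSON files or the images themselves.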
Please refer to this guide to download the corresponding VHM model weights.
$ CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port 52302 ./model_eval_mp.py --task all --batch-per-gpu 1 --dataset-base ${dataset_base} --save-path ${your_save_path}
To evaluate the model on multiple GPUs, adjust the arguments --nproc_per_node and --batch-per-gpu, and make sure their values satisfy the following equation:
${nproc_per_node} = ${batch-per-gpu} × ${the number of your GPUs}
For example, to run an evaluation on 4 GPUs with a batch size of 3 per GPU, you should run:
$ CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=12 --master_port 52302 ./model_eval_mp.py --task all --batch-per-gpu 3 --dataset-base ${dataset_base} --save-path ${your_save_path}
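The arithmetic behind these commands can be sketched in a few lines of Python. This is an illustrative helper only: the function name `build_eval_command` is hypothetical, and it assumes the GPUs are numbered consecutively from 0. It applies the equation stated above and reuses only the flags that appear in the commands in this README.

```python
def build_eval_command(num_gpus, batch_per_gpu, dataset_base, save_path,
                       master_port=52302):
    """Build the environment and argument list for a torchrun evaluation.

    Hypothetical helper, not part of RSEvalKit. Follows the README's rule:
    nproc_per_node = batch-per-gpu x number of GPUs.
    """
    nproc_per_node = batch_per_gpu * num_gpus
    # Assumption: devices are numbered 0..num_gpus-1.
    visible_devices = ",".join(str(i) for i in range(num_gpus))
    env = {"CUDA_VISIBLE_DEVICES": visible_devices}
    args = [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "--master_port", str(master_port),
        "./model_eval_mp.py",
        "--task", "all",
        "--batch-per-gpu", str(batch_per_gpu),
        "--dataset-base", dataset_base,
        "--save-path", save_path,
    ]
    return env, args
```

Calling `build_eval_command(4, 3, dataset_base, save_path)` yields `--nproc_per_node=12` and `CUDA_VISIBLE_DEVICES` set to `0,1,2,3`, matching the 4-GPU example. The returned values could be passed to `subprocess.run(args, env={**os.environ, **env})`.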
Please refer to our paper for more technical details: https://arxiv.org/abs/2403.20213
If this code is helpful to your research, please consider citing our paper:
@misc{pang2024vhmversatilehonestvision,
      title={VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis},
      author={Chao Pang and Xingxing Weng and Jiang Wu and Jiayu Li and Yi Liu and Jiaxing Sun and Weijia Li and Shuai Wang and Litong Feng and Gui-Song Xia and Conghui He},
      year={2024},
      eprint={2403.20213},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2403.20213},
}
We gratefully acknowledge the work of VLMEvalKit.