This is the PyTorch code for our paper "Cross-view and Multi-step Interaction for Change Captioning" (under review).
- Clone this repository
- cd CVMSI
- Create a virtual environment with Python 3.10.14
- Install the requirements:
```
pip install -r requirements.txt
```
- Set up the COCO caption eval tools (github); a minimal usage sketch is given at the end of this README.
- An NVIDIA 4090 GPU or a comparable one.
- Download data from the Baidu drive link.
- Download the CLEVR-Change dataset from RobustChangeCaptioning.
- Extract visual features using an ImageNet-pretrained ResNet-101:
```
# processing default images
python scripts/extract_features.py --input_image_dir ./data/images --output_dir ./data/features --batch_size 128

# processing semantically changed images
python scripts/extract_features.py --input_image_dir ./data/sc_images --output_dir ./data/sc_features --batch_size 128

# processing distractor images
python scripts/extract_features.py --input_image_dir ./data/nsc_images --output_dir ./data/nsc_features --batch_size 128
```
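For reference, here is a minimal sketch of ImageNet-pretrained ResNet-101 feature extraction with torchvision. The layer cut, input size, normalization, and the `extract` helper are illustrative assumptions, not the actual internals of scripts/extract_features.py:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageNet-pretrained ResNet-101 with the avgpool/fc head removed,
# so the output is the final convolutional feature map (illustrative cut).
resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2]).eval().to(device)

preprocess = T.Compose([
    T.Resize((224, 224)),  # input size is an assumption
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract(image_path):
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    return backbone(img)  # shape (1, 2048, 7, 7) for a 224x224 input
```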
- We provide pre-trained weights; download them from the Baidu drive link.
- Test the pre-trained model:
```
python test_trans_c.py --cfg configs/transformer-c.yaml --snapshot 25000 --gpu 0
```
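The COCO caption eval tools set up earlier are typically driven as sketched below. The image ids and captions are made up for illustration, and the repository's own evaluation wiring may differ; note that the METEOR scorer also needs a Java runtime:

```python
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# Ground-truth and generated captions keyed by image id; each value is a
# list of caption strings (ids and captions here are illustrative only).
gts = {"img_0": ["the red cube moved to the left", "a red cube changed position"]}
res = {"img_0": ["the red cube has moved"]}

# Each scorer returns (corpus-level score, per-image scores); Bleu(4)
# reports BLEU-1 through BLEU-4 as a list.
for scorer, name in [(Bleu(4), "BLEU"), (Meteor(), "METEOR"),
                     (Rouge(), "ROUGE_L"), (Cider(), "CIDEr")]:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)
```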