Spoilers: The model is similar to the one in the original paper, but it replaces the cumbersome object-detection network with a CLIP vision encoder (which can be trained end-to-end without relying on an external model) and adds adapters on the decoder side
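A minimal sketch of that idea, assuming a Hugging Face CLIP vision backbone and illustrative class names (not the repository's actual modules): the CLIP encoder produces patch features that serve as cross-attention memory for a caption decoder whose layers carry small bottleneck adapters.

```python
# Illustrative sketch only -- class names and sizes are assumptions, not the repo's code.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel  # Hugging Face transformers


class Adapter(nn.Module):
    """Small residual bottleneck adapter inserted into each decoder layer."""

    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual bottleneck: only these small weights need to be trained.
        return x + self.up(self.act(self.down(x)))


class ClipEncoder(nn.Module):
    """Wraps a CLIP vision tower and returns per-patch features for the decoder."""

    def __init__(self, name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPVisionModel.from_pretrained(name)

    def forward(self, pixel_values):
        # (batch, num_patches + 1, hidden) embeddings, usable as cross-attention memory.
        return self.clip(pixel_values=pixel_values).last_hidden_state


if __name__ == "__main__":
    encoder = ClipEncoder()
    dummy = torch.randn(2, 3, 224, 224)   # two RGB images at CLIP's input resolution
    memory = encoder(dummy)
    print(memory.shape)                   # e.g. torch.Size([2, 50, 768])
```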
COCO images HDF5 file: Download
Annotations: Download
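The internal layout of the HDF5 file isn't described here, so the sketch below only opens it and lists the top-level entries; the filename coco_images.h5 is taken from the training commands further down.

```python
# Quick, layout-agnostic inspection of the COCO images HDF5 file.
import h5py

with h5py.File("coco_images.h5", "r") as f:    # filename taken from the training commands
    for key in list(f.keys())[:10]:            # show the first few top-level entries
        item = f[key]
        shape = getattr(item, "shape", None)   # datasets have shapes, groups do not
        print(key, shape)
```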
Clone the repository and create the Violet conda environment
conda env create -f violet.yml
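Then activate it (assuming the environment defined in violet.yml is named violet):
conda activate violet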
Make the logs and saved_models directories
mkdir logs
mkdir saved_models
Early checkpoint: Download
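The checkpoint format isn't documented here; below is a hedged sketch for inspecting a typical PyTorch checkpoint, with the filename and expected keys as assumptions.

```python
# Inspect the downloaded checkpoint -- filename and key names are assumptions.
import torch

ckpt = torch.load("saved_models/violet_early.pth", map_location="cpu")  # hypothetical path
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))   # often 'state_dict', 'optimizer', 'epoch', etc.
```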
A simpler and friendlier implementation (you can ignore the data and evaluation folders when using this)
python train_refactored.py --batch_size 60 --head 12 --tau 0.3 --images_path coco_images.h5 --annotation_folder annotations --lr 1e-4 --random_seed 42 --log_file logs/log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 1 --exp_name violet
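The --gradient_accumulation_steps flag trades GPU memory for effective batch size. The loop below is a generic sketch of how such a flag is usually applied, not the repository's actual training loop.

```python
# Generic gradient-accumulation sketch -- not the repo's training loop.
import torch


def train_one_epoch(model, loader, optimizer, accumulation_steps: int = 1):
    model.train()
    optimizer.zero_grad()
    for step, (images, captions) in enumerate(loader):
        loss = model(images, captions)             # assumes the model returns a scalar loss
        (loss / accumulation_steps).backward()     # scale so accumulated gradients match one large batch
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()                       # update once every `accumulation_steps` mini-batches
            optimizer.zero_grad()
```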
Based on the code used in the Meshed-Memory Transformer and VisualGPT, updated to use Python 3 instead of the original Python 2.7
python train_legacy.py --batch_size 40 --head 12 --tau 0.3 --features_path ./coco_images.h5 --annotation_folder annotations --lr 1e-4 --random_seed 42 --log_file logs/log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 1 --exp_name violet
This code uses resources from the Meshed-Memory Transformer, Transformers, and VisualGPT.