KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation (ACL 2021)

Yiran Xing*, Zai Shi*, Zhao Meng*, Gerhard Lakemeyer, Yunpu Ma, Roger Wattenhofer

*The first three authors contributed equally to this work.

[Paper] [Supplementary]

How to Cite Our Work

@inproceedings{KM-BART,
    title = "{KM}-{BART}: Knowledge Enhanced Multimodal {BART} for Visual Commonsense Generation",
    author = "Xing, Yiran  and
      Shi, Zai  and
      Meng, Zhao  and
      Lakemeyer, Gerhard  and
      Ma, Yunpu  and
      Wattenhofer, Roger",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "525--535"
}

Installation

  1. Clone the repository recursively

    git clone --recursive https://github.com/FomalhautB/KM-BART-ACL.git
    
  2. Create conda environment

    conda env create -f environment.yaml
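
Once the environment is created, activate it before running any of the commands below. The environment name is whatever the name: field in environment.yaml specifies; km_bart below is only a placeholder.

    # list the available environments, then activate the one defined in environment.yaml
    conda env list
    conda activate km_bart  # replace km_bart with the actual environment name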
    

The following steps are only required for feature extraction.

  1. Install bottom-up-attention.pytorch. Please refer to bottom-up-attention.pytorch for more details.

    cd bottom-up-attention.pytorch
    # install detectron2
    cd detectron2
    pip install -e .
    cd ..
    # install the remaining modules
    python setup.py build develop
    cd ..
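
A quick optional check that the build succeeded (this only verifies that detectron2 imports; it is not part of the original instructions):

    # should print the installed detectron2 version without raising an ImportError
    python -c "import detectron2; print(detectron2.__version__)"
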
  2. Install comet-commonsense. Please refer to comet-commonsense for more details.

    cd comet-commonsense
    # download data
    bash scripts/setup/get_atomic_data.sh
    bash scripts/setup/get_model_files.sh
    # install dependencies
    pip install tensorflow
    pip install ftfy==5.1
    conda install -c conda-forge spacy
    python -m spacy download en
    pip install tensorboardX
    pip install tqdm
    pip install pandas
    pip install ipython
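
Optionally, confirm that the dependencies installed above can be imported (a minimal check, not part of the original setup):

    # all imports should succeed inside the activated conda environment
    python -c "import tensorflow, ftfy, spacy, tensorboardX, pandas; print('ok')"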

Data Preparation

VCG

  1. Download the images from here and decompress the images into $VCR_DATASET
  2. Download the annotations from here and decompress the annotations into $VCG_ANNOTATION
  3. Extract features and save the features in $VCG_DATA:
    python -m scripts.prepare_vcg \
        --data_dir $VCR_DATASET \
        --output_dir $VCG_DATA \
        --annot_dir $VCG_ANNOTATION \
        --gpu_num 4
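
$VCR_DATASET, $VCG_ANNOTATION, and $VCG_DATA (and the analogous variables in the sections below) are placeholder paths; set them to your own locations before running the scripts, and adjust --gpu_num to the number of GPUs you actually have. For example (paths are illustrative only):

    # illustrative paths -- point these at wherever you stored the data
    export VCR_DATASET=/data/vcr/images
    export VCG_ANNOTATION=/data/vcg/annotations
    export VCG_DATA=/data/vcg/features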

COCO

  1. Download the train images from here and decompress the images into $COCO_TRAIN
  2. Download the validation images from here and decompress the images into $COCO_VAL
  3. Download the annotations from here and decompress the annotations into $COCO_ANNOTATION
  4. Extract features and save the features in $COCO_DATA:
    python -m scripts.prepare_coco \
        --train_dir $COCO_TRAIN \
        --val_dir $COCO_VAL \
        --annot_dir $COCO_ANNOTATION  \
        --output_dir $COCO_DATA \
        --gpu_num 4

SBU and CC

  1. Download the JSON files with image URLs and captions from here and decompress the two files into $SBU_ANNOTATION
  2. Extract the features, bounding boxes, and labels, build the image annotations, and save them into $OUTPUT_DATA (this will download the images first and save them in $SBU_DATA):
    python -m scripts.prepare_sbu \
        --download \
        --data_dir $SBU_DATA \
        --output_dir $OUTPUT_DATA \
        --annot_dir $SBU_ANNOTATION \
        --gpu_num 4 \
        --n_jobs 8
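
Downloading SBU takes a long time and a large amount of disk space. A simple way to check progress and usage, assuming the images are written directly into $SBU_DATA:

    # number of files downloaded so far and total size on disk
    find $SBU_DATA -type f | wc -l
    du -sh $SBU_DATA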

VG

  1. Download the objects, relationships, region descriptions, attributes, and image metadata from here and decompress them into $VG_ANNOTATION
  2. Download the images from the same link above and decompress them into $VG_IMAGES
  3. Extract features and save them in $VG_DATA:
    python -m scripts.prepare_vg \
        --annot_dir $VG_ANNOTATION \
        --output_dir $VG_DATA \
        --data_dir $VG_IMAGES \
        --gpu_num 4

Reasoning (SBU and COCO)

  1. Download the pretrained COMET weights atomic_pretrained_model.pickle from comet-commonsense
    • Save the file to $LOAD_PATH.
    • Follow the instructions in comet-commonsense to build the COMET data loader.
  2. Download the JSON files with image URLs and captions from here and decompress the two files into $SBU_ANNOTATION.
  3. Download the SBU dataset, save the images in $SBU_DATA, and decompress the image features, bounding boxes, and labels into $SBU_DATA.
  4. Generate inferences and save them in $REASON_DATA.
    python -m scripts.prepare_sbu_reason \
         --output_dir $REASON_DATA \
         --annot_dir  $SBU_ANNOTATION \
         --model_file $LOAD_PATH/COMET \
         --gpu_num 2 \
         --sampling_algorithm topk-3
    
    # rename the output file
    mv $REASON_DATA/train.json $SBU_DATA/reason_train.json
  5. Filter the newly generated inferences with a KM-BART pretrained on VCG (also in $LOAD_PATH) and save the final results in $OUTPUT_DATA.
    python -m scripts.filter_reason  \
         --data_dir $SBU_DATA \
         --output_dir $OUTPUT_DATA \
         --checkpoint $LOAD_PATH/KM-BART
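
Optionally, sanity-check the inference file produced in step 4 and the filtered results. This assumes reason_train.json is a single JSON document; if prepare_sbu_reason writes JSON lines instead, load it line by line.

    # count entries in the unfiltered inferences and list the filtered output
    python -c "import json; print(len(json.load(open('$SBU_DATA/reason_train.json'))))"
    ls -lh $OUTPUT_DATA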

Training

Pretrain from scratch

  • Example of pretraining on COCO + SBU with 1 GPU and 4 CPUs from scratch (no pretrained weights)
    python pretrain \
        --dataset coco_train $COCO_DATA \
        --dataset coco_val $COCO_DATA \
        --dataset sbu_train $SBU_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --gpu_num 1 \
        --batch_size 32 \
        --master_port 12345 \
        --log_dir $LOG_DIR \
        --amp \
        --num_workers 4 \
        --model_config config/pretrain_base.json
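
Training progress written to --log_dir can be followed with TensorBoard, assuming the logs are standard TensorBoard event files (which is what tensorboardX produces):

    # serve the training logs at http://localhost:6006
    tensorboard --logdir $LOG_DIR --port 6006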

Pretrain from facebook/bart-base

  • Example of loading pretrained weights from facebook/bart-base and training on COCO
    python pretrain \
        --dataset coco_train $COCO_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --model_config config/pretrain_base.json \
        --checkpoint facebook/bart-base
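
The facebook/bart-base identifier is presumably resolved through the Hugging Face transformers library, so the first run needs network access or a pre-populated cache. A hedged way to pre-download the weights and tokenizer:

    # downloads facebook/bart-base into the local Hugging Face cache
    python -c "from transformers import BartModel, BartTokenizer; BartModel.from_pretrained('facebook/bart-base'); BartTokenizer.from_pretrained('facebook/bart-base')"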

Continue pretraining

  • Example of loading pretrained weights from a previous checkpoint and continuing to train on COCO
    python pretrain \
        --dataset coco_train $COCO_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --model_config config/pretrain_base.json \
        --checkpoint $CHECKPOINT \
        --continue_training

Train VCG

  • Example of loading weights from a pretrained checkpoint and fine-tuning on VCG. Validation of loss and score will be done at the end of each epoch.
    python vcg_train \
        --data_dir $VCG_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --validate_loss \
        --validate_score \
        --model_config config/vcg_base.json \
        --checkpoint $CHECKPOINT

Generate and evaluate VCG

  • Example of generating sentences for VCG:

    python vcg_generate \
        --data_dir $VCG_DATA \
        --checkpoint $CHECKPOINT \
        --output_file $GENERATED_FILE
  • Example of evaluating the generated file on the VCG validation set:

    python vcg_eval \
        --generation $GENERATED_FILE \
        --reference $VCG_DATA/val_ref.json
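
Before evaluating, it can help to confirm that the generated file and the reference file contain matching numbers of entries. This is only a sketch: the exact JSON structure of both files depends on vcg_generate and the data preparation scripts.

    # print the number of entries in the generated output and in the validation references
    python -c "import json, sys; print(len(json.load(open(sys.argv[1]))), len(json.load(open(sys.argv[2]))))" $GENERATED_FILE $VCG_DATA/val_ref.json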

Pretrained Weights
