GVT: Generative Video-to-text Transformer


Prepare

  • install the requirements:

    pip install git+https://github.com/microsoft/azfuse.git
    pip install -r requirements.txt

    Install PyTorch and torchvision following the official instructions, e.g.,

    conda install pytorch torchvision -c pytorch
  • prepare input data:

    • A txt file in which each line is the absolute path of a directory holding one video's frames.
    • Or the absolute path of a single video file.
  • download examples and checkpoint:

    bash down.sh
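
For the txt input format listed under "prepare input data", the list file can be generated with a short script. The following is a minimal sketch, assuming that each immediate subdirectory of a frames root (for example ../demo/frames) holds the frames of one video. The function name `write_frame_dir_list` and that directory layout are illustrative assumptions, not part of this repository.

```python
from pathlib import Path


def write_frame_dir_list(frames_root, list_path):
    """Write one absolute frame-directory path per line and return the paths.

    Hypothetical helper: assumes every immediate subdirectory of
    `frames_root` contains the frames of exactly one video.
    """
    root = Path(frames_root).resolve()
    # Keep only directories; stray files next to them are ignored.
    frame_dirs = sorted(str(p) for p in root.iterdir() if p.is_dir())
    # One absolute path per line, as the input format above describes.
    Path(list_path).write_text(
        "".join(d + "\n" for d in frame_dirs), encoding="utf-8"
    )
    return frame_dirs
```

Calling `write_frame_dir_list("demo/frames", "demo/my_list.txt")` would then produce a file that can be passed to `--to_be_infered`; the output file name here is only an example.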
    

Demo

Google Colab / Jupyter notebooks

Inference

See ./generativevideo2text/infer.sh.

  • inference on a single video
    # single video
    cd generativevideo2text
    TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=<GPU_id> python infer.py \
        --config ../config/infer.yaml \
        --output_dir ../ckpt/results/ \
        --checkpoint ../GVT_ChinaOpen.pth \
        --min_length 15 \
        --beam_size 10 \
        --max_length 32 \
        --max_input_length 48 \
        --to_be_infered ../demo/videos/BV1CN411o7WE.mp4 \
        --use_video
    
    
  • inference on a single frames directory
    # single frames dir
    cd generativevideo2text
    TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=<GPU_id> python infer.py \
        --config ../config/infer.yaml \
        --output_dir ../ckpt/results/ \
        --checkpoint ../GVT_ChinaOpen.pth \
        --min_length 15 \
        --beam_size 10 \
        --max_length 32 \
        --max_input_length 48 \
        --to_be_infered ../demo/frames/BV1CN411o7WE
  • inference on a batch
    # batch
    cd generativevideo2text
    TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=<GPU_id> python infer.py \
        --config ../config/infer.yaml \
        --output_dir ../demo/results/ \
        --checkpoint ../GVT_ChinaOpen.pth \
        --min_length 15 \
        --beam_size 10 \
        --max_length 32 \
        --max_input_length 48 \
        --test_root ../demo/frames \
        --to_be_infered ../demo/demo.txt
  • run infer.sh
    cd generativevideo2text
    bash infer.sh <GPU_id>
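
The three commands above differ only in their input arguments: a video file adds `--use_video`, and batch mode adds `--test_root`. As a minimal sketch of that pattern, the hypothetical helper below assembles the argument list for `infer.py` from the flags shown in this README. The function name `build_infer_command`, the default values, and the set of recognized video suffixes are assumptions made for illustration; they are not taken from the repository code.

```python
from pathlib import Path

# Assumption: file suffixes treated as video files for this sketch.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".mov"}


def build_infer_command(target,
                        checkpoint="../GVT_ChinaOpen.pth",
                        output_dir="../ckpt/results/",
                        test_root=None):
    """Return the infer.py argument list for a video, frames dir, or txt list."""
    cmd = [
        "python", "infer.py",
        "--config", "../config/infer.yaml",
        "--output_dir", output_dir,
        "--checkpoint", checkpoint,
        "--min_length", "15",
        "--beam_size", "10",
        "--max_length", "32",
        "--max_input_length", "48",
    ]
    if test_root is not None:
        # Batch mode in this README also passes the frames root.
        cmd += ["--test_root", test_root]
    cmd += ["--to_be_infered", target]
    if Path(target).suffix.lower() in VIDEO_SUFFIXES:
        # Only the single-video command above uses this switch.
        cmd.append("--use_video")
    return cmd
```

The returned list could be handed to `subprocess.run(cmd, cwd="generativevideo2text")`, with `TOKENIZERS_PARALLELISM` and `CUDA_VISIBLE_DEVICES` set in the environment as in the shell commands above.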