GVT: Generative Video-to-text Transformer

Prepare

install requirements:

pip install git+https://github.com/microsoft/azfuse.git
pip install -r requirements.txt

Install PyTorch and torchvision following the official instructions, e.g.,

conda install pytorch torchvision -c pytorch

prepare input data:
- One txt file, each line is an absolute directory of a video's frames.
- Or just an absolute path of a video file.
download examples and checkpoint:
```
bash down.sh
```

Demo

Google Colab / [Jupyter notebooks]

Inference

see ./generativevideo2text/infer.sh

inference on single video

# single video
cd generativevideo2text
TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=<GPU_id> python infer.py \
    --config ../config/infer.yaml \
    --output_dir ../ckpt/results/ \
    --checkpoint ../GVT_ChinaOpen.pth \
    --min_length 15 \
    --beam_size 10 \
    --max_length 32 \
    --max_input_length 48 \
    --to_be_infered ../demo/videos/BV1CN411o7WE.mp4 \
    --use_video

inference on single frames dir

# single images dir
  cd generativevideo2text
  TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=<GPU_id> python infer.py \
      --config ../config/infer.yaml \
      --output_dir ../ckpt/results/ \
      --checkpoint ../GVT_ChinaOpen.pth \
      --min_length 15 \
      --beam_size 10 \
      --max_length 32 \
      --max_input_length 48 \
      --to_be_infered ../demo/frames/BV1CN411o7WE

inference on batch

# batch
cd generativevideo2text
TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=<GPU_id> python infer.py \
    --config ../config/infer.yaml \
    --output_dir ../demo/results/ \
    --checkpoint ../GVT_ChinaOpen.pth \
    --min_length 15 \
    --beam_size 10 \
    --max_length 32 \
    --max_input_length 48 \
    --test_root ../demo/frames \
    --to_be_infered ../demo/demo.txt

run infer.sh

cd generativevideo2text
bash infer.sh <GPU_id>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

GVT: Generative Video-to-text Transformer

Prepare

Demo

Inference

Files

README.md

Latest commit

History

README.md

File metadata and controls

GVT: Generative Video-to-text Transformer

Prepare

Demo

Inference