🎬⏮️ "Previously on ..." From Recaps to Story Summarization
- About
- Setting up the repository
- Feature Extraction
- Downloading and Setting up the data directories
- Train TaleSumm with different configurations
- Inference on TaleSumm to create summaries
- License
- Bibtex
This is the official code repository for the CVPR 2024 paper "Previously on ..." From Recaps to Story Summarization. This repository contains the implementation of TaleSumm, a Transformer-based hierarchical model, trained on our proposed dataset PlotSnap. TaleSumm processes entire episodes by creating compact shot 🎞️ and dialog 🗣️ representations, and predicts importance scores for each video shot and dialog utterance by enabling interactions between local story groups. Our model leverages multiple modalities, including visual and dialog features, to capture a comprehensive understanding of important shots in complex movie environments. Additionally, we provide the pre-trained weights for TaleSumm as well as all the pre-trained backbones used in feature extraction. On top of that, we provide pre-extracted features for episodes (per-frame embeddings using DenseNet169, CLIP, and MViT) and dialog features (with a fine-tuned RoBERTa backbone).
- Clone the repository and change the working directory to the project's root.

  ```bash
  $ git clone https://github.com/katha-ai/RecapStorySumm-CVPR2024
  $ cd RecapStorySumm-CVPR2024
  ```
- This project strictly requires `python==3.8`. Create a virtual environment using Conda:

  ```bash
  $ conda create -n storysumm python=3.8
  $ conda activate storysumm
  (storysumm) $ pip install -r requirements.txt
  ```

  OR

  Create a virtual environment using pip (make sure you have Python 3.8 installed):

  ```bash
  $ python3.8 -m pip install virtualenv
  $ python3.8 -m virtualenv storysumm
  $ source storysumm/bin/activate
  (storysumm) $ pip install -r requirements.txt
  ```
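A quick sanity check that the environment is usable (the `torch` import assumes PyTorch is pulled in by `requirements.txt`; adjust if your install differs):

```bash
(storysumm) $ python --version   # should report Python 3.8.x
(storysumm) $ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```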
🛠️ Configure the `configs/base.yaml` file
- Add the absolute paths to the project directory in `configs/base.yaml`.
- E.g., if you have cloned the repository at `/home/user/RecapStorySumm-CVPR2024` and want to download the model checkpoints and data features there, the path variables in `configs/base.yaml` would be:

  ```yaml
  root: "/home/user/RecapStorySumm-CVPR2024"
  # Save PlotSnap data features here
  data_path: "${root}/data"
  split_dir: "${root}/configs/data_configs/splits"
  # To save dialog (and vision) backbones
  cache_dir: "${root}/cache/"
  # Use the following for model checkpoints
  ckpt_path: "${root}/checkpoints/storysumm"
  ```
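The `${root}` references suggest OmegaConf/Hydra-style interpolation. If that holds (an assumption; `omegaconf` must be available in the environment), you can check that the paths resolve before training:

```bash
# Prints the resolved data and checkpoint paths from configs/base.yaml
(storysumm) $ python -c "from omegaconf import OmegaConf; cfg = OmegaConf.load('configs/base.yaml'); print(cfg.data_path, cfg.ckpt_path)"
```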
Refer to `configs/trainer_config.yaml` and `configs/inference_config.yaml` for the default parameter configurations used for training and inference, respectively.
Follow the instructions in `feature_extractors/README.md` [WIP] to extract the required features from any given video and prepare it for summarization.
Note that we have already provided the pre-extracted features for PlotSnap below.
You can also use `wget` to download these files:

```bash
# Download the features (as described below) into the data/ folder
LINK="https://iiitaphyd-my.sharepoint.com/:u:/g/personal/makarand_tapaswi_iiit_ac_in/EdEsWTvAEg5Iuo1cAUNmVq4Bipauv5nGdTdXAtMidWR5GA?e=dLWkNo"
wget -O data $LINK
```
| File name | Contents | Comments |
|---|---|---|
| 24 | | Contains S02 to S09 directories, which will occupy 92 GB of disk space. |
| Prison Break | | This occupies 22 GB of disk space. |
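If the SharePoint link resolves to a single archive (an assumption; the archive name and format below are illustrative), one way to unpack it into `data/` is:

```bash
# Hypothetical archive name; substitute whatever the link actually serves
$ wget -O plotsnap_features.zip $LINK
$ unzip -q plotsnap_features.zip -d data/
$ ls data/   # expect the series folders described in the table above
```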
```bash
# Create the checkpoints folder `checkpoints/storysumm` in the project's root folder
# (if not already present) and place all checkpoints in it one by one.
mkdir -p <absolute_path_to_root>/checkpoints/storysumm

# OR simply do the following:
# download the pre-trained weights (as described below) into the checkpoints/ folder
LINK="https://iiitaphyd-my.sharepoint.com/:u:/g/personal/makarand_tapaswi_iiit_ac_in/ES91ZF90ArJGiXkEa53-kJABNytKOyOSQlr03dnTf6bKKg?e=PN1Gir"
wget -O checkpoints $LINK
```
| File name | Comments | Training command |
|---|---|---|
| TaleSumm-IntraCVT\|S[1,2,3,4,5] | IntraCVT split i=0,1,2,3,4 checkpoint of TaleSumm | `(storysumm) $ python -m trainer split_type='intra-loocv'` |
| TaleSumm-Final | Final checkpoint of TaleSumm, to be used in production | `(storysumm) $ python -m trainer split_type='final-split.yaml'` |
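Similarly, if the checkpoint link serves an archive (again an assumption; names below are illustrative), you can unpack it directly into the checkpoints folder:

```bash
# Hypothetical archive name; substitute the actual download
$ wget -O talesumm_checkpoints.zip $LINK
$ unzip -q talesumm_checkpoints.zip -d <absolute_path_to_root>/checkpoints/storysumm/
$ ls <absolute_path_to_root>/checkpoints/storysumm/   # expect the checkpoints listed above
```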
After completing the above, you can train TaleSumm on a 12 GB NVIDIA RTX 2080 Ti GPU! You can also use the pre-trained weights provided in the download section above.

Note: it is recommended to use wandb to log and track your experiments.
Using the default values given in `config_base.yaml`, the individual configuration overrides are listed below; a combined example follows the list.

- To train TaleSumm for PlotSnap, use the default config (no argument required):

  ```bash
  (storysumm) $ python -m trainer
  ```

- To train TaleSumm with a specific modality (valid keywords: `vid`, `dia`, `both`):

  ```bash
  (storysumm) $ python -m trainer modality=both
  ```

- To train TaleSumm on a specific series (valid keywords: `24`, `prison-break`, `all`):

  ```bash
  (storysumm) $ python -m trainer series='24'
  ```
- To change the split type used for training (valid keywords: `cross-series`, `intra-loocv`, `inter-loocv`, `default-split.yaml`, `fandom-split.yaml`):

  ```bash
  (storysumm) $ python -m trainer split_type=cross-series
  ```

- To choose which visual features to train on, pass a list of features (valid keywords: `imagenet`, `mvit`, `clip`):

  ```bash
  (storysumm) $ python -m trainer visual_features=['imagenet','mvit','clip']
  ```

- To choose the fusion style of the visual features (valid keywords: `concat`, `stack`, `mul`):

  ```bash
  (storysumm) $ python -m trainer feat_fusion_style=concat
  ```
- To choose the type of attention in the model (valid keywords: `sparse`, `full`):

  ```bash
  (storysumm) $ python -m trainer attention_type=sparse
  ```

- To disable Group tokens in the model:

  ```bash
  (storysumm) $ python -m trainer withGROUP=False
  ```

  NOTE: if `withGROUP` is True, then `computeGROUPloss` needs to be True as well.

- To enable wandb logging (recommended):

  ```bash
  (storysumm) $ python -m trainer wandb.logging=True
  ```
NOTE: we trained with 4 GPUs, which is why the `gpus` parameter in the configuration is set to [0,1,2,3]. If you plan to use more or fewer GPUs, please enter their GPU IDs accordingly.
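The overrides above can also be combined in a single run (assuming the trainer accepts multiple OmegaConf/Hydra-style overrides at once, as the individual examples suggest); all keys below are documented in the list above:

```bash
(storysumm) $ python -m trainer series='24' modality=both split_type=intra-loocv \
    visual_features=['imagenet','mvit','clip'] feat_fusion_style=concat wandb.logging=True
```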
To summarize a new video using TaleSumm, use the following command (with any overrides for `inference_config.yaml`):

```bash
(storysumm) $ python -m inference <overrides for inference_config.yaml>
```
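The override keys accepted at inference time are not listed here, so treat the key below as purely illustrative (it mirrors `ckpt_path` from `configs/base.yaml`); consult `configs/inference_config.yaml` for the actual options:

```bash
# Hypothetical override; check configs/inference_config.yaml for the real keys
(storysumm) $ python -m inference ckpt_path='<absolute_path_to_root>/checkpoints/storysumm'
```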
This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.
If you find any part of this repository useful, please cite the following paper!
```bibtex
@inproceedings{singh2024previously,
  title = {{"Previously on ..." From Recaps to Story Summarization}},
  author = {Aditya Kumar Singh and Dhruv Srivastava and Makarand Tapaswi},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024},
}
```