We provide the source code for the paper "StreamHover: Livestream Transcript Summarization and Annotation", accepted at EMNLP 2021. If you find the code useful, please cite the following paper.
```
@inproceedings{cho-et-al:2021,
  Author = {Sangwoo Cho and Franck Dernoncourt and Tim Ganter and Trung Bui and Nedim Lipka and Walter Chang and Hailin Jin and Jonathan Brandt and Hassan Foroosh and Fei Liu},
  Title = {StreamHover: Livestream Transcript Summarization and Annotation},
  Booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  Year = {2021}}
```
## Environment

We suggest the following environment:

- Anaconda
- Python (v3.6)
- PyTorch (v1.4)
- pyrouge
- Hugging Face Transformers
- Create the same environment with

```
conda env create --file environment.yml
```
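After activating the environment, you can sanity-check the core versions with a short snippet like the one below (a minimal sketch; your exact patch versions may differ):

```python
import sys
import torch
import transformers

# Per the environment above, expect Python 3.6.x and PyTorch 1.4.x.
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
```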
## Data

- Download data from HERE
- Unzip the file and move it to `path/to/your/data/folder`
- Each pickle file (`Behance_train.pkl`, `Behance_val.pkl`, `Behance_test.pkl`) contains a list of the following tuples, each based on a 5-min. transcript clip:

```
(
    List[dict],  # transcript of the 5-min. clip
    str,         # abstractive summary
    List[int],   # extractive summary (indices of each utterance, 0-based)
    int,         # unique video ID
    int,         # unique clip ID (e.g., 0 means 0-5 min. clip, 1 means 5-10 min. clip)
    str,         # video title
    str,         # video url
    str          # transcript url
)
```
- Each transcript dictionary above contains the following data:

```
{
    'display': str,    # utterance
    'offset': float,   # start time of the utterance
    'duration': float  # duration of the utterance
}
```
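As a quick illustration, the sketch below loads the validation split and unpacks one clip following the tuple layout above (it assumes the pickles live in the data folder from the previous step):

```python
import pickle

# Assumes the unzipped pickles sit in path/to/your/data/folder.
with open("path/to/your/data/folder/Behance_val.pkl", "rb") as f:
    clips = pickle.load(f)

# Unpack a clip following the documented tuple layout.
(transcript, abs_summary, ext_indices, video_id,
 clip_id, title, video_url, transcript_url) = clips[0]

print(f"video {video_id}, clip {clip_id}: {title}")
print("first utterance:", transcript[0]["display"])
print("extractive summary:", [transcript[i]["display"] for i in ext_indices])
```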
- In the paper, we used 3,884 clips for training, 728 clips for validation, and 809 clips for testing. However, due to privacy issues with two videos in the training set, we removed them and provide the following data:
- train: 3,860 clips from 318 videos (24 clips removed from the 2 videos)
- val: 728 clips from 25 videos
- test: 809 clips from 25 videos
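If you want to verify these counts, something like the following should do it (a sketch, assuming the tuple layout above with the video ID at index 3):

```python
import pickle

for split in ("train", "val", "test"):
    with open(f"path/to/your/data/folder/Behance_{split}.pkl", "rb") as f:
        clips = pickle.load(f)
    video_ids = {clip[3] for clip in clips}  # index 3 holds the unique video ID
    print(f"{split}: {len(clips)} clips from {len(video_ids)} videos")
```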
## Trained Model

- The trained model can be downloaded from HERE
- Download and move it to the folder `models/c1024_e100`
  - codebook size: 1024, convolution filter size: 100
## Training

- Please refer to `src/commands.sh` for command examples.
## Evaluation

- Please refer to `src/commands.sh` for a command example.
- A summary output file (`*.json`) will be generated in the `results` folder.
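To take a quick look at the generated summaries, you can pretty-print a preview of each file (a minimal sketch; the exact JSON schema is whatever the evaluation script writes, so nothing about the structure is assumed here):

```python
import glob
import json

# Preview every summary file produced in the results folder.
for path in sorted(glob.glob("results/*.json")):
    with open(path) as f:
        data = json.load(f)
    print(path)
    print(json.dumps(data, indent=2)[:500])  # first 500 characters
```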
## Video-Level Summarization (Inference)

- Please refer to `src/commands.sh` for a command example.
- Summary utterances are selected from each valid 5-min. clip in a video, and the selected utterances are merged into a video-level summary.
- You can use the following arguments to generate one (see the lookup sketch at the end of this section):
  - `video_inference_id`: video ID for inference. Refer to `videoID_split.csv` to obtain the index number for the video you want a video-level summary of; e.g., row 9 in the file has index=7, video id=16, split=train, so you would set this argument to `7` for that training-set video.
  - `video_inf_min_sent`: summary generation is skipped if the number of utterances in any 5-min. clip is less than this value.
  - `num_sum_sent`: number of summary utterances for each 5-min. clip.
- A summary output file (`*.json`) and a transcript file (`*.json`) will be generated in the `results` folder.
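For example, you can resolve the `video_inference_id` for a given video by scanning `videoID_split.csv` (a sketch; the column names used here are assumptions, since the file is only described as listing an index, a video ID, and a split per row):

```python
import csv

def find_inference_id(csv_path, video_id, split):
    """Return the index to pass as video_inference_id, or None if not found."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Column names 'index', 'video id', and 'split' are assumptions.
            if int(row["video id"]) == video_id and row["split"] == split:
                return int(row["index"])
    return None

# e.g., video id 16 in the training split should map to index 7.
print(find_inference_id("videoID_split.csv", video_id=16, split="train"))
```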