We provide the source code for the paper "StreamHover: Livestream Transcript Summarization and Annotation", accepted at EMNLP 2021. If you find the code useful, please cite the following paper.
```
@inproceedings{cho-et-al:2021,
  Author = {Sangwoo Cho and Franck Dernoncourt and Tim Ganter and Trung Bui and Nedim Lipka and Walter Chang and Hailin Jin and Jonathan Brandt and Hassan Foroosh and Fei Liu},
  Title = {StreamHover: Livestream Transcript Summarization and Annotation},
  Booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  Year = {2021}}
```
## Environment

We suggest the following environment:

- Anaconda
- Python (v3.6)
- PyTorch (v1.4)
- pyrouge
- Hugging Face Transformers
- Create the same environment with

```
conda env create --file environment.yml
```
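After activating the environment, you can sanity-check the core versions with a short snippet like the one below (a minimal sketch; your exact patch versions may differ):

```python
import sys
import torch
import transformers

# Per the environment above, expect Python 3.6.x and PyTorch 1.4.x.
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
```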
## Data

- Download data from HERE
- Unzip the file and move it to `path/to/your/data/folder`
- Each pickle file (`Behance_train.pkl`, `Behance_val.pkl`, `Behance_test.pkl`) contains a list of the following tuples, each based on a 5-min. transcript clip:

```
(
    List[dict],  # transcript of the 5-min. clip
    str,         # abstractive summary
    List[int],   # extractive summary (indices of each utterance, 0-based)
    int,         # unique video ID
    int,         # unique clip ID (e.g., 0 means 0-5 min. clip, 1 means 5-10 min. clip)
    str,         # video title
    str,         # video url
    str          # transcript url
)
```
- Each transcript dictionary above contains the following data:

```
{
    'display': str,    # utterance
    'offset': float,   # start time of the utterance
    'duration': float  # duration of the utterance
}
```
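As a quick illustration, the sketch below loads the validation split and unpacks one clip following the tuple layout above (it assumes the pickles live in the data folder from the previous step):

```python
import pickle

# Assumes the unzipped pickles sit in path/to/your/data/folder.
with open("path/to/your/data/folder/Behance_val.pkl", "rb") as f:
    clips = pickle.load(f)

# Unpack a clip following the documented tuple layout.
(transcript, abs_summary, ext_indices, video_id,
 clip_id, title, video_url, transcript_url) = clips[0]

print(f"video {video_id}, clip {clip_id}: {title}")
print("first utterance:", transcript[0]["display"])
print("extractive summary:", [transcript[i]["display"] for i in ext_indices])
```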
- In the paper, we used 3,884 clips for training, 728 clips for validation, and 809 clips for testing. However, due to privacy issues with two videos in the training set, we removed them and provide the following data:
- train: 3,860 clips from 318 videos (24 clips removed from the 2 videos)
- val: 728 clips from 25 videos
- test: 809 clips from 25 videos
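If you want to verify these counts, something like the following should do it (a sketch, assuming the tuple layout above with the video ID at index 3):

```python
import pickle

for split in ("train", "val", "test"):
    with open(f"path/to/your/data/folder/Behance_{split}.pkl", "rb") as f:
        clips = pickle.load(f)
    video_ids = {clip[3] for clip in clips}  # index 3 holds the unique video ID
    print(f"{split}: {len(clips)} clips from {len(video_ids)} videos")
```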
## Trained Model

- The trained model can be downloaded from HERE
- Download and move it to the folder `models/c1024_e100`
  - codebook size: 1024, convolution filter size: 100
## Training

- Please refer to `src/commands.sh` for command examples.
## Evaluation

- Please refer to `src/commands.sh` for a command example.
- A summary output file (`*.json`) will be generated in the `results` folder.
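To take a quick look at the generated summaries, you can pretty-print a preview of each file (a minimal sketch; the exact JSON schema is whatever the evaluation script writes, so nothing about the structure is assumed here):

```python
import glob
import json

# Preview every summary file produced in the results folder.
for path in sorted(glob.glob("results/*.json")):
    with open(path) as f:
        data = json.load(f)
    print(path)
    print(json.dumps(data, indent=2)[:500])  # first 500 characters
```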
## Video-Level Summarization (Inference)

- Please refer to `src/commands.sh` for a command example.
- Summary utterances are selected from each valid 5-min. clip in a video, and the selected utterances are merged into a video-level summary.
- You can use the following arguments to generate one (see the lookup sketch at the end of this section):
  - `video_inference_id`: video ID for inference. Refer to `videoID_split.csv` to obtain the index number for the video you want a video-level summary of; e.g., row 9 in the file has index=7, video id=16, split=train, so you would set this argument to `7` for that training-set video.
  - `video_inf_min_sent`: summary generation is skipped if the number of utterances in any 5-min. clip is less than this value.
  - `num_sum_sent`: number of summary utterances for each 5-min. clip.
- A summary output file (`*.json`) and a transcript file (`*.json`) will be generated in the `results` folder.
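For example, you can resolve the `video_inference_id` for a given video by scanning `videoID_split.csv` (a sketch; the column names used here are assumptions, since the file is only described as listing an index, a video ID, and a split per row):

```python
import csv

def find_inference_id(csv_path, video_id, split):
    """Return the index to pass as video_inference_id, or None if not found."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Column names 'index', 'video id', and 'split' are assumptions.
            if int(row["video id"]) == video_id and row["split"] == split:
                return int(row["index"])
    return None

# e.g., video id 16 in the training split should map to index 7.
print(find_inference_id("videoID_split.csv", video_id=16, split="train"))
```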