Towards Scalable Neural Representation for Diverse Videos (CVPR 2023)

The official repository of our paper "Towards Scalable Neural Representation for Diverse Videos".

[teaser figure]

Model Overview

[model architecture figure]

Requirements

You can install the conda environment by running:

conda create -n dnerv python=3.9.7
conda activate dnerv
conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia
pip install tensorboard
pip install tqdm dahuffman pytorch_msssim
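
To quickly verify the setup, a hypothetical sanity check (not part of the repo):

```python
# Hypothetical environment check: confirm PyTorch imports and sees the GPU.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```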

Video Compression

We adopt the existing deep image compression models provided by CompressAI. We provide the pre-extracted ground-truth video frames and pre-compressed keyframes for the UVG and UCF101 datasets in this Google Drive link.

Unzip it under the data/ folder and make sure the data structure matches the layout below (a quick sanity-check sketch follows the tree).

 ├── data
     ├── UVG
     │   ├── gt
     │   ├── keyframe
     │   └── annotation
     └── UCF101
         ├── gt
         ├── keyframe
         └── annotation
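
A minimal sketch to verify the unzipped layout, assuming only the directory names shown above (this helper is not part of the repo):

```python
# Hypothetical layout check (not part of the repo): verify that the folders
# shown in the tree above exist after unzipping the archive under data/.
from pathlib import Path

for dataset in ("UVG", "UCF101"):
    for sub in ("gt", "keyframe", "annotation"):
        p = Path("data") / dataset / sub
        assert p.is_dir(), f"missing {p}"
print("data/ layout looks good")
```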

Please note that we split the 1024x1920 UVG videos into non-overlapping 256x320 frame patches during training due to GPU memory limitations, i.e. each frame becomes a 4x6 grid of 24 patches (see the sketch below).
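
A minimal sketch of this non-overlapping patch split for a single frame tensor (illustrative only, not the repo's dataloader):

```python
# Illustrative non-overlapping patch split: one 1024x1920 frame becomes a
# 4 x 6 grid of 256x320 patches.
import torch

frame = torch.rand(3, 1024, 1920)                        # C, H, W
patches = frame.unfold(1, 256, 256).unfold(2, 320, 320)  # (3, 4, 6, 256, 320)
patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3, 256, 320)
print(patches.shape)                                     # torch.Size([24, 3, 256, 320])
```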

Running

Training

We train our model on 4 RTX-A6000 GPUs. To compare with other state-of-the-art video compression methods, we train for 1600 epochs on the UVG dataset and 800 epochs on the UCF101 dataset; you can use fewer epochs to reduce the training time. In the commands below, ${model_type} is D-NeRV or NeRV and ${model_size} is one of XS/S/M/L/XL (see the result tables below).

# UVG dataset
python train.py --dataset UVG --model_type ${model_type} --model_size ${model_size} \
    -e 1600 -b 32 --lr 5e-4 --loss_type Fusion6 -d

# UCF101 dataset
python train.py --dataset UCF101 --model_type ${model_type} --model_size ${model_size} \
    -e 800 -b 32 --lr 5e-4 --loss_type Fusion19 -d

Testing

# Evaluate model without model quantization
python train.py --dataset UVG --model_type D-NeRV --model_size M \
        --eval_only --model saved_model/UVG/D-NeRV_M.pth

# Evaluate model with model quantization
python train.py --dataset UVG --model_type D-NeRV --model_size M \
        --eval_only --model saved_model/UVG/D-NeRV_M.pth --quant_model
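
For intuition, a hedged sketch of the post-training int8 weight quantization that --quant_model implies (the BPP formula below assumes 8-bit weights followed by entropy coding; the repo's exact scheme may differ):

```python
# Hedged sketch (not the repo's exact scheme): symmetric per-tensor int8
# quantization of a weight tensor, plus the matching dequantization used
# when evaluating the quantized model.
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                      # map max |w| onto int8
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```

The quantized weights can then be entropy-coded (e.g. with the dahuffman package from the requirements), which is presumably where the Entropy Encoding ratio in the tables below comes from.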

Dump Predicted Frames

python train.py --dataset UVG --model_type D-NeRV --model_size M \
        --eval_only --model saved_model/UVG/D-NeRV_M.pth --quant_model \
        --dump_images

Please note that, for the UVG dataset, since we split the 1024x1920 videos into 256x320 frame patches, the per-patch PSNR/MS-SSIM results differ from the actual PSNR/MS-SSIM at 1024x1920. Therefore, we need to dump the predicted frame patches first and then re-evaluate PSNR/MS-SSIM against the ground-truth 1024x1920 video frames. A minimal sketch of this re-evaluation follows.
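
The sketch below assumes the 24 predicted patches of a frame are stacked in row-major grid order; the tensor layout and function names are illustrative, not the repo's actual dump format:

```python
# Illustrative full-resolution re-evaluation: stitch the 4 x 6 grid of
# predicted 256x320 patches back into a 1024x1920 frame, then compute PSNR
# against the ground-truth frame. The patch ordering is an assumption.
import torch

def reassemble(patches: torch.Tensor) -> torch.Tensor:
    # patches: (24, 3, 256, 320), row-major over the 4x6 patch grid
    grid = patches.reshape(4, 6, 3, 256, 320).permute(2, 0, 3, 1, 4)
    return grid.reshape(3, 1024, 1920)

def psnr(pred: torch.Tensor, gt: torch.Tensor, max_val: float = 1.0):
    # frames assumed normalized to [0, 1]
    mse = torch.mean((pred - gt) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)
```

Full-resolution MS-SSIM can be computed analogously, e.g. with ms_ssim from the pytorch_msssim package listed in the requirements.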

PSNR/MS-SSIM vs. BPP Ratio Calculation

UVG Dataset

Results for different model configurations are shown in the following table. The PSNR/MS-SSIM results are reported from the quantized model.

| Model Arch | Model Param (M) | Entropy Encoding | Keyframe Size (Mb) | Total (Mb) | BPP | PSNR | MS-SSIM | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| D-NeRV XS | 8.02 | 0.883 | 88.39 | 145.0 | 0.0189 | 34.11 | 0.9479 | link |
| D-NeRV S | 15.96 | 0.881 | 88.39 | 200.9 | 0.0262 | 34.76 | 0.9540 | link |
| D-NeRV M | 24.20 | 0.880 | 123.2 | 293.6 | 0.0383 | 35.74 | 0.9604 | link |
| D-NeRV L | 41.66 | 0.877 | 175.1 | 467.3 | 0.0609 | 36.78 | 0.9668 | link |
| D-NeRV XL | 69.75 | 0.875 | 254.7 | 730.3 | 0.0952 | 37.43 | 0.9719 | link |

UCF101 Dataset (training split)

| Model Arch | Model Param (M) | Entropy Encoding | Keyframe Size (Mb) | Total (Mb) | BPP | PSNR | MS-SSIM | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| D-NeRV S | 21.40 | 0.882 | 481.6 | 632.7 | 0.0559 | 28.11 | 0.9153 | link |
| D-NeRV M | 38.90 | 0.891 | 481.6 | 758.7 | 0.0671 | 29.15 | 0.9364 | link |
| D-NeRV L | 61.30 | 0.891 | 481.6 | 918.3 | 0.0812 | 29.97 | 0.9501 | link |
| NeRV S | 88.00 | 0.903 | - | 635.9 | 0.0562 | 26.78 | 0.9094 | link |
| NeRV M | 105.3 | 0.900 | - | 758.4 | 0.0671 | 27.06 | 0.9177 | link |
| NeRV L | 127.2 | 0.903 | - | 919.1 | 0.0813 | 27.61 | 0.9284 | link |

(NeRV does not use compressed keyframes, hence the empty Keyframe Size entries.)

BPP Calculation

$BPP=\dfrac{\overbrace{\text{Model Param} \times 8}^{\text{int8 quantization}} \times \text{Entropy Encoding} + \text{Keyframe Size}}{\text{H} \times \text{W} \times \text{Num Frames}}$

For the UVG dataset: H = 1024, W = 1920, Num Frames = 3900.

For the UCF101 dataset (training split): H = 256, W = 320, Num Frames = 138041.
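
As a worked check of the formula, a small sketch that reproduces the BPP column of the tables above (parameters in millions, sizes in megabits, matching the table units):

```python
# BPP = (int8 model size x entropy-coding ratio + keyframe size) / num pixels
def bpp(param_millions, entropy_ratio, keyframe_mb, h, w, num_frames):
    model_bits = param_millions * 1e6 * 8 * entropy_ratio  # int8 weights, entropy-coded
    keyframe_bits = keyframe_mb * 1e6                      # table sizes are in Mb
    return (model_bits + keyframe_bits) / (h * w * num_frames)

# D-NeRV M on UVG: 24.20M params, 0.880 entropy ratio, 123.2 Mb of keyframes
print(round(bpp(24.20, 0.880, 123.2, 1024, 1920, 3900), 4))  # 0.0383
```

For the NeRV rows, keyframe_mb is 0.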

Citation

If you find our code or our paper useful for your research, please ★ star this repo and cite the following paper:

@inproceedings{he2023dnerv,
  title = {Towards Scalable Neural Representation for Diverse Videos},
  author = {He, Bo and Yang, Xitong and Wang, Hanyu and Wu, Zuxuan and Chen, Hao and Huang, Shuaiyi and Ren, Yixuan and Lim, Ser-Nam and Shrivastava, Abhinav},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023},
}
