Jiezhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc Van Gool
Computer Vision Lab, ETH Zurich.
arxiv | supplementary | pretrained models | visual results
This repository is the official PyTorch implementation of "Towards Interpretable Video Super-Resolution via Alternating Optimization" (arxiv, supp, pretrained models, visual results). DAVSR achieves state-of-the-art performance in practical time-space video super-resolution.
In this paper, we study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate low-resolution blurry video. Such problem often occurs when recording a fast dynamic event with a low-framerate and low-resolution camera, and the captured video would suffer from three typical issues: i) motion blur occurs due to object/camera motions during exposure time; ii) motion aliasing is unavoidable when the event temporal frequency exceeds the Nyquist limit of temporal sampling; iii) high-frequency details are lost because of the low spatial sampling rate. These issues can be alleviated by a cascade of three separate sub-tasks, including video deblurring, frame interpolation, and super-resolution, which, however, would fail to capture the spatial and temporal correlations among video sequences. To address this, we propose an interpretable STVSR framework by leveraging both model-based and learning-based methods. Specifically, we formulate STVSR as a joint video deblurring, frame interpolation, and super-resolution problem, and solve it as two sub-problems in an alternate way. For the first sub-problem, we derive an interpretable analytical solution and use it as a Fourier data transform layer. Then, we propose a recurrent video enhancement layer for the second sub-problem to further recover high-frequency details. Extensive experiments demonstrate the superiority of our method in terms of quantitative metrics and visual quality.
- Add pretrained model
- Add results of test set
- Python 3.8, PyTorch >= 1.9.1
- mmedit 0.11.0
- Requirements: see requirements.txt
- Platforms: Ubuntu 18.04, cuda-11.1
Following commands will download pretrained models and test datasets. If out-of-memory, try to reduce num_frame_testing
and size_patch_testing
at the expense of slightly decreased performance.
# download code
git clone https://github.com/caojiezhang/DAVSR
cd DAVSR
pip install -r requirements.txt
PYTHONPATH=/bin/..:tools/..: python tools/test.py configs/restorers/uvsrnet/002_pretrain_uvsr3DBDnet_REDS_25frames_3iter_sf544_slomo_modify_newdataset.py work_dirs/002_pretrain_uvsr3DBDnet_REDS_25frames_3iter_sf544_slomo_modify_newdataset/latest.pth
All visual results of DAVSR can be downloaded here.
The training and testing sets are as follows (see the supplementary for a detailed introduction of all datasets). For better I/O speed, use create_lmdb.py to convert .png
datasets to .lmdb
datasets.
Note: You do NOT need to prepare the datasets if you just want to test the model. main_test_vrt.py
will download the testing set automaticaly.
Task | Training Set | Testing Set | Pretrained Model and Visual Results of DAVSR |
---|---|---|---|
Real video denoising | REDS sharp (266 videos, 266000 frames: train + val except REDS4) *Use regroup_reds_dataset.py to regroup and rename REDS val set |
REDS4 (4 videos, 2000 frames: 000, 011, 015, 020 of REDS) | here |
PYTHONPATH=/bin/..:tools/..: ./tools/dist_train.sh configs/restorers/uvsrnet/002_pretrain_uvsr3DBDnet_REDS_25frames_3iter_sf544_slomo_modify_newdataset.py 8
We achieved state-of-the-art performance on practical space-time video super-resolution. Detailed results can be found in the paper.
@inproceedings{cao2022davsr,
title={Towards Interpretable Video Super-Resolution via Alternating Optimization},
author={Cao, Jiezhang and Liang, Jingyun and Zhang, Kai and Wang, Wenguan and Wang, Qin and Zhang, Yulun and Tang, Hao and Van Gool, Luc},
booktitle={European conference on computer vision},
year={2022}
}
This project is released under the CC-BY-NC license. We refer to codes from KAIR, BasicSR, and mmediting. Thanks for their awesome works. The majority of DAVSR is licensed under CC-BY-NC, however portions of the project are available under separate license terms: KAIR is licensed under the MIT License, BasicSR, Video Swin Transformer and mmediting are licensed under the Apache 2.0 license.