Blind Video Deflickering by Neural Filtering with a Flawed Atlas
Chenyang Lei*, Xuanchi Ren*, Zhaoxiang Zhang and Qifeng Chen
CVPR 2023
* indicates equal contribution
[Paper] [ArXiv] [Project Website] [Dataset-1] [Dataset-2]
- May 1, 2023: Our collected dataset with real-world flickering videos is released.
- Apr 10, 2023: Our code now works with segmentation masks for a foreground object.
- Mar 12, 2023: Inference code and paper are released! The collected dataset will be released soon.
- Feb 28, 2023: Our paper is accepted by CVPR 2023; code will be released in two weeks.
- Environment & Dependency
- Inference
- All Evaluated Types of Flickering Videos
- Advanced Features
- Suggestions for Choosing the Hyperparameters
- Collected Real-world Dataset
- Discussion and Related work
We provide an environment with python 3.10 & torch 1.12 with CUDA 11. If you want torch 1.6 with CUDA 10, please check this env file.
Install environment:
conda env create -f environment.yml
conda activate deflicker
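After activating the environment, it can help to confirm which versions were actually installed. The following is a minimal sketch, not part of this repository: the helper name environment_report is hypothetical, and it reports only what the interpreter can see (torch fields stay None if torch is not importable).

```python
import importlib.util
import sys


def environment_report():
    """Collect Python, torch, and CUDA versions visible to this interpreter."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": None,
        "cuda": None,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

With the provided environment you would expect python 3.10, a torch 1.12 build, and a CUDA 11 version string.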
Download the pretrained checkpoint:
git clone https://github.com/ChenyangLEI/cvpr2023_deflicker_public_folder
mv cvpr2023_deflicker_public_folder/pretrained_weights ./ && rm -r cvpr2023_deflicker_public_folder
Put your video or image folder under data/test. For example:
export PYTHONPATH=$PWD
python test.py --video_name data/test/Winter_Scenes_in_Holland.mp4 # for video input
python test.py --video_frame_folder data/test/Winter_Scenes_in_Holland # for image folder input
Find the results under results/$YOUR_DATA_NAME/final/output.mp4.
Note: our inference code takes only about 3000 MB of GPU memory.
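To process several inputs in one go, the two invocations above can be wrapped in a small driver. This is a sketch under assumptions: the helper names build_command and run_all are hypothetical, and the set of suffixes treated as video files is a guess rather than something test.py documents. Only the --video_name and --video_frame_folder flags come from the commands above.

```python
import os
import subprocess
import sys
from pathlib import Path

# Assumption: anything with one of these suffixes is passed as a video file.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov"}


def build_command(input_path):
    """Build the test.py command for a video file or a folder of frames."""
    path = Path(input_path)
    if path.suffix.lower() in VIDEO_SUFFIXES:
        flag = "--video_name"
    else:
        flag = "--video_frame_folder"
    return [sys.executable, "test.py", flag, str(path)]


def run_all(inputs, dry_run=True):
    """Print (dry run) or execute one test.py call per input, in order."""
    # Mirrors `export PYTHONPATH=$PWD` from the shell example.
    env = dict(os.environ, PYTHONPATH=os.getcwd())
    commands = [build_command(p) for p in inputs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True, env=env)
    return commands


if __name__ == "__main__":
    run_all(["data/test/Winter_Scenes_in_Holland.mp4"], dry_run=True)
```

Run it from the repository root; set dry_run=False once the printed commands look right.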
- Synthesized videos from text-to-video algorithms
- Old movies
- Old cartoons
- Time-lapse videos
- Slow-motion videos
- Videos processed by image processing algorithms
We currently support processing videos with segmentation masks from Carvekit or Mask-RCNN. This can help improve the atlas, particularly for videos featuring a salient object or human. Please note that the current implementation supports only one foreground object with a background.
- To use Carvekit, which is for background removal:
git clone https://github.com/OPHoperHPO/image-background-remove-tool.git
export PYTHONPATH=$PWD
python test.py --video_name data/test/Winter_Scenes_in_Holland.mp4 --class_name portrait # portrait triggers Carvekit
- To use Mask-RCNN:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
export PYTHONPATH=$PWD
python test.py --video_name data/test/Winter_Scenes_in_Holland.mp4 --class_name anything # note: this does not actually work for this sample video
Here, --class_name sets the COCO class name of the foreground object to look for. To select the first instance retrieved by Mask-RCNN instead, use --class_name anything.
In both settings, we suggest checking the generated masks under data/test/{vid_name}_seg. If the images are all black, you can only use the non-segmentation implementation above.
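The check for all-black masks can be automated rather than done by eye. The sketch below is not part of the repository: the function names are hypothetical, it assumes the masks are stored as ordinary PNG or JPEG files, and it needs Pillow for the folder scan.

```python
from pathlib import Path

# Assumption: masks are saved as regular image files with these suffixes.
MASK_SUFFIXES = {".png", ".jpg", ".jpeg"}


def is_all_black(pixels):
    """Return True if every value in a flat sequence of intensities is zero."""
    return not any(pixels)


def count_black_masks(mask_dir):
    """Return (number of all-black masks, total number of masks) in a folder."""
    from PIL import Image  # imported lazily; only the folder scan needs Pillow

    files = sorted(
        p for p in Path(mask_dir).iterdir() if p.suffix.lower() in MASK_SUFFIXES
    )
    black = 0
    for path in files:
        with Image.open(path) as img:
            # Convert to 8-bit grayscale and inspect the raw pixel bytes.
            if is_all_black(img.convert("L").tobytes()):
                black += 1
    return black, len(files)
```

For example, count_black_masks("data/test/Winter_Scenes_in_Holland_seg") returns a pair such as (black, total); if black equals total, fall back to the non-segmentation pipeline.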
If you want to find the best setting to get an atlas for deflickering, we provide a reference guide here:
- (Important) Iteration number: Please change this according to the total number of frames in your video and the downsample rate of the image size. For example, we use 10000 iterations for the example video with 80 frames and a downsample rate of 4. If the results are not as expected, try increasing iters_num (for example, to 100000). If you use the implementation with segmentation masks, we suggest increasing iters_num.
- (Important) Optical flow loss weight: Please change optical_flow_coeff and alpha_flow_factor (note: alpha_flow_factor is only used in the advanced features with segmentation masks) according to the intensity of flicker in your video. For example, we use 500.0 for optical_flow_coeff and 4900.0 for alpha_flow_factor for the sample video. If the video has minor flickering, you can use 5.0 for optical_flow_coeff and 49.0 for alpha_flow_factor.
- Downsample rate: We find that downsampling the resolution of the neural atlas by a factor of 4 makes convergence much faster and only slightly affects the quality. You can choose your own downsample rate.
- Maximum number of frames: We set maximum_number_of_frames to 200. Performance on longer videos has not been evaluated. We recommend splitting long videos into several shorter sequences.
- Usefulness of segmentation masks: Perfect segmentation masks will increase the quality of the neural atlas, especially for objects with significant motion. However, in most cases, the improvement that segmentation brings to the final prediction is not significant, since neural filtering can filter out the flaws in the atlas. For now, we provide a naive version of segmentation mask support above.
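The numbers in the guide above can be collected in one place. The following is a minimal sketch, not the repository's configuration format: the dictionary layout, the preset labels, and the helper split_into_chunks are hypothetical, while the values and parameter names are those quoted above. Where these parameters are set in the code base is not covered here.

```python
# Values quoted in the guide; the preset labels are illustrative only.
SAMPLE_VIDEO_SETTINGS = {
    "iters_num": 10000,  # used for the 80-frame example at downsample rate 4
    "downsample_rate": 4,
    "maximum_number_of_frames": 200,
}

FLOW_PRESETS = {
    "sample_video": {"optical_flow_coeff": 500.0, "alpha_flow_factor": 4900.0},
    "minor_flicker": {"optical_flow_coeff": 5.0, "alpha_flow_factor": 49.0},
}


def split_into_chunks(total_frames, max_frames=200):
    """Split a frame count into half-open (start, end) ranges of at most max_frames."""
    if total_frames < 0 or max_frames <= 0:
        raise ValueError("total_frames must be >= 0 and max_frames must be > 0")
    return [
        (start, min(start + max_frames, total_frames))
        for start in range(0, total_frames, max_frames)
    ]


if __name__ == "__main__":
    # A 450-frame video becomes three sequences that each respect the limit.
    print(split_into_chunks(450))  # [(0, 200), (200, 400), (400, 450)]
```

Each range can be exported as its own clip or frame folder and processed separately. Whether the results stay consistent across chunk boundaries has not been evaluated here.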
We release the dataset in two parts:
- Dataset-1 includes our collected synthesized videos from text-to-video algorithms, old movies, old cartoons, time-lapse videos, and slow-motion videos.
- Dataset-2 includes videos processed by the image processing algorithms from fast_blind_video_consistency. Since the link in the original repo is dead, we provide it here.
Potential applications: Our model can be applied to all evaluated types of flickering videos. In addition, although our approach is designed for videos, Blind Deflickering could also be applied to other tasks (e.g., novel view synthesis) where flickering artifacts exist.
Temporal consistency beyond our scope: Solving the temporal inconsistency of video content is beyond the scope of deflickering. For example, the content produced by video generation algorithms can differ greatly from frame to frame. Large scratches in old films can destroy the content and result in unstable videos, which require extra restoration techniques. We leave the study of a general framework for solving these temporally inconsistent artifacts to future work.
Our code relies heavily on layered-neural-atlases, fast_blind_video_consistency, and pytorch-deep-video-prior.
While we do not work on this project full-time, please feel free to provide any suggestions. We would also appreciate it if anyone could help us improve the engineering part of this project.
If you find our work useful in your research, please consider citing:
@InProceedings{Lei_2023_CVPR,
author = {Lei, Chenyang and Ren, Xuanchi and Zhang, Zhaoxiang and Chen, Qifeng},
title = {Blind Video Deflickering by Neural Filtering with a Flawed Atlas},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
}
or
@article{lei2023blind,
title={Blind Video Deflickering by Neural Filtering with a Flawed Atlas},
author={Lei, Chenyang and Ren, Xuanchi and Zhang, Zhaoxiang and Chen, Qifeng},
journal={arXiv preprint arXiv:2303.08120},
year={2023}
}