Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
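The registration step the abstract refers to can be illustrated with a small NumPy sketch of backward warping: given a flow field, each pixel of the reference frame is sampled from a neighboring frame. This is a generic illustration, not the TOFlow implementation. The function name `backward_warp`, the `(dx, dy)` channel order, and the border clamping are assumptions made for this example. In TOFlow the flow would come from the trainable motion estimation component rather than being supplied by hand.

```python
import numpy as np


def backward_warp(frame, flow):
    """Warp `frame` (H, W, C) toward a reference frame using `flow` (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy) is the offset at which pixel
    (x, y) of the reference frame is found in `frame`. Samples are taken
    with bilinear interpolation; coordinates are clamped to the border.
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sampling positions in the source frame, clamped to valid coordinates.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, broadcast over the channel axis.
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With an all-zero flow the function returns the input frame unchanged. A constant flow of `(1, 0)` shifts the content one pixel to the left, with the last column repeated because of the clamping.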
Evaluated on RGB channels.
The metrics are PSNR / SSIM.
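As a rough guide to what evaluating PSNR on RGB channels means, the sketch below computes the mean squared error over all three color channels at once, rather than on a luminance channel. It is a minimal illustration. The function name `psnr_rgb` and the 8-bit value range (`max_value=255`) are assumptions, and the benchmark code may differ in details such as border cropping or how scores are averaged over the dataset. SSIM is not covered here.

```python
import numpy as np


def psnr_rgb(pred, target, max_value=255.0):
    """PSNR in dB between two RGB images of shape (H, W, 3).

    The squared error is averaged over every pixel and every color channel.
    `max_value` is the assumed peak signal value (255 for 8-bit images).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

For example, two 8-bit images that differ by exactly one intensity level at every pixel have an MSE of 1, which gives a PSNR of about 48.13 dB.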
Method | Pretrained SPyNet | Vimeo90k-triplet (PSNR / SSIM) | Download |
---|---|---|---|
tof_vfi_spynet_chair_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3294 / 0.9465 | model \| log |
tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3339 / 0.9466 | model \| log |
tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3170 / 0.9464 | model \| log |
tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3237 / 0.9465 | model \| log |
tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3426 / 0.9467 | model \| log |
Note: These pretrained SPyNets do not contain BN layers since `batch_size=1`, which is consistent with https://github.com/Coldog2333/pytoflow.
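The note links the absence of BN layers to `batch_size=1` without elaborating. One relevant property, offered as background rather than as the authors' stated reasoning, is that training-mode batch normalization with a single sample computes its statistics from that sample alone, so it behaves like instance normalization. The NumPy sketch below checks this numerically. The helper names are hypothetical, and the learnable scale and shift as well as the running statistics are left out for brevity.

```python
import numpy as np


def batch_norm_train(x, eps=1e-5):
    """Training-mode batch norm on x of shape (N, C, H, W), without affine.

    Statistics are computed per channel over the batch and spatial axes.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def instance_norm(x, eps=1e-5):
    """Instance norm on x of shape (N, C, H, W), without affine.

    Statistics are computed per sample and per channel over spatial axes.
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

For an input of shape `(1, C, H, W)` the two functions return the same array. For larger batches they generally differ, because batch norm pools statistics across samples.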
@article{xue2019video,
title={Video enhancement with task-oriented flow},
author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
journal={International Journal of Computer Vision},
volume={127},
number={8},
pages={1106--1125},
year={2019},
publisher={Springer}
}