This repository contains code for generating the 3D Ken Burns effect from a single image. This work is mainly based on 3D Ken Burns Effect from a Single Image. That paper was the starting point for our work, and the trained networks released by Niklaus et al. are therefore compatible with our framework and can be downloaded from their repository.
We provide code for training the different neural networks used to achieve the 3D Ken Burns effect. In addition, we propose some extensions of the original work to improve both the depth estimation and the image inpainting. Finally, we develop a semi-supervised method for the disocclusion inpainting problem in order to avoid the difficulty of obtaining a synthetic dataset such as the one used in the original paper. We also propose a slight modification of the 3D KBE to produce a fake dolly zoom.
Here is a video with some of our results.
First download our trained networks by running `./download.sh` from the directory where the repository has been cloned. Note that you might need to make that script executable with `chmod +x download.sh`.
To generate the 3D KBE, use the script `kbe.py`. Some parameters can be set to change the default settings. If no paths for the networks are specified, the default names and paths from the download script will be used.
- Selection of the input image and the networks to be used:
  - `in=`: path to the input image
  - `out=`: saving path
  - `inpaint-path=`: path to the inpainting network
  - `refine-path=`: path to the refinement network
  - `estim-path=`: path to the estimation network
  - `inpaint-depth=`: use a different network for color and depth inpainting
  - `pretrained-refine`: must be set when using the trained refinement network from the original paper
  - `pretrained-estim`: must be set when using the trained estimation network from the original paper
- Specifying the cropping windows:
  - `startU=`: x coordinate of the starting crop window
  - `startV=`: y coordinate of the starting crop window
  - `endU=`: x coordinate of the ending crop window
  - `endV=`: y coordinate of the ending crop window
  - `startW=`: width of the starting crop window
  - `startH=`: height of the starting crop window
  - `endW=`: width of the ending crop window
  - `endH=`: height of the ending crop window
- Other parameters:
  - `write-frames`: output the frames of the video in high quality in a separate folder
  - `2d`: perform the 2D Ken Burns effect
If some of the cropping window parameters are not specified, default parameters will be applied.
This will create a video of the 3D KBE; the frames of that video can be output as well with the option `--write-frames`.
Example:
```
CUDA_AVAILABLE_DEVICES=X python kbe.py --in /images/test.png --out /images/kbe/ --estim-path /models/trained/disparity-estimation.tar --refine-path /models/trained/disparity-refinement.tar --inpaint-path /models/trained/inpainting-color.tar --write-frames --startU 512 --startV 512 --endU 600 --endV 600 --startW 400 --startH 200 --endW 300 --endH 150
```
To create a dolly zoom effect, use the `--dolly` option with the `kbe.py` script. It works exactly as for the 3D KBE, except that the focal length of the camera changes during the effect to compensate for the forward or backward motion, keeping the focused object unchanged. It is recommended to set the same position for the centers of the two cropping windows in order to remove any lateral motion.
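For intuition, the compensation follows the usual dolly-zoom relation: the on-screen size of the focused object is roughly proportional to the focal length divided by the subject distance, so the focal length is rescaled as the camera moves. The sketch below is ours and not taken from `kbe.py`; the function name and the pinhole approximation are assumptions.
```python
# Minimal sketch of the dolly-zoom focal compensation (pinhole approximation,
# not code from kbe.py): keep focal / distance constant so the focused object
# keeps the same apparent size while the camera moves forward or backward.
def compensated_focal(initial_focal, initial_distance, new_distance):
    return initial_focal * (new_distance / initial_distance)

# Example: moving the virtual camera from 10 units to 7 units away from the
# subject shrinks a focal length of 50 down to 35, cancelling the zoom-in.
print(compensated_focal(50.0, 10.0, 7.0))  # -> 35.0
```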
Example:
```
CUDA_AVAILABLE_DEVICES=X python kbe.py --in /images/test.png --out /images/kbe/ --estim-path /models/trained/disparity-estimation.tar --refine-path /models/trained/disparity-refinement.tar --inpaint-path /models/trained/inpainting-color.tar --write-frames --startU 512 --startV 512 --endU 512 --endV 512 --startW 400 --startH 200 --endW 300 --endH 150 --dolly
```
In order to train the networks, the script `train.py` can be used.
```
CUDA_AVAILABLE_DEVICES=X python train.py
```
A few parameters are available in order to control the training:
- `training-mode`: select which network to train; can take the values `estimation`, `refinement`, `inpainting`, `inpainting_ref`
- `mask-loss`: choose the type of mask loss; can take the values `none` (no mask loss), `same` (mask loss computed on the depth dataset) or `other` (mask loss computed on another dataset)
- `mask-loss-dataset`: path to the dataset to be used for the mask loss, required when `mask-loss=other`
- `n-epochs`: number of training epochs
- `lr-estimation`: learning rate for the estimation net
- `lr-refinement`: learning rate for the refinement net
- `lr-inpaint`: learning rate for the inpainting net
- `lr-discriminator`: learning rate for the discriminator
- `save-name`: name for saving the network weights
- `model-path`: path to a pre-trained network for refinement or for continuing training
- `batch-size`: batch size
- `gamma-lr`: learning rate decay rate
- `partial-conv`: use partial convolutions, only for inpainting
Example for training the inpainting network:
```
CUDA_AVAILABLE_DEVICES=X python train.py --training-mode inpainting --batch-size=2 --lr-inpaint 0.0005 --save-name test --partial-conv
```
A dataset must then be specified; it should contain the (image, depth) pairs. To define a dataset, just create a `dict` object in the file `train.py`. This dict must contain at least three attributes (a minimal sketch is given after the list below):
- `name`: the name of the dataset (only for display).
- `path`: the path to the files. In this folder there should be two folders, `images` and `depth`, which contain respectively the RGB images and the associated depth maps. Corresponding image and depth files should have the same name.
- `param`: the camera parameters used for that specific dataset. It is also a `dict` with only two attributes, `baseline` and `focal`, the baseline and focal length of the camera.
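A minimal sketch of such a definition (the dataset name, path and camera values below are placeholders, not from the repository):
```python
# Hypothetical dataset definition in train.py; adapt the name, path and camera
# parameters to your own data.
my_dataset = {
    'name': 'my_dataset',          # only used for display
    'path': '/data/my_dataset',    # must contain the folders images/ and depth/
    'param': {
        'baseline': 0.54,          # camera baseline (placeholder value)
        'focal': 721.0,            # camera focal length (placeholder value)
    },
}
```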
Multiple datasets can be used for training; simply feed a list of datasets to the `Trainer` class.
Note that some adjustments may also be required in the data loading file (`data_loader.py`) in order to match the folder layout and the file extensions of the images and depth maps.
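As an illustration only (this is not the repository's loader; the folder layout follows the description above, while the extensions and pairing logic are assumptions), matching image/depth pairs by file name could look like this:
```python
import os
from glob import glob

# Hypothetical helper pairing RGB images with depth maps that share the same
# file name (stem) but may use different extensions.
def list_pairs(root, image_ext='.png', depth_ext='.exr'):
    pairs = []
    for image_path in sorted(glob(os.path.join(root, 'images', '*' + image_ext))):
        stem = os.path.splitext(os.path.basename(image_path))[0]
        depth_path = os.path.join(root, 'depth', stem + depth_ext)
        if os.path.exists(depth_path):
            pairs.append((image_path, depth_path))
    return pairs
```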
Important notes
- It is important to set the `CUDA_HOME` variable properly according to your system so that the CUDA kernels can be executed.
- It is highly recommended to run these scripts on a GPU. The minimum recommended memory is 6 GB.
- Dependencies: `PyTorch=1.3.1`, `torchvision`, `kornia=0.3.0`, `cupy`, `h5py`, `opencv`, `dill` and `moviepy`.
- Path to `helper_math.h`: change the path to `helper_math.h` in `./utils/common.py` (l. 14) to match your own system.