Moustafa Meshry1, Dan B Goldman2, Sameh Khamis2, Hugues Hoppe2, Rohit Pandey2, Noah Snavely2, Ricardo Martin-Brualla2.
1University of Maryland, College Park 2Google Inc.
To appear at CVPR 2019 (Oral).
We will provide a TensorFlow implementation and pretrained models for our paper soon.
Paper | Video | Code | Project page
We explore total scene capture — recording, modeling, and rerendering a scene under varying appearance such as season and time of day. Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud. For each photo, we render the scene points into a deep framebuffer, and train a neural network to learn the mapping of these initial renderings to the actual photos. This rerendering network also takes as input a latent appearance vector and a semantic mask indicating the location of transient objects like pedestrians. The model is evaluated on several datasets of publicly available images spanning a broad range of illumination conditions. We create short videos demonstrating realistic manipulation of the image viewpoint, appearance, and semantic labeling. We also compare results with prior work on scene reconstruction from internet photos.
We capture the appearance of the original images in the left column and rerender several viewpoints under each of them. The last column shows a detail of the previous one. The top row shows the initial renderings that form part of the input to the rerenderer; they exhibit artifacts such as incomplete features in the statue and an inconsistent mix of day and night appearances. Note the hallucinated twilight sky produced with the last appearance. Image credits: Flickr users William Warby, Neil Rickards, Rafael Jimenez, acme401 (Creative Commons).
Frames from a synthesized camera path that smoothly transitions from the photo on the left to the photo on the right by interpolating both the viewpoint and the latent appearance vectors. Please see the supplementary video. Photo credits: Allie Caulfield, Tahbepet, Till Westermayer, Elliott Brown (Creative Commons).
We thank Gregory Blascovich for his help in conducting the user study, and Johannes Schönberger and True Price for their help generating datasets.
Staged training consists of three stages:
- Pretraining the appearance network.
- Training the rendering network while fixing the weights for the appearance network.
- Finetuning both the appearance and the rendering networks.
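The second and third stages both call neural_rerendering.py and differ mainly in three boolean flags. As a rough summary, the sketch below collects the values used in the commands later in this README; the `STAGE_FLAGS` dictionary and the `build_args` helper are hypothetical names introduced for illustration and are not part of the released code.

```python
# Hypothetical summary of how stages 2 and 3 configure neural_rerendering.py.
# Flag names and values are copied from the commands in this README.
STAGE_FLAGS = {
    "fixed_appearance": {
        "load_pretrained_app_encoder": "true",
        "train_app_encoder": "false",
        "load_from_another_ckpt": "false",
    },
    "finetune_appearance": {
        "load_pretrained_app_encoder": "false",
        "train_app_encoder": "true",
        "load_from_another_ckpt": "true",
    },
}


def build_args(stage):
    """Return the stage-specific command-line flags, sorted by flag name."""
    flags = STAGE_FLAGS[stage]
    return ["--{}={}".format(name, value) for name, value in sorted(flags.items())]


if __name__ == "__main__":
    for stage in STAGE_FLAGS:
        print(stage, " ".join(build_args(stage)))
```

The first stage (appearance pretraining) uses a separate script, pretrain_appearance.py, so it is not listed here.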
- Set a path to a base_dir that contains the source code:
base_dir=//to/neural_rendering
mkdir -p $base_dir
cd $base_dir
- We assume the following format for an aligned dataset:
- Each training image consists of 3 files with the following naming format:
- real image: %04d_reference.png
- render color: %04d_color.png
- render depth: %04d_depth.png
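Before running the rest of the pipeline, it may help to confirm that every image index has all three files. The following is a minimal sketch under the naming format above; `find_incomplete_examples` is a hypothetical helper and not part of the released code.

```python
import os
import re

# Suffixes taken from the naming format described above.
SUFFIXES = ("reference", "color", "depth")
# %04d yields at least four digits, so longer indices are accepted too.
PATTERN = re.compile(r"^(\d{4,})_(reference|color|depth)\.png$")


def find_incomplete_examples(dataset_dir):
    """Map each image index to the sorted list of suffixes missing for it."""
    found = {}
    for name in os.listdir(dataset_dir):
        match = PATTERN.match(name)
        if match:
            found.setdefault(match.group(1), set()).add(match.group(2))
    return {
        index: sorted(set(SUFFIXES) - suffixes)
        for index, suffixes in found.items()
        if suffixes != set(SUFFIXES)
    }
```

An empty result means every index found in the directory has its reference, color, and depth files.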
- Set the dataset name, e.g.:
dataset_name='trevi3k' # set to any name
- Split the dataset into train and validation sets in two subdirectories:
- $base_dir/datasets/$dataset_name/train
- $base_dir/datasets/$dataset_name/val
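The README does not prescribe how to choose the split. One possible approach, sketched below, shuffles the image indices with a fixed seed and copies each index's files into `train` or `val`. The `split_dataset` helper, the 10% validation fraction, and the seed are all assumptions made for illustration.

```python
import os
import random
import shutil


def split_dataset(src_dir, out_dir, val_fraction=0.1, seed=0):
    """Copy PNG files from src_dir into out_dir/train and out_dir/val.

    All files sharing an index prefix (e.g. 0007_*) go to the same subset.
    Returns (num_train_indices, num_val_indices).
    """
    names = [n for n in os.listdir(src_dir) if n.endswith(".png")]
    indices = sorted({n.split("_")[0] for n in names})
    random.Random(seed).shuffle(indices)
    num_val = int(len(indices) * val_fraction)
    val_indices = set(indices[:num_val])

    for subdir in ("train", "val"):
        os.makedirs(os.path.join(out_dir, subdir), exist_ok=True)
    for name in names:
        subdir = "val" if name.split("_")[0] in val_indices else "train"
        shutil.copy(os.path.join(src_dir, name), os.path.join(out_dir, subdir, name))
    return len(indices) - num_val, num_val
```

Here `out_dir` would correspond to $base_dir/datasets/$dataset_name.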
- Download the DeepLab semantic segmentation model trained on the ADE20K dataset from this link: http://download.tensorflow.org/models/deeplabv3_xception_ade20k_train_2018_05_29.tar.gz
- Extract the downloaded archive to: $base_dir/deeplabv3_xception_ade20k_train
- Download this file for an implementation of a VGG-based perceptual loss.
- Download trained weights for the VGG network as instructed in this link: https://github.com/machrisaa/tensorflow-vgg
- Save the VGG weights to $base_dir/vgg16_weights/vgg16.npy
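A quick check that the downloaded models are where the later commands expect them can save a failed run. The sketch below uses only the two paths named in this README; `missing_prerequisites` is a hypothetical helper and not part of the released code.

```python
import os


def missing_prerequisites(base_dir):
    """Return the expected model files that do not exist under base_dir."""
    required = [
        # Frozen DeepLab graph passed via --xception_frozen_graph_path.
        os.path.join(base_dir, "deeplabv3_xception_ade20k_train",
                     "frozen_inference_graph.pb"),
        # VGG weights used by the perceptual loss.
        os.path.join(base_dir, "vgg16_weights", "vgg16.npy"),
    ]
    return [path for path in required if not os.path.isfile(path)]
```

An empty list means both files were found.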
- Run the preprocessing pipeline which consists of:
- Filtering out sparse renders.
- Semantic segmentation of ground truth images.
- Exporting the dataset to tfrecord format.
# Run locally
python tools/dataset_utils.py \
--dataset_name=$dataset_name \
--dataset_parent_dir=$base_dir/datasets/$dataset_name \
--output_dir=$base_dir/datasets/$dataset_name \
--xception_frozen_graph_path=$base_dir/deeplabv3_xception_ade20k_train/frozen_inference_graph.pb \
--alsologtostderr
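The criterion that tools/dataset_utils.py uses to filter out sparse renders is not described here. As an illustration of the general idea only, the sketch below measures what fraction of pixels in a depth render are non-empty and flags renders below a threshold. It assumes that empty pixels have depth 0 and uses an arbitrary threshold of 0.5; both are assumptions, as are the function names.

```python
def render_coverage(depth_values):
    """Fraction of pixels with a valid depth (assumed: empty pixels are 0)."""
    depth_values = list(depth_values)
    if not depth_values:
        return 0.0
    return sum(1 for d in depth_values if d > 0) / len(depth_values)


def is_sparse(depth_values, min_coverage=0.5):
    """Flag a render whose coverage falls below min_coverage (assumed value)."""
    return render_coverage(depth_values) < min_coverage
```

In practice the depth values would come from loading a %04d_depth.png file with an image library and flattening it into a sequence.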
# Run locally
python pretrain_appearance.py \
--dataset_name=$dataset_name \
--train_dir=$base_dir/train_models/$dataset_name-app_pretrain \
--imageset_dir=$base_dir/datasets/$dataset_name/train \
--train_resolution=512 \
--metadata_output_dir=$base_dir/datasets/$dataset_name
Set the dataset_parent_dir variable below to point to the directory containing the generated TFRecords.
# Run locally:
dataset_parent_dir=$base_dir/datasets/$dataset_name
train_dir=$base_dir/train_models/$dataset_name-staged-fixed_appearance
load_pretrained_app_encoder=true
appearance_pretrain_dir=$base_dir/train_models/$dataset_name-app_pretrain
load_from_another_ckpt=false
fixed_appearance_train_dir=''
train_app_encoder=false
python neural_rerendering.py \
--dataset_name=$dataset_name \
--dataset_parent_dir=$dataset_parent_dir \
--train_dir=$train_dir \
--load_pretrained_app_encoder=$load_pretrained_app_encoder \
--appearance_pretrain_dir=$appearance_pretrain_dir \
--train_app_encoder=$train_app_encoder \
--load_from_another_ckpt=$load_from_another_ckpt \
--fixed_appearance_train_dir=$fixed_appearance_train_dir \
--total_kimg=4000
Set the fixed_appearance_train_dir to the train directory from the previous step.
# Run locally:
dataset_parent_dir=$base_dir/datasets/$dataset_name
train_dir=$base_dir/train_models/$dataset_name-staged-finetune_appearance
load_pretrained_app_encoder=false
appearance_pretrain_dir=''
load_from_another_ckpt=true
fixed_appearance_train_dir=$base_dir/train_models/$dataset_name-staged-fixed_appearance
train_app_encoder=true
python neural_rerendering.py \
--dataset_name=$dataset_name \
--dataset_parent_dir=$dataset_parent_dir \
--train_dir=$train_dir \
--load_pretrained_app_encoder=$load_pretrained_app_encoder \
--appearance_pretrain_dir=$appearance_pretrain_dir \
--train_app_encoder=$train_app_encoder \
--load_from_another_ckpt=$load_from_another_ckpt \
--fixed_appearance_train_dir=$fixed_appearance_train_dir \
--total_kimg=4000
experiment_title=$dataset_name-staged-finetune_appearance
local_train_dir=$base_dir/train_models/$experiment_title
dataset_parent_dir=$base_dir/datasets/$dataset_name
val_set_out_dir=$local_train_dir/val_set_output
# Run the model on validation set
echo "Evaluating the validation set"
python neural_rerendering.py \
--train_dir=$local_train_dir \
--dataset_name=$dataset_name \
--dataset_parent_dir=$dataset_parent_dir \
--run_mode='eval_subset' \
--virtual_seq_name='val' \
--output_validation_dir=$val_set_out_dir \
--logtostderr
# Evaluate quantitative metrics
python evaluate_quantitative_metrics.py \
--val_set_out_dir=$val_set_out_dir \
--experiment_title=$experiment_title \
--logtostderr
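This README does not list which metrics evaluate_quantitative_metrics.py computes. Mean absolute error (L1) and PSNR are two common choices for comparing a rerendered image against its reference photo; the sketch below shows both on flattened pixel sequences. It is an illustration under that assumption, not the script's implementation, and the function names are hypothetical.

```python
import math


def mean_l1(pred, target):
    """Mean absolute difference between two equal-length pixel sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Lower L1 and higher PSNR both indicate that the output is closer to the reference.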