
StyleDyRF

Official code for "StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields"

Abstract

4D style transfer aims at transferring arbitrary visual style to the synthesized novel views of a dynamic 4D scene with varying viewpoints and times. Existing efforts on 3D style transfer can effectively combine the visual features of style images and neural radiance fields (NeRF), but they fail to handle 4D dynamic scenes because of their static-scene assumption. Consequently, we aim to handle the novel and challenging problem of 4D style transfer for the first time, which further requires consistency of the stylized results on dynamic objects. In this paper, we introduce StyleDyRF, a method that represents the 4D feature space by deforming a canonical feature volume and learns a linear style transformation matrix on the feature volume in a data-driven fashion. To obtain the canonical feature volume, the rays at each time step are deformed with the geometric prior of a pre-trained dynamic NeRF to render the feature map under the supervision of pre-trained visual encoders. With the content and style cues in the canonical feature volume and the style image, we can learn the style transformation matrix from their covariance matrices with lightweight neural networks. The learned style transformation matrix reflects a direct matching of feature covariance from the content volume to the given style pattern, in analogy with the optimization of the Gram matrix in traditional 2D neural style transfer. The experimental results show that our method not only renders 4D photorealistic style transfer results in a zero-shot manner but also outperforms existing methods in terms of visual quality and consistency.
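
To make the idea concrete, the covariance-based linear style transformation can be illustrated with a small, self-contained sketch. This is not the StyleDyRF implementation (which predicts the transformation with lightweight networks over the canonical feature volume); it is a closed-form whitening-coloring analogue, and the tensor names and shapes below are illustrative assumptions.

import torch

def covariance(feat):
    # feat: (C, N) features flattened over spatial locations / ray samples
    feat = feat - feat.mean(dim=1, keepdim=True)
    return feat @ feat.t() / feat.shape[1]

def matrix_power(m, p, eps=1e-5):
    # power of a symmetric positive semi-definite matrix via eigendecomposition
    e, v = torch.linalg.eigh(m)
    return v @ torch.diag(e.clamp_min(eps) ** p) @ v.t()

def linear_style_transfer(content_feat, style_feat):
    # T maps the covariance of the content features onto that of the style
    # features, in analogy with Gram-matrix matching in 2D style transfer.
    T = matrix_power(covariance(style_feat), 0.5) @ matrix_power(covariance(content_feat), -0.5)
    out = T @ (content_feat - content_feat.mean(dim=1, keepdim=True))
    return out + style_feat.mean(dim=1, keepdim=True)

content = torch.randn(256, 64 * 64)   # e.g. rendered feature map of a novel view
style = torch.randn(256, 32 * 32)     # e.g. VGG features of the style image
stylized = linear_style_transfer(content, style)
print(stylized.shape)                  # torch.Size([256, 4096])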

Setup

Please prepare the environment following robust-dynrf.

Tested with PyTorch 2.0/2.1 and CUDA 11.8. You can change the PyTorch version depending on your local machine.

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install tqdm scikit-image opencv-python configargparse lpips imageio-ffmpeg kornia tensorboard imageio easydict matplotlib scipy plyfile timm
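
A quick sanity check that the CUDA build of PyTorch was installed correctly (the exact versions are only an example):

import torch
print(torch.__version__, torch.version.cuda)   # e.g. 2.0.1 / 11.8
print(torch.cuda.is_available())               # should print True on a CUDA machine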

4D Scene Dataset

Create dataset directory:

mkdir dataset
cd dataset

Download the pre-processed data provided by DynamicNeRF.

mkdir nvidia
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/data.zip
unzip data.zip
rm data.zip

Download the DAVIS dataset and put the images into dataset/davis/${SCENE_NAME}/images.
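
If you are unsure whether the frames ended up in the expected location, a small check such as the following can help (the scene name is only an example):

from pathlib import Path

scene = Path("dataset/davis/bear")             # dataset/davis/${SCENE_NAME}
frames = sorted((scene / "images").glob("*"))
print(f"{scene}: {len(frames)} images")        # should match the length of the sequence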

Data pre-processing

Download DPT and RAFT pretrained weights.

mkdir weights
cd weights
wget --no-check-certificate https://github.com/intel-isl/DPT/releases/download/1_0/dpt_large-midas-2f21e586.pt
wget --no-check-certificate https://www.dropbox.com/s/4j4z58wuv8o0mfz/models.zip
unzip models.zip
rm models.zip
cd ..

Predict the monocular depth.

python preprocess_scripts/generate_DPT.py --dataset_path ${SCENE_DIR} --model weights/dpt_large-midas-2f21e586.pt

Predict the optical flows.

python preprocess_scripts/generate_flow.py --dataset_path ${SCENE_DIR} --model weights/models/raft-things.pth

Predict the motion mask.

python preprocess_scripts/generate_mask.py --dataset_path ${SCENE_DIR}
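
If you need to preprocess several scenes, the three scripts can be wrapped in a small driver. The script names and flags are taken from the commands above; the scene list is an example you would adapt:

import subprocess

scenes = ["dataset/davis/bear", "dataset/davis/horsejump-high"]   # example scene directories
for scene in scenes:
    # depth, optical flow, and motion masks, in the same order as above
    subprocess.run(["python", "preprocess_scripts/generate_DPT.py",
                    "--dataset_path", scene,
                    "--model", "weights/dpt_large-midas-2f21e586.pt"], check=True)
    subprocess.run(["python", "preprocess_scripts/generate_flow.py",
                    "--dataset_path", scene,
                    "--model", "weights/models/raft-things.pth"], check=True)
    subprocess.run(["python", "preprocess_scripts/generate_mask.py",
                    "--dataset_path", scene], check=True)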

Style Dataset

Download the style dataset and decompress it into dataset/WikiArt.

Training

Config

Check the example config files, e.g. configs/nvidia_with_pose/balloon1.txt, configs/nvidia_with_pose/balloon2.txt, configs/davis/bear.txt, etc.

You can set the expname in ${CONFIG_FILE}. Adjust N_voxel_t in the same file so that it matches the number of images in the datadir specified there.
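
Since N_voxel_t should equal the number of input frames, a quick count of the images in your datadir gives the value to put into the config (the path and image subfolder below are assumptions; adapt them to your dataset layout):

from pathlib import Path

datadir = Path("dataset/nvidia/Balloon1")      # the datadir from your config file
images = [p for p in (datadir / "images").glob("*") if p.suffix.lower() in {".png", ".jpg"}]
print("Set N_voxel_t to", len(images))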

Stage 1: Dynamic NeRF Pretraining

# Training on the "balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/nvidia_with_pose/balloon1.txt

# Training on the "balloon2" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/nvidia_with_pose/balloon2.txt

# Training on the "playground" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/nvidia_with_pose/playground.txt

# Training on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/davis/bear.txt

# Training on the "horsejump-high" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/davis/horsejump-high.txt

When training finishes, the checkpoint can be found at log/${expname}/${expname}.th.

Stage 2: Canonical Feature Distillation

Before the training phase, you need to prepare the VGG checkpoints in the pretrained directory. Download and unzip them into the pretrained directory as follows.

cd pretrained/
wget https://mogface.oss-cn-zhangjiakou.aliyuncs.com/xhb/share/styledyrf_tvcg/vgg_pretrained.zip
unzip vgg_pretrained.zip
rm vgg_pretrained.zip

Then go back to the root directory of the project and start training.

# Training on the "balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
    --config configs/nvidia_with_pose/balloon1.txt \
    --patch_size 256 \
    --basedir log_feature \
    --n_iters 25000 \
    --batch_size 8192 \
    --ckpt log/Balloon1_with_pose/Balloon1_with_pose.th

# Training on the "balloon2" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
    --config configs/nvidia_with_pose/balloon2.txt \
    --patch_size 256 \
    --basedir log_feature \
    --n_iters 25000 \
    --batch_size 8192 \
    --ckpt log/Balloon2_with_pose/Balloon2_with_pose.th

# Training on the "playground" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
    --config configs/nvidia_with_pose/playground.txt \
    --patch_size 256 \
    --basedir log_feature \
    --n_iters 25000 \
    --batch_size 8192 \
    --ckpt log/Playground_with_pose/Playground_with_pose.th

# Training on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
    --config configs/davis/bear.txt \
    --patch_size 256 \
    --basedir log_feature \
    --n_iters 25000 \
    --batch_size 8192 \
    --ckpt log/bear/bear.th

# Training on the "horsejump-high" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
    --config configs/davis/horsejump-high.txt \
    --patch_size 256 \
    --basedir log_feature \
    --n_iters 25000 \
    --batch_size 8192 \
    --ckpt log/horsejump-high/horsejump-high.th
  • --patch_size is the rendered patch size during feature distillation. A larger patch size generally gives better results but may exceed GPU memory; adjust it based on your machine and GPUs.
  • --n_iters is the number of iterations during feature distillation. You can keep the default of 25000 or use a custom value.
  • --batch_size is the number of rays per batch during feature distillation. A larger batch size requires more GPU memory; adjust it based on your machine and GPUs.
  • --ckpt is the path to the dynamic NeRF checkpoint pre-trained in stage 1.
  • --basedir is the directory that stores the model trained in the feature distillation stage.

When training finishes, the checkpoint can be found at log_feature/${expname}/${expname}.th.

Stage 3: Canonical Style Transformation

# Training on the "balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
    --config configs/nvidia_with_pose/balloon1.txt \
    --patch_size 256 \
    --basedir log_style \
    --n_iters 25000 \
    --batch_size 4096 \
    --ckpt_feature log_feature/Balloon1_with_pose/Balloon1_with_pose.th \
    --wikiartdir datasets/WikiArt

# Training on the "balloon2" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
    --config configs/nvidia_with_pose/balloon2.txt \
    --patch_size 256 \
    --basedir log_style \
    --n_iters 25000 \
    --batch_size 4096 \
    --ckpt_feature log_feature/Balloon2_with_pose/Balloon2_with_pose.th \
    --wikiartdir datasets/WikiArt

# Training on the "playground" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
    --config configs/nvidia_with_pose/playground.txt \
    --patch_size 256 \
    --basedir log_style \
    --n_iters 25000 \
    --batch_size 4096 \
    --ckpt_feature log_feature/Playground_with_pose/Playground_with_pose.th \
    --wikiartdir datasets/WikiArt

# Training on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
    --config configs/davis/bear.txt \
    --patch_size 256 \
    --basedir log_style \
    --n_iters 25000 \
    --batch_size 4096 \
    --ckpt_feature log_feature/bear/bear.th \
    --wikiartdir datasets/WikiArt

# Training on the "horsejump-high" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
    --config configs/davis/horsejump-high.txt \
    --patch_size 256 \
    --basedir log_style \
    --n_iters 25000 \
    --batch_size 4096 \
    --ckpt_feature log_feature/horsejump-high/horsejump-high.th \
    --wikiartdir datasets/WikiArt
  • --patch_size is the rendered patch size during training stage 3. A larger patch size generally gives better results but may exceed GPU memory; adjust it based on your machine and GPUs.
  • --n_iters is the number of iterations during training stage 3. You can keep the default of 25000 or use a custom value.
  • --batch_size is the number of rays per batch during training stage 3. A larger batch size requires more GPU memory; adjust it based on your machine and GPUs.
  • --ckpt_feature is the path to the checkpoint trained in stage 2 (Canonical Feature Distillation).
  • --basedir is the directory that stores the model trained in the canonical style transformation stage.
  • --wikiartdir is the path to the style dataset.

When training finishes, the checkpoint can be found at log_style/${expname}/${expname}.th.

Testing

Put the style images for testing into datasets/style_imgs_test.
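
If you just want a handful of test styles, you can sample them from the WikiArt folder prepared earlier (the paths follow the instructions above; the number of images is arbitrary):

import random
import shutil
from pathlib import Path

src = Path("dataset/WikiArt")                  # style dataset from the "Style Dataset" section
dst = Path("datasets/style_imgs_test")
dst.mkdir(parents=True, exist_ok=True)
for img in random.sample(list(src.rglob("*.jpg")), 5):   # pick a few styles at random
    shutil.copy(img, dst / img.name)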

# Test on the "Balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python test_style.py \
    --config configs/nvidia_with_pose/balloon1.txt \
    --ckpt_style log_style/Balloon1_with_pose/Balloon1_with_pose.th \
    --ckpt_matrix log_style/Balloon1_with_pose/Balloon1_with_pose_matrix.th \
    --ckpt_spn log_style/Balloon1_with_pose/Balloon1_with_pose_spn.th \
    --style_img_dir datasets/style_imgs_test \
    --patch_size 256 \
    --render_train 1 \
    --cpu_percentage 0.5 \
    --basedir log_style

# Test on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python test_style.py \
    --config configs/davis/bear.txt \
    --ckpt_style log_style/bear/bear.th \
    --ckpt_matrix log_style/bear/bear_matrix.th \
    --ckpt_spn log_style/bear/bear_spn.th \
    --style_img_dir datasets/style_imgs_test \
    --patch_size 256 \
    --render_train 1 \
    --cpu_percentage 0.5 \
    --basedir log_style
  • --style_img_dir is the directory of style images for inference.
  • --basedir is the directory that stores the trained model in stage 3 (Canonical Style Transformation).

The stylized results can be found in log_style/${expname}/${expname}/style_transfer_results.
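
To turn the rendered frames into a video, you can use imageio with the imageio-ffmpeg backend, which is already in the dependency list. The exact frame naming depends on the test script, so treat the glob pattern below as an assumption:

import imageio
from pathlib import Path

frames_dir = Path("log_style/Balloon1_with_pose/Balloon1_with_pose/style_transfer_results")
frames = sorted(frames_dir.glob("*.png"))      # adjust to the actual frame naming
with imageio.get_writer("stylized.mp4", fps=30) as writer:
    for f in frames:
        writer.append_data(imageio.imread(f))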

Contact

For any questions related to our paper and implementation, please email hongbinxu1013@gmail.com.

Log

  • Upload the basic code of StyleDyRF.
  • Upload the training code of StyleDyRF on Nvidia dataset.
  • Upload the training code of StyleDyRF on Davis dataset and other custom sequences.
  • Upload the test code of StyleDyRF.

Citation

@article{xu2024styledyrf,
  title={StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields},
  author={Xu, Hongbin and Chen, Weitao and Xiao, Feng and Sun, Baigui and Kang, Wenxiong},
  journal={arXiv preprint arXiv:2403.08310},
  year={2024}
}

Acknowledgements

The code is available under the MIT license and draws from robust-dynrf, TensoRF, DynamicNeRF, and BARF, which are also licensed under the MIT license. Licenses for these projects can be found in the licenses/ folder.
