Project Page | Paper | Data
Tensor4D : Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering
Ruizhi Shao, Zerong Zheng, Hanzhang Tu, Boning Liu, Hongwen Zhang, Yebin Liu (CVPR 2023 Highlight)
This is the official implementation of Tensor4D: Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering. Tensor4D can efficiently achieve high-fidelity dynamic reconstruction and rendering with only sparse views or a monocular camera.
To deploy and run Tensor4D, you will need to install the following Python libraries:
numpy
opencv-python
torch
tensorboard
shutil
tqdm
pyhocon==0.3.57
glob
scipy
einops
We have tested Tensor4D with several PyTorch versions, including 1.13 and 2.0; using one of these versions is recommended to ensure compatibility. In addition, we found that pyhocon==0.3.60 is not compatible with our project.
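The dependencies above can be installed with pip. A minimal requirements file might look like the sketch below (pyhocon is pinned to 0.3.57 since 0.3.60 is known to be incompatible; `shutil` and `glob` are part of the Python standard library and need no installation):

```
# requirements.txt (sketch; version pins other than pyhocon are unconstrained)
numpy
opencv-python
torch
tensorboard
tqdm
pyhocon==0.3.57
scipy
einops
```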
We have provided several samples for Tensor4D training. You can download the test samples using the following links:
- thumbsup_v4 : A man giving thumbs up and waving, captured by 4 RGB cameras focusing on the front face.
- dance_v4 : A woman dancing in a gorgeous dress, captured by 4 RGB cameras focusing on the front face.
- boxing_v12 : A man in a down jacket boxing, captured by 12 RGB cameras in a circle.
- lego_v1 : A LEGO excavator with a raised arm, captured by a monocular camera, similar to the D-NeRF dataset.
The format of our test samples is the same as NeuS. We will also provide scripts that convert the NeRF Blender dataset into our format.
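In the NeuS-style layout, each case directory stores its cameras in a `cameras_sphere.npz` file with one `world_mat_i` (projection matrix) and `scale_mat_i` (scene-normalization matrix) pair per view `i`, alongside `image/` and `mask/` folders. A minimal sketch of writing and inspecting such a file, using toy identity matrices in place of real calibration:

```python
import numpy as np

# Build a toy NeuS-style camera file for 2 views.
# (Assumption: Tensor4D samples follow the same cameras_sphere.npz
# layout as NeuS; the identity matrices here are placeholders.)
n_views = 2
cams = {}
for i in range(n_views):
    cams[f'world_mat_{i}'] = np.eye(4)  # projection = intrinsics @ extrinsics
    cams[f'scale_mat_{i}'] = np.eye(4)  # maps the scene into a unit sphere
np.savez('cameras_sphere.npz', **cams)

# Read it back and list the stored keys.
data = np.load('cameras_sphere.npz')
print(sorted(data.files))
# → ['scale_mat_0', 'scale_mat_1', 'world_mat_0', 'world_mat_1']
```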
To train Tensor4D for monocular cases, you can use the following scripts:
# Train Tensor4D with flow
python exp_runner.py --case lego_v1 --mode train --conf confs/t4d_lego.conf --gpu 0
# Resume training
python exp_runner.py --case lego_v1 --mode train --conf confs/t4d_lego.conf --gpu 0 --is_continue
After training, you can visualize the results with the following script:
# interpolation between view 0 and view 2, setting the number of interpolation views to 100 and the downsampling resolution to 2
python exp_runner.py --case lego_v1 --mode interpolate_0_2 --conf confs/t4d_lego.conf --is_continue --inter_reso_level 2 --gpu 1 --n_frames 100
Similarly, you can train Tensor4D for multi-view cases with the following script:
# Train Tensor4D without flow
python exp_runner.py --case thumbsup_v4 --mode train --conf confs/t4d_origin.conf --gpu 0
After about 50k training iterations, you can achieve a reasonably good result. If you want higher quality, you may need to train longer, e.g., 200k iterations.
Tensor4D can be further accelerated with image guidance. Here we provide a naive implementation that directly uses a 2D CNN to extract image features as additional conditions:
# Train Tensor4D with image guidance on thumbsup_v4
python exp_runner.py --case thumbsup_v4 --mode train --conf confs/t4d_thumbsup_img.conf --gpu 0
# Train Tensor4D with image guidance on dance_v4
python exp_runner.py --case dance_v4 --mode train --conf confs/t4d_dance_img.conf --gpu 0
# Train Tensor4D with image guidance on boxing_v12
python exp_runner.py --case boxing_v12 --mode train --conf confs/t4d_boxing_img.conf --gpu 0
Tensor4D with image guidance converges more efficiently, reaching reasonable quality within 5k iterations.
We provide config documentation that explains the parameters in Tensor4D. It is recommended to check out the documentation before training your own Tensor4D model.
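The `.conf` files under `confs/` use the HOCON syntax parsed by pyhocon. Purely as an illustration of the syntax (the section and key names below are hypothetical, not the actual Tensor4D schema; consult the config documentation for the real parameters), a fragment might look like:

```
train {
    learning_rate = 5e-4   # illustrative key, not necessarily the real name
    end_iter = 50000
}
dataset {
    data_dir = ./public_data/CASE_NAME/
}
```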
We provide the Tensor4D dataset in this link. Our dataset contains 5 multi-view sequences captured by 6 RGB cameras, all directed towards the front of the human.
We now provide the scripts to process the raw data and convert it into our training samples in scripts. Thanks to Sun (286668458@qq.com) for writing and providing the data processing code.
If you find this code useful for your research, please use the following BibTeX entry:
@inproceedings{shao2023tensor4d,
title = {Tensor4D: Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering},
author = {Shao, Ruizhi and Zheng, Zerong and Tu, Hanzhang and Liu, Boning and Zhang, Hongwen and Liu, Yebin},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year = {2023}
}
Our project benefits from these great resources:
- NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction
- TensoRF: Tensorial Radiance Fields
- Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
Thanks for sharing their code.