ECCV, 2024
Shaowei Liu · Zhongzheng Ren · Saurabh Gupta* · Shenlong Wang*
This repository contains the PyTorch implementation of the paper PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation, ECCV 2024. In this paper, we present a novel training-free image-to-video generation pipeline that integrates physical simulation with a generative video diffusion prior.
- Installation
- Colab Notebook
- Quick Demo
- Perception
- Simulation
- Rendering
- All-in-One command
- Evaluation
- Custom Image Video Generation
- Citation
- Clone this repository:
    git clone --recurse-submodules https://github.com/stevenlsw/physgen.git
    cd physgen
- Install requirements by the following commands:
    conda create -n physgen python=3.9
    conda activate physgen
    pip install -r requirements.txt
Run our Colab notebook for quick start!
-
Run the image-space dynamics simulation in just 3 seconds, with no GPU, display device, or additional setup required!
    export PYTHONPATH=$(pwd)
    name="pool"
    python simulation/animate.py --data_root data --save_root outputs --config data/${name}/sim.yaml
-
The output video is saved to `outputs/${name}/composite.mp4`. Try setting `name` to `domino`, `balls`, `pig_ball`, or `car` to explore other scenes. Example outputs are shown below:
Input Image | Simulation | Output Video |
---|---|---|
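To try every bundled scene in one go, the command above can be generated once per scene. The following is a minimal sketch and not part of the repository: `build_demo_command` and `run_all` are hypothetical helpers, the scene list simply mirrors the names mentioned above, and the script is assumed to be launched from the repository root.

```python
import os
import subprocess
import sys

# Scene names taken from the quick-demo text above.
SCENES = ["pool", "domino", "balls", "pig_ball", "car"]


def build_demo_command(name, data_root="data", save_root="outputs"):
    """Return the argument list for simulating one scene (hypothetical helper)."""
    return [
        sys.executable, "simulation/animate.py",
        "--data_root", data_root,
        "--save_root", save_root,
        "--config", f"{data_root}/{name}/sim.yaml",
    ]


def run_all(scenes=SCENES):
    """Run the simulation for each scene, stopping at the first failure."""
    # Mirrors `export PYTHONPATH=$(pwd)` from the shell commands.
    env = dict(os.environ, PYTHONPATH=os.getcwd())
    for name in scenes:
        subprocess.run(build_demo_command(name), env=env, check=True)
```

Calling `run_all()` would write one `composite.mp4` per scene under `outputs/`, assuming each scene folder ships a `sim.yaml`.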
- Please see perception/README.md for details.
Input | Segmentation | Normal | Albedo | Shading | Inpainting |
---|---|---|---|---|---|
-
Simulation requires the following input for each image:
image folder/
├── original.png
├── mask.png      # segmentation mask
├── inpaint.png   # background inpainting
└── sim.yaml      # simulation configuration file
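Before launching the simulation, it can help to confirm that the folder actually contains these files. The check below is a small sketch under that assumption; `missing_sim_inputs` is a hypothetical helper, not a function provided by this repository.

```python
from pathlib import Path

# File names taken from the input listing above.
REQUIRED_SIM_INPUTS = ("original.png", "mask.png", "inpaint.png", "sim.yaml")


def missing_sim_inputs(image_folder):
    """Return the required simulation inputs that are absent from image_folder."""
    folder = Path(image_folder)
    return [name for name in REQUIRED_SIM_INPUTS if not (folder / name).is_file()]
```

An empty list means the folder is ready; otherwise the returned names point at the perception step or config file that still needs to be produced.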
-
`sim.yaml` specifies the physical properties of each object and the initial conditions (force and speed on each object). Please see `data/pig_ball/sim.yaml` for an example. Set `display` to `true` to visualize the simulation process on a display device, and set `save_snapshot` to `true` to save simulation snapshots.
-
Run the simulation by the following command:
    cd simulation
    python animate.py --data_root ../data --save_root ../outputs --config ../data/${name}/sim.yaml
-
The outputs are saved in `outputs/${name}` as follows:
output folder/
├── history.pkl     # simulation history
├── composite.mp4   # composite video
├── composite.pt    # composite video tensor
├── mask_video.pt   # foreground masked video tensor
└── trans_list.pt   # objects transformation list tensor
- Relighting requires the following input:
image folder/
├── normal.npy    # normal map
└── shading.npy   # shading map by intrinsic decomposition
previous output folder/
├── composite.pt    # composite video
├── mask_video.pt   # foreground masked video tensor
└── trans_list.pt   # objects transformation list tensor
- `perception_input` is the image folder that contains the perception results. `previous_output` is the output folder from the previous simulation step.
- Run the relighting with the following command:
    cd relight
    python relight.py --perception_input ../data/${name} --previous_output ../outputs/${name}
- The outputs `relight.mp4` and `relight.pt` are the relit video and its tensor.
- Comparison between the composite video and the relit video:
Input Image | Composite Video | Relight Video |
---|---|---|
-
Download the SEINE model by following its instructions:
    # install git-lfs beforehand
    mkdir -p diffusion/SEINE/pretrained
    git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 diffusion/SEINE/pretrained/stable-diffusion-v1-4
    wget -P diffusion/SEINE/pretrained https://huggingface.co/Vchitect/SEINE/resolve/main/seine.pt
-
The video diffusion rendering requires the following input:
image folder/
├── original.png   # input image
└── sim.yaml       # simulation configuration file (optional)
previous output folder/
├── relight.pt      # relit video tensor
└── mask_video.pt   # foreground masked video tensor
-
Run the video diffusion rendering by the following command:
    cd diffusion
    python video_diffusion.py --perception_input ../data/${name} --previous_output ../outputs/${name}
`denoise_strength` and `prompt` can be adjusted in the above script. `denoise_strength` controls the amount of noise added: 0 means no denoising, and 1 means denoising from scratch, which allows large deviation from the input image. `prompt` is the text prompt for the video diffusion model; by default we use the foreground object names from the perception model.
-
The output `final_video.mp4` is the rendered video.
-
Comparison between the relit video and the diffusion-rendered video:
Input Image | Relight Video | Final Video |
---|---|---|
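To build intuition for `denoise_strength`, the sketch below shows how a strength in [0, 1] is commonly mapped to a number of denoising steps in SDEdit-style image-conditioned diffusion. This is an illustration of the general technique, not the code used by `video_diffusion.py` or SEINE, which may schedule noise differently; `steps_to_denoise` and the step count of 50 are assumptions made for the example.

```python
def steps_to_denoise(num_inference_steps, denoise_strength):
    """Map a strength in [0, 1] to how many sampling steps are actually run.

    Assumed convention (SDEdit-style): the input is noised up to a point
    proportional to the strength, then denoised from there.
    """
    if not 0.0 <= denoise_strength <= 1.0:
        raise ValueError("denoise_strength must be in [0, 1]")
    return min(int(num_inference_steps * denoise_strength), num_inference_steps)
```

With 50 sampling steps, a strength of 0 runs no steps (the input is returned unchanged), 0.5 runs 25 steps, and 1 runs all 50 steps starting from pure noise.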
We integrate the simulation, relighting, and video diffusion rendering in one script. Please follow the Video Diffusion Rendering step to download the SEINE model first.
bash scripts/run_demo.sh ${name}
We compare our method against the open-source image-to-video models DynamiCrafter, I2VGen-XL, and SEINE, as well as collected reference videos (GT), in Sec. 4.3 of the paper.
-
Install pytorch-fid:
pip install pytorch-fid
-
Download the evaluation data from here for all comparisons and unzip it to the `evaluation` directory. Choose `${method name}` from `DynamiCrafter`, `I2VGen-XL`, `SEINE`, `ours`.
-
Evaluate image FID:
python -m pytorch_fid evaluation/${method name}/all evaluation/GT/all
-
Evaluate motion FID:
python -m pytorch_fid evaluation/${method name}/all_flow evaluation/GT/all_flow
-
For motion FID, we use RAFT to compute optical flow between neighboring frames. The video processing scripts can be found here.
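Both metrics rest on the Fréchet distance between two Gaussians fitted to feature statistics. As a rough illustration of what that distance measures, here is its scalar (one-dimensional) special case in plain Python. This is only a sketch: `pytorch-fid` works on high-dimensional Inception features with full covariance matrices, and `frechet_distance_1d` is a hypothetical name used for this example.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(xs, ys):
    """Squared Fréchet distance between 1-D Gaussians fitted to xs and ys.

    In one dimension the general formula reduces to
    (mu_x - mu_y)^2 + (sigma_x - sigma_y)^2.
    """
    mu_x, mu_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2
```

Identical samples give a distance of 0, and shifting one sample set by a constant increases the distance by the square of that shift, which is why lower FID indicates closer feature distributions.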
-
Our method should generally work for side-view and top-down view images. For custom images, please follow the perception, simulation, rendering pipeline to generate the video.
-
Critical steps (assuming the environment is properly installed)
-
Input:
image folder/
└── original.png
-
Perception:
    cd perception/
    python gpt_ram.py --img_path ${image folder}
    python run_gsam.py --input ${image folder}
    python run_depth_normal.py --input ${image folder} --vis
    python run_fg_bg.py --input ${image folder} --vis_edge
    python run_inpaint.py --input ${image folder} --dilate_kernel_size 20
    python run_albedo_shading.py --input ${image folder} --vis
-
After the perception step, you should get:
image folder/
├── original.png
├── mask.png       # foreground segmentation mask
├── inpaint.png    # background inpainting
├── normal.npy     # normal map
├── shading.npy    # shading map by intrinsic decomposition
├── edges.json     # edges
└── physics.yaml   # physics properties of foreground objects
-
Compose `${image folder}/sim.yaml` for simulation by specifying the initial conditions of each object (you can check the foreground object IDs in `${image folder}/intermediate/fg_mask_vis.png`). Please see the example in `data/pig_ball/sim.yaml`; copy the content of `physics.yaml` and the edge information from `edges.json` into `sim.yaml`.
-
Run simulation:
    cd simulation/
    python animate.py --data_root ${image_folder} --save_root ${image_folder} --config ${image_folder}/sim.yaml
-
Run rendering:
    cd relight/
    python relight.py --perception_input ${image_folder} --previous_output ${image_folder}
    cd ../diffusion/
    python video_diffusion.py --perception_input ${image_folder} --previous_output ${image_folder} --denoise_strength ${denoise_strength}
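The simulation and rendering commands above can also be chained from Python once perception has finished and `sim.yaml` has been composed. The following is a sketch under those assumptions, not a script shipped with the repository: `build_custom_pipeline` and `run_custom_pipeline` are hypothetical helpers, and the image folder is resolved to an absolute path so that it stays valid as the working directory changes between stages.

```python
import subprocess
import sys
from pathlib import Path


def build_custom_pipeline(repo_root, image_folder, denoise_strength):
    """Return (working_dir, argv) pairs for simulation, relighting, and diffusion."""
    repo = Path(repo_root)
    # Absolute path, because each stage runs from a different sub-directory.
    folder = str(Path(image_folder).resolve())
    py = sys.executable
    return [
        (repo / "simulation", [py, "animate.py", "--data_root", folder,
                               "--save_root", folder,
                               "--config", f"{folder}/sim.yaml"]),
        (repo / "relight", [py, "relight.py", "--perception_input", folder,
                            "--previous_output", folder]),
        (repo / "diffusion", [py, "video_diffusion.py",
                              "--perception_input", folder,
                              "--previous_output", folder,
                              "--denoise_strength", str(denoise_strength)]),
    ]


def run_custom_pipeline(repo_root, image_folder, denoise_strength):
    """Run the three stages in order, stopping at the first failure."""
    for cwd, argv in build_custom_pipeline(repo_root, image_folder, denoise_strength):
        subprocess.run(argv, cwd=cwd, check=True)
```

Because `check=True` raises on a non-zero exit code, a failed simulation prevents the rendering stages from running on stale outputs.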
-
We put some custom images under the `custom_data` folder. You can play with each image by running the above steps and see different physical simulations.
Balls | Shelf | Boxes | Kitchen | Table | Toy |
---|---|---|---|---|---|
If you find our work useful in your research, please cite:
@inproceedings{liu2024physgen,
title={PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation},
author={Liu, Shaowei and Ren, Zhongzheng and Gupta, Saurabh and Wang, Shenlong},
  booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}
- Grounded-Segment-Anything for segmentation in perception
- GeoWizard for depth and normal estimation in perception
- Intrinsic for intrinsic image decomposition in perception
- Inpaint-Anything for image inpainting in perception
- Pymunk for physics simulation in simulation
- SEINE for video diffusion in rendering