Merge pull request #508 from ottogin/threestudio-integration
Implementation of Score Distillation via Inversion
thuliu-yt16 authored Nov 28, 2024
2 parents 8c8a480 + 3f88b3f commit 915b82d
Showing 10 changed files with 1,670 additions and 7 deletions.
569 changes: 569 additions & 0 deletions 2dplayground_SDI_version.ipynb

Large diffs are not rendered by default.

43 changes: 38 additions & 5 deletions README.md
@@ -13,12 +13,14 @@ threestudio is a unified framework for 3D content creation from text prompts, si
<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/01a00207-3240-4a8e-aa6f-d48436370fe7.png" width="100%">
<br/>
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="60%">
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="30%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="48%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="25%">
+<img alt="threestudio" src="https://github.com/user-attachments/assets/afcf74ee-85ff-4792-b109-191f54b44edd" width="24%">

<br/>
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="60%">
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="30%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="48%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="25%">
+<img alt="threestudio" src="https://github.com/user-attachments/assets/c0858bc5-6b9d-446a-b5df-76534c8a3072" width="25%">

<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/4f4d62c5-2304-4e20-b632-afe6d144a203" width="68%">
@@ -31,7 +33,7 @@ threestudio is a unified framework for 3D content creation from text prompts, si
👆 Results obtained from methods implemented by threestudio 👆 <br/>
| <a href="https://ml.cs.tsinghua.edu.cn/prolificdreamer/">ProlificDreamer</a> | <a href="https://dreamfusion3d.github.io/">DreamFusion</a> | <a href="https://research.nvidia.com/labs/dir/magic3d/">Magic3D</a> | <a href="https://pals.ttic.edu/p/score-jacobian-chaining">SJC</a> | <a href="https://github.com/eladrich/latent-nerf">Latent-NeRF</a> | <a href="https://fantasia3d.github.io/">Fantasia3D</a> | <a href="https://fabi92.github.io/textmesh/">TextMesh</a> |
<br/>
-| <a href="https://zero123.cs.columbia.edu/">Zero-1-to-3</a> | <a href="https://guochengqian.github.io/project/magic123/">Magic123</a> | <a href="https://github.com/JunzheJosephZhu/HiFA">HiFA</a> |
+| <a href="https://zero123.cs.columbia.edu/">Zero-1-to-3</a> | <a href="https://guochengqian.github.io/project/magic123/">Magic123</a> | <a href="https://github.com/JunzheJosephZhu/HiFA">HiFA</a> | <a href="https://lukoianov.com/sdi">SDI</a> |
<br />
| <a href="https://instruct-nerf2nerf.github.io/">InstructNeRF2NeRF</a> | <a href="https://control4darxiv.github.io/">Control4D</a> |
</b>
@@ -68,6 +70,7 @@ threestudio is a unified framework for 3D content creation from text prompts, si
</b>

## News
- 08/11/2024: Thank [Artem Lukoianov](https://github.com/ottogin) for implementation of [Score Distillation via Reparametrized DDIM](https://lukoianov.com/sdi)! Text-to-3D module is added to Threestudio as well as a notebook with 2D score distillation experiments.
- 21/10/2024: Thank [Amir Barda](https://github.com/amirbarda) for implementation of [MagicClay](https://github.com/amirbarda/MagicClay)! Follow the instructions on its website to give it a try.
- 12/03/2024: Thank [Matthew Kwak](https://github.com/mskwak01) and [Inès Hyeonsu Kim](https://github.com/Ines-Hyeonsu-Kim) for implementation of [3DFuse](https://github.com/KU-CVLAB/3DFuse-threestudio)! Follow the instructions on its website to give it a try.
- 08/03/2024: Thank [Xinhua Cheng](https://github.com/cxh0519/) for implementation of [GaussianDreamer](https://github.com/cxh0519/threestudio-gaussiandreamer)! Follow the instructions on its website to give it a try.
@@ -241,6 +244,36 @@ For feature requests, bug reports, or discussions about technical problems, plea

## Supported Models

### Score Distillation via Reparametrized DDIM (SDI) [![arXiv](https://img.shields.io/badge/arXiv-2405.15891-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2405.15891)

SDI reconsiders how the noise term is sampled in DreamFusion. The paper shows that score distillation can be viewed as a reparametrization of 2D image sampling algorithms, in which case the noise added at each step of score distillation must take a very particular form. In DreamFusion (SDS), however, the noise is sampled randomly, which causes over-blurring. SDI approximates the correct noise term by inverting the DDIM process.
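
The difference between the two noise choices can be sketched on a toy problem. This is only an illustration under stated assumptions, not the code added in this commit: `alpha_bar`, `toy_predict_noise`, `sds_noise` and `sdi_noise` are hypothetical names, the noise predictor is a closed-form stand-in for the diffusion model, and the way `eta` mixes in fresh noise is an assumed form. SDS draws fresh Gaussian noise at every step, while the SDI-style variant runs the DDIM update from clean to noisy and reads off the noise that the inversion implies.

```python
import math
import random
from typing import List, Optional, Tuple


def alpha_bar(t: float) -> float:
    """Toy cumulative signal level: 1.0 at t=0 (clean), 0.0 at t=1 (pure noise)."""
    return math.cos(0.5 * math.pi * t) ** 2


def toy_predict_noise(x: List[float], t: float) -> List[float]:
    """Hypothetical stand-in for the diffusion model's noise prediction.

    It is the exact predictor when the clean data is standard normal.
    """
    scale = math.sqrt(1.0 - alpha_bar(t))
    return [scale * v for v in x]


def sds_noise(n: int, rng: random.Random) -> List[float]:
    """SDS: independent Gaussian noise, redrawn at every optimization step."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sdi_noise(
    x0: List[float],
    t_target: float,
    n_steps: int = 10,
    eta: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """SDI-style: invert DDIM from t=0 up to t_target, return (x_t, implied noise)."""
    rng = rng if rng is not None else random.Random(0)
    x = list(x0)
    for i in range(n_steps):
        t_cur = t_target * i / n_steps
        t_next = t_target * (i + 1) / n_steps
        a_cur, a_next = alpha_bar(t_cur), alpha_bar(t_next)
        eps = toy_predict_noise(x, t_cur)
        # Clean-sample estimate from the current noisy sample.
        x0_pred = [(xi - math.sqrt(1.0 - a_cur) * ei) / math.sqrt(a_cur)
                   for xi, ei in zip(x, eps)]
        # Assumed form: eta > 0 replaces part of the predicted noise with fresh noise.
        sigma = eta * math.sqrt(1.0 - a_next)
        det = math.sqrt(max(1.0 - a_next - sigma ** 2, 0.0))
        x = [math.sqrt(a_next) * pi + det * ei + sigma * rng.gauss(0.0, 1.0)
             for pi, ei in zip(x0_pred, eps)]
    a_t = alpha_bar(t_target)
    # Noise such that x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps_inv.
    eps_inv = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
               for xi, x0i in zip(x, x0)]
    return x, eps_inv


x0 = [0.5, -1.0, 2.0]
x_t, eps_inv = sdi_noise(x0, t_target=0.5)
print("inverted noise:", eps_inv)
print("random noise:  ", sds_noise(len(x0), random.Random(0)))
```

With `eta=0.0` the inverted noise is a deterministic function of the current image, so repeated calls agree, whereas the SDS draw changes every time.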

Notable differences from the paper: N/A.

Pros:
* High quality of the textures
* Sharp geometric details

Cons:
* Slower than SDS (about 1.5x) due to the additional inversion, but still faster than ProlificDreamer because it needs fewer steps
* Requires more VRAM than SDS due to higher-resolution rendering. Decrease the resolution to fit on smaller GPUs.

**Results obtained in threestudio (Stable Diffusion, 512x512)**

<img alt="A_DSLR_photo_of_a_freshly_baked_round_loaf_of_sourdough_bread" src="https://github.com/user-attachments/assets/ec499869-502a-4bcc-b983-279643920b89" width="48%">
<img alt="a_photograph_of_a_knight" src="https://github.com/user-attachments/assets/71981e65-b8b5-4505-beab-41ef1cd545a9" width="48%">

**Example running commands**
```sh
python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="pumpkin head zombie, skinny, highly detailed, photorealistic"

python launch.py --config configs/sdi.yaml --train --gpu 1 system.prompt_processor.prompt="a photograph of a ninja"

python launch.py --config configs/sdi.yaml --train --gpu 2 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger"

python launch.py --config configs/sdi.yaml --train --gpu 3 system.prompt_processor.prompt="bagel filled with cream cheese and lox"
```
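
The commands above differ only in the prompt and GPU index, so they can be assembled programmatically. The helper below is a small sketch: `build_launch_command` is a hypothetical name, and it assumes that additional `key=value` arguments (here `data.width` and `data.height`, which appear in `configs/sdi.yaml`) are accepted in the same way as the prompt override. The value 256 is only an illustrative lower resolution for smaller GPUs.

```python
import shlex
from typing import Dict, List, Optional


def build_launch_command(
    prompt: str,
    gpu: int,
    overrides: Optional[Dict[str, object]] = None,
    config: str = "configs/sdi.yaml",
) -> List[str]:
    """Build the argument list for one SDI training run (hypothetical helper)."""
    cmd = [
        "python", "launch.py",
        "--config", config,
        "--train",
        "--gpu", str(gpu),
        f"system.prompt_processor.prompt={prompt}",
    ]
    # Extra config overrides, assumed to use the same key=value syntax.
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


cmd = build_launch_command(
    "a photograph of a ninja",
    gpu=0,
    overrides={"data.width": 256, "data.height": 256},
)
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)` without shell quoting.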

### ProlificDreamer [![arXiv](https://img.shields.io/badge/arXiv-2305.16213-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2305.16213)

**This is an unofficial experimental implementation! Please refer to [https://github.com/thu-ml/prolificdreamer](https://github.com/thu-ml/prolificdreamer) for official code release.**
120 changes: 120 additions & 0 deletions configs/sdi.yaml
@@ -0,0 +1,120 @@
name: "score-distillation-via-inversion" # https://arxiv.org/abs/2405.15891
tag: "${rmspace:${system.prompt_processor.prompt},_}"
exp_root_dir: "outputs"
seed: 0

data_type: "random-camera-datamodule"
data:
  batch_size: 1
  width: 512
  height: 512
  camera_distance_range: [1.5, 2.0]
  fovy_range: [40, 70]
  elevation_range: [-10, 45]
  light_sample_strategy: "dreamfusion"
  eval_camera_distance: 2.0
  eval_fovy_deg: 70.

system_type: "sdi-system"
system:
  geometry_type: "implicit-volume"
  geometry:
    radius: 2.0
    normal_type: "analytic"

    # use Magic3D density initialization
    density_bias: "blob_magic3d"
    density_activation: softplus
    density_blob_scale: 10.
    density_blob_std: 0.5

    # coarse to fine hash grid encoding
    # to ensure smooth analytic normals
    pos_encoding_config:
      otype: ProgressiveBandHashGrid
      n_levels: 16
      n_features_per_level: 2
      log2_hashmap_size: 19
      base_resolution: 16
      per_level_scale: 1.447269237440378 # max resolution 4096
      start_level: 8 # resolution ~200
      start_step: 2000
      update_steps: 500

  material_type: "diffuse-with-point-light-material"
  material:
    ambient_only_steps: 1000
    albedo_activation: sigmoid
    diffuse_prob: 0.3
    textureless_prob: 0.75
    ambient_only_on_test: true

  background_type: "neural-environment-map-background"
  background:
    color_activation: sigmoid

  renderer_type: "nerf-volume-renderer"
  renderer:
    radius: ${system.geometry.radius}
    num_samples_per_ray: 512
    return_comp_normal: true

  prompt_processor_type: "stable-diffusion-prompt-processor"
  prompt_processor:
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    prompt: ???
    use_perp_neg: true

  guidance_type: "stable-diffusion-sdi-guidance"
  guidance:
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    guidance_scale: 7.5
    weighting_strategy: sds
    min_step_percent: 0.25
    max_step_percent: 0.98

    # SDI parameters
    enable_sdi: true
    inversion_guidance_scale: -7.5
    inversion_n_steps: 10
    inversion_eta: 0.3
    t_anneal: true

  loggers:
    wandb:
      enable: false
      project: "threestudio"
      name: None

  loss:
    lambda_sdi: 1.
    lambda_orient: 0.1
    lambda_sparsity: [0,0.15,0.,3000]
    lambda_opaque: 0.1
    lambda_convex: [0,1.,0.1,4000]
    lambda_z_variance: 1.

  optimizer:
    name: Adam
    args:
      lr: 0.01
      betas: [0.9, 0.99]
      eps: 1.e-15
    params:
      geometry:
        lr: 0.01
      background:
        lr: 0.001

trainer:
  max_steps: 10000
  log_every_n_steps: 1
  num_sanity_val_steps: 0
  val_check_interval: 50
  enable_progress_bar: true
  precision: 16-mixed

checkpoint:
  save_last: true # save at each validation time
  save_top_k: -1
  every_n_train_steps: ${trainer.max_steps}
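
Two groups of numbers in this config are easier to read with a little arithmetic. The sketch below is an interpretation, not code from this commit: it assumes that a four-element loss weight such as `lambda_sparsity: [0,0.15,0.,3000]` means `[start_step, start_value, end_value, end_step]` with linear interpolation in between, and that hash grid level `i` has resolution `base_resolution * per_level_scale ** i`. The names `scheduled_value` and `level_resolution` are hypothetical.

```python
from typing import Sequence, Union

Schedule = Union[float, Sequence[float]]


def scheduled_value(spec: Schedule, step: int) -> float:
    """Resolve a constant weight or an assumed
    [start_step, start_value, end_value, end_step] linear schedule."""
    if isinstance(spec, (int, float)):
        return float(spec)
    start_step, start_value, end_value, end_step = spec
    progress = (step - start_step) / (end_step - start_step)
    progress = min(max(progress, 0.0), 1.0)  # clamp outside the schedule window
    return start_value + (end_value - start_value) * progress


def level_resolution(base_resolution: float, per_level_scale: float, level: int) -> float:
    """Resolution of hash grid level `level` (0-based), under the assumed geometric growth."""
    return base_resolution * per_level_scale ** level


# lambda_sparsity decays from 0.15 to 0 over the first 3000 steps.
for step in (0, 1500, 3000, 10000):
    print(step, scheduled_value([0, 0.15, 0.0, 3000], step))

# With 16 levels, the finest level (index 15) lands at about 4096, and the
# finest of the first 8 levels (index 7) at roughly 200, matching the comments.
scale = 1.447269237440378
print(level_resolution(16, scale, 15))
print(level_resolution(16, scale, 7))
```

Under the same assumption, `lambda_convex: [0,1.,0.1,4000]` starts at 1.0 and falls linearly to 0.1 by step 4000, then stays there.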
2 changes: 2 additions & 0 deletions requirements.txt
@@ -23,6 +23,8 @@ wandb
gradio==4.11.0
git+https://github.com/ashawkey/envlight.git
torchmetrics
IPython
ipywidgets

# deepfloyd
xformers
2 changes: 1 addition & 1 deletion setup.py
@@ -2,7 +2,7 @@

 setup(
     name="threestudio",
-    version='"0.2.3"',  # the current version of your package
+    version="0.2.3",  # the current version of your package
     packages=find_packages(),  # automatically discover all packages and subpackages
     url="https://github.com/threestudio-project/threestudio",  # replace with the URL of your project
     author="Yuan-Chen Guo and Ruizhi Shao and Ying-Tian Liu and Christian Laforte and Vikram Voleti and Guan Luo and Chia-Hao Chen and Zi-Xin Zou and Chen Wang and Yan-Pei Cao and Song-Hai Zhang",  # replace with your name
1 change: 1 addition & 0 deletions threestudio/models/guidance/__init__.py
@@ -3,6 +3,7 @@
deep_floyd_guidance,
instructpix2pix_guidance,
stable_diffusion_guidance,
stable_diffusion_sdi_guidance,
stable_diffusion_unified_guidance,
stable_diffusion_vsd_guidance,
stable_zero123_guidance,
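
The only change in this file is the new `stable_diffusion_sdi_guidance` entry in the import list. A plausible reason, given that `configs/sdi.yaml` selects the guidance by the string `"stable-diffusion-sdi-guidance"`, is that classes are registered under a name when their module is imported. The sketch below illustrates that pattern in isolation; `register`, `find` and `ToySDIGuidance` are hypothetical stand-ins, not the framework's actual code.

```python
from typing import Callable, Dict

# Name -> class table, filled in as a side effect of importing modules.
_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records the class under a string name."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return decorator


def find(name: str) -> type:
    """Look up a previously registered class; fails if its module was never imported."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown module type: {name!r}")
    return _REGISTRY[name]


# In a real package this class would live in its own module, and the
# decorator would only run once that module is imported from __init__.py.
@register("stable-diffusion-sdi-guidance")
class ToySDIGuidance:
    def __init__(self, guidance_scale: float = 7.5) -> None:
        self.guidance_scale = guidance_scale


guidance_cls = find("stable-diffusion-sdi-guidance")
print(guidance_cls.__name__, guidance_cls().guidance_scale)
```

Under this pattern, forgetting the import would leave the name unresolved at config time even though the class file exists.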