Merge pull request #508 from ottogin/threestudio-integration

Implementation of Score Distillation via Inversion
threestudio-project · Nov 28, 2024 · 915b82d
2 parents 8c8a480 + 3f88b3f

Showing 10 changed files with 1,670 additions and 7 deletions.
diff --git a/2dplayground_SDI_version.ipynb b/2dplayground_SDI_version.ipynb
diff --git a/README.md b/README.md
@@ -13,12 +13,14 @@ threestudio is a unified framework for 3D content creation from text prompts, si
 <br/>
 <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/01a00207-3240-4a8e-aa6f-d48436370fe7.png" width="100%">
 <br/>
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="60%">
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="30%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="48%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="25%">
+<img alt="threestudio" src="https://github.com/user-attachments/assets/afcf74ee-85ff-4792-b109-191f54b44edd" width="24%">
 
 <br/>
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="60%">
-<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="30%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="48%">
+<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="25%">
+<img alt="threestudio" src="https://github.com/user-attachments/assets/c0858bc5-6b9d-446a-b5df-76534c8a3072" width="25%">
 
 <br/>
 <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/4f4d62c5-2304-4e20-b632-afe6d144a203" width="68%">
@@ -31,7 +33,7 @@ threestudio is a unified framework for 3D content creation from text prompts, si
 👆 Results obtained from methods implemented by threestudio 👆 <br/>
 | <a href="https://ml.cs.tsinghua.edu.cn/prolificdreamer/">ProlificDreamer</a> | <a href="https://dreamfusion3d.github.io/">DreamFusion</a> | <a href="https://research.nvidia.com/labs/dir/magic3d/">Magic3D</a> | <a href="https://pals.ttic.edu/p/score-jacobian-chaining">SJC</a> | <a href="https://github.com/eladrich/latent-nerf">Latent-NeRF</a> | <a href="https://fantasia3d.github.io/">Fantasia3D</a> | <a href="https://fabi92.github.io/textmesh/">TextMesh</a> |
 <br/>
-| <a href="https://zero123.cs.columbia.edu/">Zero-1-to-3</a> | <a href="https://guochengqian.github.io/project/magic123/">Magic123</a> | <a href="https://github.com/JunzheJosephZhu/HiFA">HiFA</a> |
+| <a href="https://zero123.cs.columbia.edu/">Zero-1-to-3</a> | <a href="https://guochengqian.github.io/project/magic123/">Magic123</a> | <a href="https://github.com/JunzheJosephZhu/HiFA">HiFA</a> | <a href="https://lukoianov.com/sdi">SDI</a> |
 <br />
 | <a href="https://instruct-nerf2nerf.github.io/">InstructNeRF2NeRF</a> | <a href="https://control4darxiv.github.io/">Control4D</a> |
 </b>
@@ -68,6 +70,7 @@ threestudio is a unified framework for 3D content creation from text prompts, si
 </b>
 
 ## News
+- 08/11/2024: Thank [Artem Lukoianov](https://github.com/ottogin) for implementation of [Score Distillation via Reparametrized DDIM](https://lukoianov.com/sdi)! A text-to-3D module has been added to threestudio, along with a notebook of 2D score distillation experiments.
 - 21/10/2024: Thank [Amir Barda](https://github.com/amirbarda) for implementation of [MagicClay](https://github.com/amirbarda/MagicClay)! Follow the instructions on its website to give it a try.
 - 12/03/2024: Thank [Matthew Kwak](https://github.com/mskwak01) and [Inès Hyeonsu Kim](https://github.com/Ines-Hyeonsu-Kim) for implementation of [3DFuse](https://github.com/KU-CVLAB/3DFuse-threestudio)! Follow the instructions on its website to give it a try.
 - 08/03/2024: Thank [Xinhua Cheng](https://github.com/cxh0519/) for implementation of [GaussianDreamer](https://github.com/cxh0519/threestudio-gaussiandreamer)! Follow the instructions on its website to give it a try.
@@ -241,6 +244,36 @@ For feature requests, bug reports, or discussions about technical problems, plea
 
 ## Supported Models
 
+### Score Distillation via Reparametrized DDIM (SDI) [![arXiv](https://img.shields.io/badge/arXiv-2405.15891-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2405.15891)
+
+SDI revisits how the noise term is sampled in DreamFusion. The paper shows that score distillation can be viewed as a reparametrization of 2D image sampling algorithms such as DDIM. Under this view, the noise added at each step of score distillation must take a very particular form. In DreamFusion (SDS), however, this noise is sampled at random, which causes over-blurring. SDI approximates the correct noise term by inverting the DDIM process.
+
+Notable differences from the paper: N/A.
+
+Pros:
+* High-quality textures
+* Sharp geometric details
+
+Cons:
+* About 1.5x slower than SDS due to the additional inversion, but still faster than ProlificDreamer because it needs fewer steps.
+* Requires more VRAM than SDS due to higher-resolution rendering. Decrease the resolution to fit smaller GPUs.
+
+**Results obtained in threestudio (Stable Diffusion, 512x512)**
+
+<img alt="A_DSLR_photo_of_a_freshly_baked_round_loaf_of_sourdough_bread" src="https://github.com/user-attachments/assets/ec499869-502a-4bcc-b983-279643920b89" width="48%">
+<img alt="a_photograph_of_a_knight" src="https://github.com/user-attachments/assets/71981e65-b8b5-4505-beab-41ef1cd545a9" width="48%">
+
+**Example running commands**
+```sh
+python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="pumpkin head zombie, skinny, highly detailed, photorealistic"
+
+python launch.py --config configs/sdi.yaml --train --gpu 1 system.prompt_processor.prompt="a photograph of a ninja"
+
+python launch.py --config configs/sdi.yaml --train --gpu 2 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger"
+
+python launch.py --config configs/sdi.yaml --train --gpu 3 system.prompt_processor.prompt="bagel filled with cream cheese and lox"
+```
+
 ### ProlificDreamer [![arXiv](https://img.shields.io/badge/arXiv-2305.16213-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2305.16213)
 
 **This is an unofficial experimental implementation! Please refer to [https://github.com/thu-ml/prolificdreamer](https://github.com/thu-ml/prolificdreamer) for official code release.**
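
To make the README's description of SDI concrete, here is a minimal scalar sketch contrasting the SDS and SDI noise terms. It is not the `stable-diffusion-sdi-guidance` implementation: every function name is hypothetical, the noise predictor is a toy stand-in for the diffusion UNet, and the cosine schedule and evenly spaced inversion timesteps are assumptions. Classifier-free guidance during inversion (`inversion_guidance_scale` in `configs/sdi.yaml`) and whatever stochastic component `inversion_eta` controls are deliberately left out.

```python
import math
import random
from typing import Callable, List

# Hypothetical toy noise predictor eps(x_t, t), standing in for the diffusion UNet.
EpsFn = Callable[[float, int], float]


def cosine_alphas_cumprod(num_steps: int) -> List[float]:
    """Toy cosine schedule with alpha_bar[0] == 1 (assumption, not Stable Diffusion's)."""
    return [math.cos(0.5 * math.pi * t / num_steps) ** 2 for t in range(num_steps + 1)]


def ddim_invert(x0: float, eps_fn: EpsFn, alphas_cumprod: List[float],
                t_target: int, n_steps: int) -> float:
    """Deterministic DDIM inversion: carry a clean sample x0 up to noise level t_target."""
    # Evenly spaced timesteps from 0 to t_target (spacing is an assumption of this sketch).
    ts = [round(i * t_target / n_steps) for i in range(n_steps + 1)]
    x = x0
    for t, t_next in zip(ts[:-1], ts[1:]):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        eps = eps_fn(x, t)
        # Estimate the clean sample at the current level, then re-noise it to t_next
        # with the same predicted noise (a DDIM step run towards higher noise).
        x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        x = math.sqrt(a_next) * x0_pred + math.sqrt(1.0 - a_next) * eps
    return x


def sdi_noise(x0: float, eps_fn: EpsFn, alphas_cumprod: List[float],
              t: int, n_steps: int) -> float:
    """SDI-style noise: invert x0 to x_t, then solve x_t = sqrt(a_t)*x0 + sqrt(1-a_t)*eps.

    Requires t > 0 so that 1 - a_t is non-zero.
    """
    x_t = ddim_invert(x0, eps_fn, alphas_cumprod, t, n_steps)
    a_t = alphas_cumprod[t]
    return (x_t - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t)


def sds_noise(rng: random.Random) -> float:
    """SDS (DreamFusion) baseline: the noise is an independent Gaussian draw."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    alphas = cosine_alphas_cumprod(1000)

    def toy_eps(x: float, t: int) -> float:
        # Hypothetical predictor whose output depends on the current sample.
        return 0.1 * x + 0.2

    print("SDI noise:", sdi_noise(0.5, toy_eps, alphas, t=600, n_steps=10))
    print("SDS noise:", sds_noise(random.Random(0)))
```

With a predictor that always returns the same value, the recovered SDI noise equals that value (up to floating-point error), which is a quick check that the inversion and the noise solve are consistent. SDS, by contrast, uses a fresh Gaussian draw that ignores the rendered image entirely.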

diff --git a/configs/sdi.yaml b/configs/sdi.yaml
@@ -0,0 +1,120 @@
+name: "score-distillation-via-inversion" # https://arxiv.org/abs/2405.15891
+tag: "${rmspace:${system.prompt_processor.prompt},_}"
+exp_root_dir: "outputs"
+seed: 0
+
+data_type: "random-camera-datamodule"
+data:
+  batch_size: 1
+  width: 512
+  height: 512
+  camera_distance_range: [1.5, 2.0]
+  fovy_range: [40, 70]
+  elevation_range: [-10, 45]
+  light_sample_strategy: "dreamfusion"
+  eval_camera_distance: 2.0
+  eval_fovy_deg: 70.
+
+system_type: "sdi-system"
+system:
+  geometry_type: "implicit-volume"
+  geometry:
+    radius: 2.0
+    normal_type: "analytic"
+
+    # use Magic3D density initialization
+    density_bias: "blob_magic3d"
+    density_activation: softplus
+    density_blob_scale: 10.
+    density_blob_std: 0.5
+
+    # coarse to fine hash grid encoding
+    # to ensure smooth analytic normals
+    pos_encoding_config:
+      otype: ProgressiveBandHashGrid
+      n_levels: 16
+      n_features_per_level: 2
+      log2_hashmap_size: 19
+      base_resolution: 16
+      per_level_scale: 1.447269237440378 # max resolution 4096
+      start_level: 8 # resolution ~200
+      start_step: 2000
+      update_steps: 500
+
+  material_type: "diffuse-with-point-light-material"
+  material:
+    ambient_only_steps: 1000
+    albedo_activation: sigmoid
+    diffuse_prob: 0.3
+    textureless_prob: 0.75
+    ambient_only_on_test: true
+
+  background_type: "neural-environment-map-background"
+  background:
+    color_activation: sigmoid
+
+  renderer_type: "nerf-volume-renderer"
+  renderer:
+    radius: ${system.geometry.radius}
+    num_samples_per_ray: 512
+    return_comp_normal: true
+
+  prompt_processor_type: "stable-diffusion-prompt-processor"
+  prompt_processor:
+    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
+    prompt: ???
+    use_perp_neg: true
+
+  guidance_type: "stable-diffusion-sdi-guidance"
+  guidance:
+    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
+    guidance_scale: 7.5
+    weighting_strategy: sds
+    min_step_percent: 0.25
+    max_step_percent: 0.98
+
+    # SDI parameters
+    enable_sdi: true
+    inversion_guidance_scale: -7.5
+    inversion_n_steps: 10
+    inversion_eta: 0.3
+    t_anneal: true
+
+  loggers:
+    wandb:
+      enable: false
+      project: "threestudio"
+      name: None
+
+  loss:
+    lambda_sdi: 1.
+    lambda_orient: 0.1
+    lambda_sparsity: [0,0.15,0.,3000]
+    lambda_opaque: 0.1
+    lambda_convex: [0,1.,0.1,4000]
+    lambda_z_variance: 1.
+
+  optimizer:
+    name: Adam
+    args:
+      lr: 0.01
+      betas: [0.9, 0.99]
+      eps: 1.e-15
+    params:
+      geometry:
+        lr: 0.01
+      background:
+        lr: 0.001
+
+trainer:
+  max_steps: 10000
+  log_every_n_steps: 1
+  num_sanity_val_steps: 0
+  val_check_interval: 50
+  enable_progress_bar: true
+  precision: 16-mixed
+
+checkpoint:
+  save_last: true # save at each validation time
+  save_top_k: -1
+  every_n_train_steps: ${trainer.max_steps}
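
Two parts of `configs/sdi.yaml` encode schedules rather than constants: the list-valued loss weights (`lambda_sparsity`, `lambda_convex`) and the `ProgressiveBandHashGrid` settings. The sketch below shows one way to read them. It assumes the list form `[start_step, start_value, end_value, end_step]` with clamped linear interpolation, and that the hash grid starts with `start_level` active levels and unlocks one more every `update_steps` steps after `start_step`. The helper names are hypothetical, not threestudio functions; their default arguments simply mirror the values in the config above.

```python
from typing import List, Union

Scheduled = Union[float, List[float]]


def scheduled_value(spec: Scheduled, global_step: int) -> float:
    """Evaluate a loss weight such as `lambda_sparsity: [0, 0.15, 0., 3000]`.

    Assumed form: [start_step, start_value, end_value, end_step], linearly
    interpolated and clamped outside [start_step, end_step]. A bare number is constant.
    """
    if isinstance(spec, (int, float)):
        return float(spec)
    start_step, start_value, end_value, end_step = spec
    frac = (global_step - start_step) / (end_step - start_step)
    frac = min(1.0, max(0.0, frac))
    return start_value + (end_value - start_value) * frac


def active_levels(global_step: int, n_levels: int = 16, start_level: int = 8,
                  start_step: int = 2000, update_steps: int = 500) -> int:
    """Number of enabled hash-grid levels (assumed: one more level per `update_steps`)."""
    return min(start_level + max(global_step - start_step, 0) // update_steps, n_levels)


def finest_resolution(levels: int, base_resolution: int = 16,
                      per_level_scale: float = 1.447269237440378) -> float:
    """Resolution of the finest enabled level; level i has base * scale**i (0-indexed)."""
    return base_resolution * per_level_scale ** (levels - 1)


if __name__ == "__main__":
    for step in (0, 1500, 3000, 6000, 10000):
        levels = active_levels(step)
        print(
            f"step {step:>5}: "
            f"lambda_sparsity={scheduled_value([0, 0.15, 0.0, 3000], step):.3f}, "
            f"lambda_convex={scheduled_value([0, 1.0, 0.1, 4000], step):.3f}, "
            f"grid levels={levels}, finest res~{finest_resolution(levels):.0f}"
        )
```

Under these assumptions, the finest active grid at step 0 has a resolution of about 213, consistent with the config's `# resolution ~200` comment, and all 16 levels (up to 4096, the `# max resolution 4096` comment) are active from step 6000 on, well before `max_steps: 10000`. The sparsity weight decays to 0 by step 3000 and the convexity weight to 0.1 by step 4000.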
diff --git a/requirements.txt b/requirements.txt
@@ -23,6 +23,8 @@ wandb
 gradio==4.11.0
 git+https://github.com/ashawkey/envlight.git
 torchmetrics
+IPython
+ipywidgets
 
 # deepfloyd
 xformers

diff --git a/setup.py b/setup.py
@@ -2,7 +2,7 @@
 
 setup(
     name="threestudio",
-    version='"0.2.3"',  # the current version of your package
+    version="0.2.3",  # the current version of your package
     packages=find_packages(),  # automatically discover all packages and subpackages
     url="https://github.com/threestudio-project/threestudio",  # replace with the URL of your project
     author="Yuan-Chen Guo and Ruizhi Shao and Ying-Tian Liu and Christian Laforte and Vikram Voleti and Guan Luo and Chia-Hao Chen and Zi-Xin Zou and Chen Wang and Yan-Pei Cao and Song-Hai Zhang",  # replace with your name

diff --git a/threestudio/models/guidance/__init__.py b/threestudio/models/guidance/__init__.py
@@ -3,6 +3,7 @@
     deep_floyd_guidance,
     instructpix2pix_guidance,
     stable_diffusion_guidance,
+    stable_diffusion_sdi_guidance,
     stable_diffusion_unified_guidance,
     stable_diffusion_vsd_guidance,
     stable_zero123_guidance,