TL;DR: Official repo of SemanticSDS. By leveraging program-aided layout planning, augmenting 3D Gaussians with semantic embeddings, and guiding SDS with rendered semantic maps, SemanticSDS unlocks the compositional capabilities of pre-trained diffusion models, generating complex 3D scenes comprising multiple objects with various attributes.
<details><summary>Click for full abstract</summary>

Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research. Due to the scarcity of 3D data, state-of-the-art approaches utilize pre-trained 2D diffusion priors, optimized through Score Distillation Sampling (SDS). Despite progress, crafting complex 3D scenes featuring multiple objects or intricate interactions is still difficult. To tackle this, recent methods have incorporated box or layout guidance. However, these layout-guided compositional methods often struggle to provide fine-grained control, as they are generally coarse and lack expressiveness. To overcome these challenges, we introduce a novel SDS approach, Semantic Score Distillation Sampling (SemanticSDS), designed to effectively improve the expressiveness and accuracy of compositional text-to-3D generation. Our approach integrates new semantic embeddings that maintain consistency across different rendering views and clearly differentiate between various objects and parts. These embeddings are transformed into a semantic map, which directs a region-specific SDS process, enabling precise optimization and compositional generation. By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models, thereby achieving superior quality in 3D content generation, particularly for complex objects and scenes. Experimental results demonstrate that our SemanticSDS framework is highly effective for generating state-of-the-art complex 3D content.

</details>

rabbit.mp4
A rabbit sits atop a large, expensive watch with many shiny gears, made half of iron and half of gold, eating a birthday cake that is in front of the rabbit.
corgi_car_house.mp4
A corgi is positioned to the left of a LEGO house, while a car with its front half made of cheese and its rear half made of sushi is situated to the right of the house made of LEGO.
hamburger.mp4
A hamburger, a loaf of bread, an order of fries, and a cup of Coke.
cookie.mp4
A cozy scene with a plush triceratops toy surrounded by a plate of chocolate chip cookies, a glistening cinnamon roll, and a flaky croissant.
<details><summary>More results</summary>

mannequin.mp4

A mannequin adorned with a dress made of feathers and moss stands at the center, flanked by a vase with a single blue tulip and another with blue roses.

pyramid.mp4

A pyramid-shaped burrito artistically blended with the Great Pyramid.

train.mp4

A train with a front made of cake and a back of a steam engine.

</details>
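As described in the abstract, SemanticSDS renders per-Gaussian semantic embeddings into a semantic map that gates region-specific SDS updates. The sketch below is illustrative only: the diffusion prior is stubbed out with random noise, and names such as `eps_pred` and the mask-and-sum scheme are our assumptions, not this repo's actual implementation.

```python
import torch

def eps_pred(noisy: torch.Tensor, t: torch.Tensor, prompt: str) -> torch.Tensor:
    """Stub for a pre-trained 2D diffusion prior's noise prediction."""
    return torch.randn_like(noisy)

def semantic_sds_grad(rendered, semantic_map, region_prompts, t=500):
    """Combine per-region SDS gradients using masks from the semantic map.

    rendered:     (1, 3, H, W) image differentiably rendered from the Gaussians.
    semantic_map: (H, W) integer region ids rendered from the per-Gaussian
                  semantic embeddings.
    """
    with torch.no_grad():  # SDS injects the gradient; no backprop through the prior
        t_tensor = torch.tensor([t])
        noise = torch.randn_like(rendered)
        noisy = rendered + noise  # stand-in for the true forward diffusion q(x_t | x_0)
        grad = torch.zeros_like(rendered)
        for region_id, prompt in region_prompts.items():
            mask = (semantic_map == region_id).float()[None, None]  # (1, 1, H, W)
            # Classic SDS direction (eps_hat - eps), restricted to this region.
            grad = grad + mask * (eps_pred(noisy, t_tensor, prompt) - noise)
    return grad

rendered = torch.rand(1, 3, 64, 64, requires_grad=True)
semantic_map = torch.randint(0, 2, (64, 64))
g = semantic_sds_grad(rendered, semantic_map, {0: "a corgi", 1: "a LEGO house"})
rendered.backward(gradient=g)  # pushes the per-region SDS gradient into the renderer
```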
- Install the requirements:

  It is recommended to use CUDA 11.8, but other versions should also work fine.

  ```bash
  conda create -n semanticSDS python=3.9
  conda activate semanticSDS
  pip install -r requirements.txt
  ```

  To install PyTorch3D, follow the instructions from PyTorch3D's official installation guide. Here is a quick rundown to install it from a stable branch with CUDA support (a sanity check for the build is sketched after these steps):

  ```bash
  git clone --branch stable https://github.com/facebookresearch/pytorch3d.git
  cd pytorch3d
  FORCE_CUDA=1 pip install .
  ```

  (Optional, recommended) Install ninja to speed up the compilation of CUDA extensions:

  ```bash
  pip install ninja
  ```
- Build the extension for Gaussian Splatting:

  ```bash
  cd gs
  ./build.sh
  ```
- Start! Run the main program using the following command, making sure to specify the appropriate config name:

  ```bash
  python main.py --config-name=pyramid
  ```
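  Before the first run, you can quickly confirm that the PyTorch3D build from the first step succeeded. This is a minimal, optional check of our own (the `knn_points` call exercises the compiled extensions; it is not part of this repo):

  ```python
  # Minimal check that PyTorch3D imported and its compiled ops work.
  import torch
  from pytorch3d.ops import knn_points

  points = torch.rand(1, 100, 3)
  if torch.cuda.is_available():
      points = points.cuda()  # exercise the CUDA build if available

  dists, idx, _ = knn_points(points, points, K=4)
  print("PyTorch3D OK:", dists.shape, idx.shape)
  ```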
Add your OpenAI API key in the `generate_layouts_PAL.py` file at the location marked with a `# openai token` comment.
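For example, if the script uses the pre-1.0 module-level `openai` client (an assumption about the script's internals; adapt to whatever the marked line actually looks like), reading the key from an environment variable avoids committing secrets:

```python
# Hypothetical shape of the line marked "# openai token" in
# generate_layouts_PAL.py; reading from the environment avoids
# hard-coding the secret.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # openai token
```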
To generate a layout for a single user prompt, use the following command:

```bash
python generate_layouts_PAL.py --template_version "v0.14_PAL_updated_incontext" --llm_name "gpt-4-32k" --user_prompt "A corgi is situated to the left of a house, while a car is positioned to the right of the house. The car above is split into two layers along the depth axis. The front layer of the car is constructed from wood. The left half of the rear layer is made of sushi, and the right half is made of cheese."
```
You can view the results in the `layouts_cache` folder.
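Each run writes JSON files under `layouts_cache`; you can inspect one directly. The path below follows the `mannequin` example used in the configuration step and assumes the same cache layout:

```python
# Pretty-print a generated layout file for inspection.
import json
from pathlib import Path

path = Path("layouts_cache/v0.14_PAL_updated_incontext/gpt-4-32k/mannequin/normalized.json")
print(json.dumps(json.loads(path.read_text()), indent=2))
```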
For processing multiple prompts, you can use a batch file containing the prompts. Use the following command, replacing `./prompts_to_ask.txt` with the path to your text file:

```bash
python generate_layouts_PAL.py --template_version "v0.14_PAL_updated_incontext" --llm_name "gpt-4-32k" --batch_prompt_file "./prompts_to_ask.txt"
```
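The expected format of the batch file is not spelled out here; a reasonable assumption is one prompt per line, e.g.:

```text
A hamburger, a loaf of bread, an order of fries, and a cup of Coke.
A pyramid-shaped burrito artistically blended with the Great Pyramid.
```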
If you are not satisfied with previously cached responses, add `--disable_response_cache` to your command.
- Configuration: Edit the `.yaml` files in the `conf` folder:
  - Set `llm_init.enabled` to `true`.
  - Update `json_path` to point to the `normalized.json` file for the corresponding layout:

  ```yaml
  llm_init:
    enabled: true
    json_path: layouts_cache/v0.14_PAL_updated_incontext/gpt-4-32k/mannequin/normalized.json
  ```
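  If you want to double-check the edit before launching a long run, the Hydra-style configs can be loaded with OmegaConf (an assumption based on the `--config-name` flag; the exact file name depends on your config):

  ```python
  # Sanity-check the layout settings in a config before running.
  from omegaconf import OmegaConf

  cfg = OmegaConf.load("conf/mannequin.yaml")
  assert cfg.llm_init.enabled is True
  print(cfg.llm_init.json_path)
  ```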
- Execute: Run the main program with the appropriate config name:

  ```bash
  python main.py --config-name=mannequin
  ```
- To resolve `ImportError: libXrender.so.1: cannot open shared object file: No such file or directory`, try:

  ```bash
  apt-get update && apt-get install libxrender1
  ```
- To resume from a checkpoint:
  - Add `+ckpt=<path_to_your_ckpt>` to the run command, where `<path_to_your_ckpt>` is the actual path to your `.pt` checkpoint file.
  - Ensure that the `.yaml` configuration file you're using is the same as the one used to save the checkpoint.

  Example command to resume from a checkpoint:

  ```bash
  python main.py --config-name=car +ckpt="checkpoints/a_dslr_photo_of_a_car_made_out_of_lego/2024-10-13/080848/ckpts/step_2000.pt"
  ```
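  If a resume fails, peeking inside the checkpoint can help. The keys printed below are whatever the file contains; the dict assumption is hypothetical, not the repo's documented format:

  ```python
  # Inspect what a saved .pt checkpoint contains.
  import torch

  ckpt = torch.load(
      "checkpoints/a_dslr_photo_of_a_car_made_out_of_lego/2024-10-13/080848/ckpts/step_2000.pt",
      map_location="cpu",
  )
  print(type(ckpt))
  if isinstance(ckpt, dict):
      print(list(ckpt.keys()))  # e.g. model/optimizer state, step counter
  ```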
- Enable wandb: To monitor and log your runs using Weights & Biases (wandb.ai), add `wandb=true` to the run command.
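  Logging goes to your default wandb entity; a one-off smoke test (the project name below is an arbitrary example, not one this repo defines) looks like:

  ```python
  # Verify wandb credentials before a long training run.
  import wandb

  wandb.login()  # prompts for an API key on first use
  run = wandb.init(project="semanticSDS-test", name="smoke-test")
  run.log({"ping": 1})
  run.finish()
  ```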
Thanks to the awesome open-source project GSGEN for its outstanding work, which has significantly contributed to this codebase.
```bibtex
@article{yang2024semanticsds,
  title={Semantic Score Distillation Sampling for Compositional Text-to-3D Generation},
  author={Yang, Ling and Zhang, Zixiang and Han, Junlin and Zeng, Bohan and Li, Runjia and Torr, Philip and Zhang, Wentao},
  journal={arXiv preprint arXiv:2410.09009},
  year={2024}
}
```