Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu. arXiv preprint 2024.
conda create --name sclg python=3.11
conda activate sclg
pip install mitsuba
# if you run into segmentation fault, you might need specific mitsuba versions
# e.g., `pip install --force-reinstall mitsuba==3.5.1` on MacOS
pip install unidecode Pillow anthropic transforms3d astor ipdb scipy jaxtyping imageio
# required for minecraft renderer
pip install spacy
python -m spacy download en_core_web_md
pip install --force-reinstall numpy==1.26.4 # to be compatible with transforms3d
git clone https://github.com/zzyunzhi/scene-language.git
cd scene-language
pip install -e .
Get your Anthropic API key following the official documentation
and add it to engine/key.py
:
ANTHROPIC_API_KEY = 'YOUR_ANTHROPIC_API_KEY'
OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY' # optional, required for `LLM_PROVIDER='gpt'`
By default, we use Claude 3.5 Sonnet. You may switch to other language models by setting LLM_PROVIDER
in engine/constants.py
.
python scripts/run.py --tasks "a chessboard with a full set of chess pieces"
Renderings will be saved to ${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.gif
.
Example results (raw outputs here):
"a chessboard with a full set of chess pieces" | "A 9x9 Sudoku board partially filled with numbers" | "a scene inspired by Egon Schiele" | "a Roman Colosseum" | "a spider puppet" |
---|---|---|---|---|
ENGINE_MODE=minecraft python scripts/run.py --tasks "a detailed cylindrical medieval tower"
Generated scenes are saved as json files in ${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.json
.
For visualization, run the following command:
python viewers/minecraft/run.py
Then open http://127.0.0.1:5001 in your browser and drag generated json files to the web page.
Example results (raw outputs here):
"a witch's house in Halloween" | "a detailed cylindrical medieval tower" | "a detailed model of Picachu" | "Stonehenge" | "a Greek temple" |
---|---|---|---|---|
python scripts/run.py --tasks ./resources/examples/* --cond image --temperature 0.8
Macro definitions
The following table lists helper functions defined in this file in accordance with expressions defined in the domain-specific language (DSL) (Tables 2 and 5 of the paper):
Implementation | DSL |
---|---|
register |
bind |
library_call |
call |
primitive_call |
call |
loop |
union-loop |
concat_shapes |
union |
transform_shape |
transform |
rotation_matrix |
rotation |
translation_matrix |
translate |
scale_matrix |
scale |
reflection_matrix |
reflect |
compute_shape_center |
compute-shape-center |
compute_shape_min |
compute-shape-min |
compute_shape_max |
compute-shape-max |
compute_shape_sizes |
compute-shape-sizes |
The current codebase allows you to generate 3D scenes with text or image prompts. Other tasks and renderers reported in the paper will be supported in future updates.
Please submit a PR or email us if you have feature requests, suggestions for improvements, or would like to share your results.
If you find this work useful, please consider cite our paper:
@article{zhang2024scenelanguage,
title={The Scene Language: Representing Scenes with Programs, Words, and Embeddings},
author={Yunzhi Zhang and Zizhang Li and Matt Zhou and Shangzhe Wu and Jiajun Wu},
year={2024},
journal={arXiv preprint arXiv:2410.16770},
}