The Scene Language: Representing Scenes with Programs, Words, and Embeddings

Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu. arXiv preprint 2024.

Installation

Environment

conda create --name sclg python=3.11
conda activate sclg
pip install mitsuba
# if you run into a segmentation fault, you may need a specific mitsuba version
# e.g., `pip install --force-reinstall mitsuba==3.5.1` on macOS
pip install unidecode Pillow anthropic transforms3d astor ipdb scipy jaxtyping imageio

# required for minecraft renderer
pip install spacy
python -m spacy download en_core_web_md

pip install --force-reinstall numpy==1.26.4  # to be compatible with transforms3d

git clone https://github.com/zzyunzhi/scene-language.git
cd scene-language
pip install -e .
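Before running anything, it can help to confirm that the environment is complete. The following is a minimal sketch and is not part of the repository. The module list mirrors the `pip install` lines above, using import names rather than pip names (for example, `PIL` for Pillow, plus the `en_core_web_md` spaCy model). The `1.26` check mirrors the NumPy pin.

```python
import importlib.util
from importlib import metadata

# Import names for the packages installed above (assumed mapping: Pillow -> PIL).
REQUIRED_MODULES = ["mitsuba", "unidecode", "PIL", "anthropic", "transforms3d", "astor",
                    "ipdb", "scipy", "jaxtyping", "imageio", "spacy", "en_core_web_md"]


def missing_modules(modules):
    """Return the top-level modules that cannot be found in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def numpy_matches(expected="1.26"):
    """Check that the installed NumPy version has the pinned major.minor prefix."""
    try:
        version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return False
    return version.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    print("missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("numpy pinned to 1.26.x:", numpy_matches())
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as mitsuba.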

Language Model API

Get your Anthropic API key following the official documentation and add it to engine/key.py:

ANTHROPIC_API_KEY = 'YOUR_ANTHROPIC_API_KEY'
OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY'  # optional, required for `LLM_PROVIDER='gpt'`
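If you prefer not to paste keys by hand, you can generate `engine/key.py` from environment variables. This is a hypothetical helper and not part of the repository. It assumes the keys are exported as `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` (a convention chosen for this sketch). It writes the same two constants shown above and falls back to the placeholder when the optional OpenAI key is absent.

```python
import os
from pathlib import Path


def write_key_file(path="engine/key.py", env=None):
    """Write engine/key.py from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    anthropic_key = env.get("ANTHROPIC_API_KEY")
    if not anthropic_key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # The OpenAI key is optional; it is only required for LLM_PROVIDER='gpt'.
    openai_key = env.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # repr() yields valid Python string literals for the generated module.
    path.write_text(f"ANTHROPIC_API_KEY = {anthropic_key!r}\n"
                    f"OPENAI_API_KEY = {openai_key!r}\n")
    return path
```

Run it from the repository root so that the relative path resolves to the file the codebase reads.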

By default, we use Claude 3.5 Sonnet. You may switch to other language models by setting LLM_PROVIDER in engine/constants.py.

Text-Conditioned 3D Generation

Renderer: Mitsuba

python scripts/run.py --tasks "a chessboard with a full set of chess pieces"

Renderings will be saved to ${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.gif.
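Because every run nests its outputs in the same directory layout, all renderings can be collected with a single glob. This is a minimal sketch that assumes only the path pattern stated above, relative to `${PROJ_ROOT}`. The function name is hypothetical.

```python
from pathlib import Path


def find_renderings(outputs_dir="scripts/outputs", pattern="*.gif"):
    """Collect files laid out as
    run_<timestep>_<uuid>/<scene_name>_<uuid>/<sample_index>/renderings/<pattern>."""
    root = Path(outputs_dir)
    return sorted(root.glob(f"run_*/*/*/renderings/{pattern}"))


if __name__ == "__main__":
    for gif in find_renderings():
        print(gif)
```

Passing `pattern="*.json"` reuses the same layout for other output types.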

Example results (raw outputs here):

Prompts shown: "a chessboard with a full set of chess pieces", "A 9x9 Sudoku board partially filled with numbers", "a scene inspired by Egon Schiele", "a Roman Colosseum", "a spider puppet".

Renderer: Minecraft

ENGINE_MODE=minecraft python scripts/run.py --tasks "a detailed cylindrical medieval tower"
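When driving runs from Python instead of a shell, the `ENGINE_MODE=minecraft` prefix becomes an entry in the child process environment. The sketch below is hypothetical and not part of the repository. It only builds the arguments; it assumes it will be launched from the repository root with the current interpreter.

```python
import os
import sys


def minecraft_run_command(prompt, env=None):
    """Build argv and env equivalent to
    `ENGINE_MODE=minecraft python scripts/run.py --tasks "<prompt>"`."""
    run_env = dict(os.environ if env is None else env)  # copy; do not mutate the caller's env
    run_env["ENGINE_MODE"] = "minecraft"
    argv = [sys.executable, "scripts/run.py", "--tasks", prompt]
    return argv, run_env
```

Calling `subprocess.run(argv, env=run_env, check=True)` from the repository root then starts the run with the Minecraft engine selected.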

Generated scenes are saved as json files in ${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.json. For visualization, run the following command:

python viewers/minecraft/run.py

Then open http://127.0.0.1:5001 in your browser and drag generated json files to the web page.
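To find the files to drag into the viewer, the following sketch lists the JSON scenes of the most recent run. This is a hypothetical helper. It assumes that "most recent" means the `run_*` directory with the latest modification time. It also skips files that fail to parse as JSON; the scene schema itself is not described here, so nothing beyond parsing is checked.

```python
import json
from pathlib import Path


def latest_minecraft_scenes(outputs_dir="scripts/outputs"):
    """Return parseable .json scene files from the newest run_* directory."""
    root = Path(outputs_dir)
    if not root.is_dir():
        return []
    runs = [p for p in root.glob("run_*") if p.is_dir()]
    if not runs:
        return []
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    scenes = []
    for path in sorted(latest.glob("*/*/renderings/*.json")):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError:
            continue  # skip incomplete or corrupted outputs
        scenes.append(path)
    return scenes
```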

Example results (raw outputs here):

Prompts shown: "a witch's house in Halloween", "a detailed cylindrical medieval tower", "a detailed model of Picachu", "Stonehenge", "a Greek temple".

Image-Conditioned 3D Generation

python scripts/run.py --tasks ./resources/examples/* --cond image --temperature 0.8
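In this command, the shell expands `./resources/examples/*` into one argument per file before `run.py` sees it. The sketch below reproduces that expansion in Python. It is a hypothetical helper. Matches are sorted, as a shell would sort them. Unlike bash, which would pass an unmatched pattern through literally, this helper raises an error when nothing matches.

```python
import glob
import sys


def image_conditioned_argv(pattern="./resources/examples/*", temperature=0.8):
    """Build argv equivalent to
    `python scripts/run.py --tasks ./resources/examples/* --cond image --temperature 0.8`."""
    images = sorted(glob.glob(pattern))  # like the shell, glob skips dotfiles
    if not images:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return [sys.executable, "scripts/run.py", "--tasks", *images,
            "--cond", "image", "--temperature", str(temperature)]
```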

Codebase Details

Macro definitions

The following table maps the helper functions defined in the codebase to the corresponding expressions of the domain-specific language (DSL) defined in Tables 2 and 5 of the paper:

| Implementation | DSL |
|---|---|
| `register` | `bind` |
| `library_call` | `call` |
| `primitive_call` | `call` |
| `loop` | `union-loop` |
| `concat_shapes` | `union` |
| `transform_shape` | `transform` |
| `rotation_matrix` | `rotation` |
| `translation_matrix` | `translate` |
| `scale_matrix` | `scale` |
| `reflection_matrix` | `reflect` |
| `compute_shape_center` | `compute-shape-center` |
| `compute_shape_min` | `compute-shape-min` |
| `compute_shape_max` | `compute-shape-max` |
| `compute_shape_sizes` | `compute-shape-sizes` |
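To illustrate how these macros compose, here is a toy re-implementation of a subset of them. It is a sketch under loudly stated assumptions: the names match the table, but the signatures and the shape representation are hypothetical and do not reflect the repository's actual helpers. A shape here is simply a list of primitives, each storing the corner points of its box. Rotation uses Rodrigues' formula. `scale_matrix` and `reflection_matrix` would be built the same way and are omitted.

```python
import numpy as np

# Toy shape representation (an assumption of this sketch): a list of primitives,
# each storing the world-space corner points of its box as an (N, 3) array.
_LIBRARY = {}


def register(fn):
    """`bind`: store a shape function under its name."""
    _LIBRARY[fn.__name__] = fn
    return fn


def library_call(name, **kwargs):
    """`call` on a bound name: look the function up and evaluate it."""
    return _LIBRARY[name](**kwargs)


def primitive_call(kind, size):
    """`call` on a primitive: an origin-centered box stored as its 8 corners."""
    corners = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], dtype=float)
    return [{"kind": kind, "points": corners * np.asarray(size, dtype=float) / 2.0}]


def concat_shapes(*shapes):
    """`union`: merge the primitives of several shapes."""
    return [prim for shape in shapes for prim in shape]


def loop(n, fn):
    """`union-loop`: the union of fn(0), ..., fn(n - 1)."""
    return concat_shapes(*(fn(i) for i in range(n)))


def translation_matrix(offset):
    """`translate`: 4x4 homogeneous translation by `offset`."""
    m = np.eye(4)
    m[:3, 3] = offset
    return m


def rotation_matrix(angle, axis, center=(0.0, 0.0, 0.0)):
    """`rotation`: rotate by `angle` radians about `axis` through `center`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    r = np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)  # Rodrigues
    c = np.asarray(center, dtype=float)
    m = np.eye(4)
    m[:3, :3] = r
    m[:3, 3] = c - r @ c  # maps p to R (p - c) + c
    return m


def transform_shape(shape, matrix):
    """`transform`: apply a 4x4 matrix to every primitive, returning a new shape."""
    return [{**prim, "points": prim["points"] @ matrix[:3, :3].T + matrix[:3, 3]} for prim in shape]


def _all_points(shape):
    return np.concatenate([prim["points"] for prim in shape], axis=0)


def compute_shape_min(shape):
    return _all_points(shape).min(axis=0)


def compute_shape_max(shape):
    return _all_points(shape).max(axis=0)


def compute_shape_center(shape):
    return (compute_shape_min(shape) + compute_shape_max(shape)) / 2.0


def compute_shape_sizes(shape):
    return compute_shape_max(shape) - compute_shape_min(shape)


@register
def row_of_blocks(n):
    # n unit cubes, spaced 2 units apart along the x axis.
    return loop(n, lambda i: transform_shape(primitive_call("cube", (1, 1, 1)),
                                             translation_matrix((2.0 * i, 0.0, 0.0))))


scene = library_call("row_of_blocks", n=3)
print(compute_shape_sizes(scene))  # [5. 1. 1.]
```

Rotating `scene` by `np.pi / 2` about the z axis through `compute_shape_center(scene)` swaps its x and y extents. The center stays fixed, which is why the rotation helper takes an explicit center.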

Codebase improvements

The current codebase allows you to generate 3D scenes with text or image prompts. Other tasks and renderers reported in the paper will be supported in future updates.

Please submit a PR or email us if you have feature requests, suggestions for improvements, or would like to share your results.

Citation

If you find this work useful, please consider citing our paper:

@article{zhang2024scenelanguage,
  title={The Scene Language: Representing Scenes with Programs, Words, and Embeddings},
  author={Yunzhi Zhang and Zizhang Li and Matt Zhou and Shangzhe Wu and Jiajun Wu},
  year={2024},
  journal={arXiv preprint arXiv:2410.16770},
}
