CLEVR Image Generation

Images are generated by using Blender to invoke the script render_images.py like this:

blender --background --python render_images.py -- [args]

Any arguments following the -- will be captured by render_images.py.

This command should be run from the image_generation directory, since by default the script will load resources from the data directory.

When rendering on cluster machines without audio drivers installed, you may need to add the -noaudio flag to the Blender invocation like this:

blender --background -noaudio --python render_images.py -- [args]

You can also run render_images.py as a standalone script to view help on all command line flags like this:

python render_images.py --help

Setup

You will need to download and install Blender; the code was developed and tested with Blender version 2.78c, but other versions may work as well.

Blender ships with its own copy of Python 3.5 and uses that bundled Python to execute scripts. You'll need to add the image_generation directory to the Python path of Blender's bundled Python by running a command like this from inside that directory:

echo $PWD >> $BLENDER/$VERSION/python/lib/python3.5/site-packages/clevr.pth

where $BLENDER is the directory where Blender is installed and $VERSION is your Blender version; for example on OSX you might run:

echo $PWD >> /Applications/blender/blender.app/Contents/Resources/2.78/python/lib/python3.5/site-packages/clevr.pth

Rendering Overview

The file data/base_scene.blend contains a Blender scene used as the basis of all CLEVR images. This scene contains a ground plane, a camera, and several light sources. After the base scene is loaded, the positions of the camera and lights are randomly jittered (controlled with the --key_light_jitter, --fill_light_jitter, --back_light_jitter, and --camera_jitter flags).
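As a rough illustration of what jittering a position means, the sketch below offsets each coordinate by a random amount. It assumes the offset is drawn uniformly from [-jitter, +jitter]; the exact distribution used by render_images.py is not stated here, and the helper name jitter_location is hypothetical.

```python
import random

def jitter_location(location, jitter, rng=random):
    """Return a copy of location with each coordinate offset by up to +/- jitter.

    Assumption: offsets are uniform in [-jitter, jitter]; the real script
    may use a different distribution.
    """
    return [c + rng.uniform(-jitter, jitter) for c in location]

# Example: perturb a made-up camera position by at most 0.5 units per axis.
camera = [7.0, -6.5, 5.0]
print(jitter_location(camera, 0.5, rng=random.Random(0)))
```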

After the base scene has been loaded, objects are placed one by one into the scene. The number of objects for each scene is a random integer between --min_objects (default 3) and --max_objects (default 10), and each object has a random shape, size, color, and material.

After placing all objects, we ensure that no objects are fully occluded; in particular each object must occupy at least 100 pixels in the rendered image (customizable using --min_pixels_per_object). To accomplish this, we assign each object a unique color and render a version of the scene with lighting and shading disabled, writing it to a temporary file; we can then count the number of pixels of each color in this pre-render to check the number of visible pixels for each object.
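The pixel-counting check can be sketched as follows. This is a minimal illustration rather than the code in render_images.py: it assumes the flat-shaded pre-render has already been loaded as a sequence of RGB tuples, and the function names count_visible_pixels and all_objects_visible are hypothetical.

```python
from collections import Counter

def count_visible_pixels(pixels, object_colors):
    """Count how many pixels of the pre-render belong to each object.

    pixels: iterable of (r, g, b) tuples read from the flat-shaded pre-render.
    object_colors: dict mapping object name -> its unique (r, g, b) color.
    """
    counts = Counter(pixels)
    return {name: counts.get(color, 0) for name, color in object_colors.items()}

def all_objects_visible(pixels, object_colors, min_pixels_per_object=100):
    """Return True if every object covers at least min_pixels_per_object pixels."""
    visible = count_visible_pixels(pixels, object_colors)
    return all(n >= min_pixels_per_object for n in visible.values())
```

Because lighting and shading are disabled in the pre-render, each object appears as a single flat color, so an exact color match is enough to attribute a pixel to an object.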

Each invocation of render_images.py will render --num_images images, and they will be numbered starting at --start_idx (default 0). Using non-default values for --start_idx allows you to distribute rendering across many workers and recombine their results later without filename conflicts.
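One way to split a job across workers is to give each worker a disjoint index range. The sketch below only builds the command lines; the helper worker_command and the choice of 1000 images per worker are illustrative assumptions, while --num_images and --start_idx are the flags described above.

```python
def worker_command(worker_id, images_per_worker):
    """Build the Blender command line for one worker (hypothetical helper)."""
    start_idx = worker_id * images_per_worker
    return [
        "blender", "--background", "--python", "render_images.py", "--",
        "--num_images", str(images_per_worker),
        "--start_idx", str(start_idx),
    ]

# Worker 0 renders indices 0..999, worker 1 renders 1000..1999, and so on.
for worker_id in range(3):
    print(" ".join(worker_command(worker_id, 1000)))
```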

Object Placement

Each object is positioned randomly, but before actually adding the object to the scene we ensure that its center is at least --min_dist units away from the centers of all other objects. We also ensure that between each pair of objects, the left/right and front/back distance along the ground plane is at least --margin units; this helps to minimize ambiguous spatial relationships. If after --max_retries attempts we are unable to find a suitable position for an object, then all objects are deleted and placed again from scratch.

Image Resolution

By default images are rendered at 320x240, but the resolution can be customized using the --height and --width flags.

GPU Acceleration

Rendering uses CPU by default, but if you have an NVIDIA GPU with CUDA installed then you can use the GPU to accelerate rendering by adding the flag --use_gpu 1. Blender also supports acceleration using OpenCL which allows the use of non-NVIDIA GPUs; however this is not currently supported by render_images.py.

Rendering Quality

You can control the quality of rendering with the --render_num_samples flag; using fewer samples will run more quickly but will result in grainy images. I've found that 64 samples is a good number to use for development; all released CLEVR images were rendered using 512 samples. The --render_min_bounces and --render_max_bounces flags control the number of bounces for transparent objects; I've found the default of 8 to work well for these options.

When rendering, Blender breaks up the output image into tiles and renders tiles sequentially; the --render_tile_size flag controls the size of these tiles. This should not affect the output image, but may affect the speed at which it is rendered. For CPU rendering smaller tile sizes may be optimal, while for GPU rendering larger tiles may be faster.

With default settings, rendering a 320x240 image takes about 4 seconds on a Pascal Titan X. It's very likely that these rendering times could be drastically reduced by someone more familiar with Blender, but this rendering speed was acceptable for our purposes.

Saving Blender Scene Files

You can save a Blender .blend file for each rendered image by adding the flag --save_blendfiles 1. These files can be more than 5 MB each, so they are not saved by default.

Output Files

Rendered images are stored in the --output_image_dir directory, which is created if it does not exist. The filename of each rendered image is constructed from the --filename_prefix, the --split, and the image index.
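As an illustration of how the three pieces might combine into a filename, consider the sketch below. The underscore separators, the six-digit zero padding, the .png extension, and the helper name image_filename are all assumptions for the example; check render_images.py for the exact template.

```python
def image_filename(filename_prefix, split, index, num_digits=6):
    """Build an image filename; the zero-padded layout is an assumption."""
    return "%s_%s_%0*d.png" % (filename_prefix, split, num_digits, index)

# Example values for the prefix and split, chosen for illustration.
print(image_filename("CLEVR", "new", 42))
```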

A JSON file for each scene containing ground-truth object positions and attributes is saved in the --output_scene_dir directory, which is created if it does not exist. After all images are rendered the JSON files for each individual scene are combined into a single JSON file and written to --output_scene_file. This single file will also store the --split, --version (default 1.0), --license (default CC-BY 4.0), and --date (default today).
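The combining step can be sketched as below. The top-level layout with "info" and "scenes" keys is an assumption made for this example, as is the helper name combine_scenes; the metadata passed as info corresponds to the --split, --version, --license, and --date values.

```python
import glob
import json
import os

def combine_scenes(scene_dir, output_file, info):
    """Merge per-scene JSON files into one file (the layout is an assumption).

    Returns the number of scene files that were combined.
    """
    scenes = []
    # Sorting keeps scenes in index order when filenames are zero-padded.
    for path in sorted(glob.glob(os.path.join(scene_dir, "*.json"))):
        with open(path) as f:
            scenes.append(json.load(f))
    with open(output_file, "w") as f:
        json.dump({"info": info, "scenes": scenes}, f)
    return len(scenes)
```

Write the combined file outside scene_dir, otherwise a later run of this sketch would pick it up as if it were a scene.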

When rendering large numbers of images, I have sometimes experienced random Blender crashes; saving JSON files for each scene as they are rendered ensures that you do not lose information for scenes already rendered in the event of a crash.

If you save Blender scene files for each image (--save_blendfiles 1), they are stored in the --output_blend_dir directory, which is created if it does not exist.

Object Properties

The --properties_json file (default data/properties.json) defines the allowed shapes, sizes, colors, and materials used for objects, making it easy to extend CLEVR with new object properties.

Each shape (cube, sphere, cylinder) is stored in its own .blend file in the --shape_dir (default data/shapes); the file X.blend contains a single object named X centered at the origin with unit size. The shapes field of the JSON properties file maps human-readable shape names to .blend files in the --shape_dir.

The colors field of the JSON properties file maps human-readable color names to RGB values between 0 and 255; most of our colors are adapted from Wad's Optimum 16 Color Palette.

The sizes field of the JSON properties file maps human-readable size names to scaling factors used to scale the object models from the --shape_dir.

Each material is stored in its own .blend file in the --material_dir (default data/materials). The file X.blend should contain a single NodeTree item named X, and this NodeTree item must have a single Color input that accepts an RGBA value so that each material can be used with any color. The materials field of the JSON properties file maps human-readable material names to .blend files in the --material_dir.
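To make the four fields concrete, here is a sketch of what a properties file could look like, together with a conversion from a 0-255 RGB triple to an RGBA value with float components. The specific names and numbers are illustrative and are not copied from data/properties.json; the helper color_to_rgba and the fully opaque alpha of 1.0 are assumptions.

```python
import json

# Illustrative contents only; see data/properties.json for the real values.
EXAMPLE_PROPERTIES = json.loads("""
{
  "shapes": {"cube": "Cube", "sphere": "Sphere"},
  "colors": {"red": [200, 30, 30], "blue": [40, 70, 210]},
  "materials": {"rubber": "Rubber", "metal": "Metal"},
  "sizes": {"large": 0.7, "small": 0.35}
}
""")

def color_to_rgba(rgb):
    """Convert a 0-255 RGB triple to a 0-1 float RGBA list with alpha 1.0."""
    return [c / 255.0 for c in rgb] + [1.0]

for name, rgb in EXAMPLE_PROPERTIES["colors"].items():
    print(name, color_to_rgba(rgb))
```

Under this layout, adding a new color means adding one entry to the colors field, and adding a new shape or material means adding a .blend file plus one entry in the corresponding field.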

Restricting Shape / Color Combinations

The optional --shape_color_combos_json flag can be used to restrict the colors of each shape. If provided, this should give a path to a JSON file mapping shape names to lists of allowed color names. This option can be used to render CLEVR-CoGenT images using the files data/CoGenT_A.json and data/CoGenT_B.json.
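The effect of such a mapping on sampling can be sketched as follows. The example mapping and the helper name sample_shape_and_color are hypothetical, and the real script may sample combinations differently (for instance, uniformly over all allowed pairs rather than shape first).

```python
import random

def sample_shape_and_color(shapes, colors, combos=None, rng=random):
    """Pick a (shape, color) pair, honoring an optional shape -> colors map."""
    if combos is None:
        # No restriction: any shape may take any color.
        return rng.choice(sorted(shapes)), rng.choice(sorted(colors))
    shape = rng.choice(sorted(combos))
    return shape, rng.choice(combos[shape])

# Made-up restriction: cubes may only be red, spheres only blue or green.
combos = {"cube": ["red"], "sphere": ["blue", "green"]}
print(sample_shape_and_color(["cube", "sphere"], ["red", "blue", "green"], combos))
```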