MVP Batch renderer integration WIP #863


@eundersander (Contributor) commented May 4, 2022

Motivation and Context

"Batch rendering" means rendering (drawing) multiple environments at the same time. The goal is to speed up rendering and, in turn, training.

See also the habitat-sim PR that implements ReplayBatchRenderer and the related Python code: facebookresearch/habitat-sim#1745.

This PR integrates a very simple batch renderer that just reuses our existing Hab 2.0 renderer. How it works:

  • In the main process, in ReplayBatchRenderer (C++), we start with empty copies of all environments' visual SceneGraphs.
  • Again in the main process, in batch_renderer.py, we re-create the same visual sensors used by the environments.
  • On each step/reset, for each environment worker, a gfx-replay keyframe is passed from the worker process to the main Python process via an observation ("sim_blob", a Python string). Meanwhile, we take care to skip all render-asset loading and observation drawing in the workers (e.g. create_renderer=False).
  • Then, in batch_renderer.py, we gather all keyframes and restore all visual SceneGraphs (using built-in gfx-replay functionality, which includes loading render assets).
  • Finally, we draw all sensors serially (using the existing Hab 2.0 renderer codepath) and update all observation dictionaries just before returning them to PPOTrainer.
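The steps above can be sketched in plain Python to show the data flow. This is a minimal, hypothetical sketch: FakeWorker, BatchRendererSketch, restore_scene, and the JSON keyframe format are illustrative stand-ins, not the habitat-sim ReplayBatchRenderer API or the actual batch_renderer.py code. Only the "sim_blob" observation key and the idea of disabling the worker renderer (create_renderer=False) come from the description.

```python
import json
from typing import Any, Dict, List


class FakeWorker:
    """Hypothetical environment worker (a separate process in practice).

    With batch rendering on, the worker skips its own renderer (mirroring
    create_renderer=False) and ships its visual state in "sim_blob".
    """

    def __init__(self, env_id: int, batch_render: bool = True) -> None:
        self.env_id = env_id
        self.create_renderer = not batch_render
        self.step_count = 0

    def step(self) -> Dict[str, Any]:
        self.step_count += 1
        # Hypothetical keyframe: object name -> position. Real gfx-replay
        # keyframes also carry render-asset loads and other scene changes.
        keyframe = {"cube": [float(self.env_id), 0.0, float(self.step_count)]}
        return {"sim_blob": json.dumps(keyframe)}


class BatchRendererSketch:
    """Main-process side: one visual scene copy per environment."""

    def __init__(self, num_envs: int) -> None:
        # Start with empty visual scenes, one per environment.
        self.scenes: List[Dict[str, List[float]]] = [{} for _ in range(num_envs)]

    def restore_scene(self, env_index: int, blob: str) -> None:
        # Apply a worker's keyframe to that environment's scene copy.
        self.scenes[env_index].update(json.loads(blob))

    def draw(self, env_index: int) -> List[float]:
        # Stand-in for drawing a sensor: flatten the scene state.
        scene = self.scenes[env_index]
        return [v for name in sorted(scene) for v in scene[name]]

    def post_step(self, observations: List[Dict[str, Any]]) -> None:
        # 1) Gather every keyframe and restore every scene first.
        for i, obs in enumerate(observations):
            self.restore_scene(i, obs.pop("sim_blob"))
        # 2) Then draw serially and write results into the observations.
        for i, obs in enumerate(observations):
            obs["rgb"] = self.draw(i)


workers = [FakeWorker(i) for i in range(2)]
renderer = BatchRendererSketch(num_envs=len(workers))
observations = [w.step() for w in workers]
renderer.post_step(observations)
print(observations)  # [{'rgb': [0.0, 0.0, 1.0]}, {'rgb': [1.0, 0.0, 1.0]}]
```

Restoring all scenes before drawing any of them mirrors the "gather, restore, then draw serially" order described in the list, and keeps the trainer unaware that rendering moved out of the workers.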

Some notable architectural decisions here:

  • We keep our existing paradigm of a single Python worker per environment.
    • This lets the batch renderer serve as a "drop-in" upgrade for existing Hab 2.0 users (toggled with a single yaml config boolean).
    • It saves GPU memory by avoiding duplication of render assets (e.g. textures) across environments.
    • It may also give some training speedup.
      • Since Hab 2.0 training isn't really bottlenecked on rendering, how much speedup this path can give is still unclear.
    • Alongside this approach, we'll also pursue more radical/invasive approaches like Galakhtic that yield a bigger speedup.
  • This is a kind of "stub" batch renderer built from our existing Hab 2.0 renderer.
    • A "real" batch renderer would use newer Vulkan and OpenGL APIs (e.g. multidraw) to greatly improve rendering speed.
    • We can swap in the upcoming Magnum batch renderer while reusing much of the framework here.
  • We use gfx-replay to pass environment state from the workers to the main process.
    • We can swap this out in the future, e.g. pass a "sim state" or a Magnum render-command list.
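Because both the renderer backend and the worker-to-main state transport are expected to be swapped later, one way to keep the surrounding framework reusable is to hide each behind a small interface. This is a hypothetical sketch using typing.Protocol: StateTransport, RendererBackend, JsonKeyframeTransport, SerialRenderer, and render_batch are illustrative names, not Habitat or Magnum APIs, and JSON stands in for gfx-replay keyframes.

```python
import json
from typing import Any, Dict, List, Protocol


class StateTransport(Protocol):
    """Encodes env state in a worker and decodes it in the main process."""

    def encode(self, state: Dict[str, Any]) -> str: ...

    def decode(self, blob: str) -> Dict[str, Any]: ...


class RendererBackend(Protocol):
    """Draws all environments given their decoded states."""

    def render(self, states: List[Dict[str, Any]]) -> List[str]: ...


class JsonKeyframeTransport:
    # Stand-in for gfx-replay keyframes; a "sim state" or a Magnum
    # render-command list could implement the same two methods.
    def encode(self, state: Dict[str, Any]) -> str:
        return json.dumps(state, sort_keys=True)

    def decode(self, blob: str) -> Dict[str, Any]:
        return json.loads(blob)


class SerialRenderer:
    # Stand-in for the existing renderer: draws environments one by one.
    # A "real" batch backend would draw them together instead.
    def render(self, states: List[Dict[str, Any]]) -> List[str]:
        return [f"frame({s['env']})" for s in states]


def render_batch(
    blobs: List[str], transport: StateTransport, backend: RendererBackend
) -> List[str]:
    # Framework code depends only on the two interfaces, so either
    # side can be replaced without touching this function.
    return backend.render([transport.decode(b) for b in blobs])


transport = JsonKeyframeTransport()
blobs = [transport.encode({"env": i}) for i in range(3)]
frames = render_batch(blobs, transport, SerialRenderer())
print(frames)  # ['frame(0)', 'frame(1)', 'frame(2)']
```

Protocols use structural typing, so a future backend only needs matching method signatures; it does not have to inherit from anything in this framework.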

This is a draft PR meant to spur architecture discussion. It doesn't quite work yet; there's a bit more Python bookkeeping to do. This PR would eventually merge into main, not hab_suite_baselines.

How Has This Been Tested

WIP. I'm testing with Hab 2.0 rearrangement training. See also the C++ integration test here.

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have completed my CLA (see CONTRIBUTING)
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@facebook-github-bot added the "CLA Signed" label May 4, 2022
@eundersander added the "do not merge" label May 4, 2022
if self.habitat_config.HABITAT_SIM_V0.get(
    "ENABLE_GFX_REPLAY_SAVE", False
):
    # sloppy: don't currently support both batch-render and save_keyframe
    assert not self.habitat_config.BATCH_RENDER
Review comment from a contributor:

Asserts can be disabled in Python (e.g. by running with python -O) for increased speed. Is this a sanity check, or should it raise an error? If the latter, raise a NotImplementedError or something similarly informative.
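Following the review suggestion, here is a hypothetical sketch of the same check rewritten to raise instead of assert, so it still runs under python -O. The config is mocked with types.SimpleNamespace and a plain dict; the names HABITAT_SIM_V0, ENABLE_GFX_REPLAY_SAVE, and BATCH_RENDER come from the snippet above, while check_batch_render_config and the error message are illustrative.

```python
from types import SimpleNamespace
from typing import Any, Optional


def check_batch_render_config(habitat_config: Any) -> None:
    """Reject the unsupported combination with an explicit error.

    Unlike an assert statement, this check is not stripped by python -O.
    """
    replay_save = habitat_config.HABITAT_SIM_V0.get("ENABLE_GFX_REPLAY_SAVE", False)
    if replay_save and habitat_config.BATCH_RENDER:
        raise NotImplementedError(
            "BATCH_RENDER together with ENABLE_GFX_REPLAY_SAVE is not supported yet"
        )


# Supported: batch rendering without gfx-replay saving passes silently.
ok_config = SimpleNamespace(
    HABITAT_SIM_V0={"ENABLE_GFX_REPLAY_SAVE": False}, BATCH_RENDER=True
)
check_batch_render_config(ok_config)

# Unsupported: both enabled raises an informative error.
bad_config = SimpleNamespace(
    HABITAT_SIM_V0={"ENABLE_GFX_REPLAY_SAVE": True}, BATCH_RENDER=True
)
message: Optional[str] = None
try:
    check_batch_render_config(bad_config)
except NotImplementedError as err:
    message = str(err)
print(message)  # BATCH_RENDER together with ENABLE_GFX_REPLAY_SAVE is not supported yet
```

NotImplementedError signals "valid request, not supported yet", which matches the "sloppy: don't currently support" comment better than a generic ValueError would.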

@0mdc self-assigned this Oct 26, 2022
@eundersander deleted the batch_renderer branch April 6, 2023 16:50
4 participants