Magnum batch renderer prototype #1798

mosra · 2022-06-30T16:32:34Z

Motivation and Context

Halfway through implementing multi-mesh and multi-file support for batch renderer I realized that having this on a WIP branch of a stale fork of a fork of this repository is starting to look more like a problem than a solution for a problem. So, in-line with the proposed integration path we discussed with @eundersander, I'm opening the first PR.

What this does

This PR integrates the core implementation of a Magnum-based batched renderer as a fully tested and maintained part of the main repository. At the moment it's not used by anything in the code but instead, to avoid code rot, its behavior is thoroughly tested, and so will be all future additions to it. Meaning, given the tests pass, my intent is to get it merged to main so #1745 as well as the Galakhtic project can depend on it.

The code itself is also only about 500 lines (the test being another 500), so it's not like the "dead code" should cause significant unwanted overhead. Majority of the actual work is done inside Magnum itself.

The feature set is currently as follows:

A narrow BatchRenderer interface that doesn't leak any GPU-API-specific functionality. Meaning it can have the full freedom in optimizing draws however it sees fit. As well as be implemented in either GL or Vulkan without the client-side code having to care.
A standalone BatchRendererStandalone implementation that manages also GL context and provides CUDA interop.
Designed with multi-draw and multi-scene behavior in mind. In particular, given a batch-optimized input file, it's capable of drawing a grid of dynamic scenes with a single draw call per scene. On the other hand, it's still possible to perform drawing of just a single scene mesh-by-mesh, but it won't be any faster than the current renderer.
Web-compatible renderer implementation. While I could design it with only advanced NVidia-only (Vulkan) features in mind, I felt that being compatible with VR devices and the web is an important property. Thus the multi-draw is implemented in a way that can run on WebGL as well.
Ability to configure a grid of scenes, with each having an independently controlled camera and a set of objects added.
Adding (trees of) named objects from a batch-optimized input file.
Flat-shaded triangle meshes, color and depth output.

The test file contains code that builds a batch-optimized glTF from scratch. The GltfSceneConverter plugin is currently in a WIP branch of Magnum, and thus the code is disabled if not building against said branch. Instead, to be still able to actually test the renderer, the test glTF file is bundled in the repository.

What this doesn't have

My goal here is to submit reasonably large chunks of code, to gather feedback on the overall direction and whether it all makes sense together. Thus, to not overwhelm, this is just the base implementation first. Other features will be in followup PRs:

Ability to remove arbitrary objects. At the moment it's only possible to clear a particular scene in the grid as a whole.
Ability to render multiple meshes from multiple files. At the moment it's only a single input file and a single draw call per scene.
Ability to render wireframe meshes (for debug visualization).
Ability to have a secondary "debug camera", possibly to a differently sized framebuffer.
There's no documentation, because right now it would be just a long list of things it cannot do. It will appear once the above planned functionality is in.

What's currently only present in the Galakhtic fork is the following:

A "script" that converts a large bundle of datasets into the optimized glTF. Because it again depends on a Magnum branch, is quite use-case-specific and relies on hardcoded set of datasets, I don't think it should go here.
A demo / "stress test" application that uses the batch-optimized glTF created by the above script and populates a grid of 192 scenes.

I expect that the Galakhtic fork eventually synchronizes with Habitat's main, and will contain optional "extra" functionality on top for data conversion and optimization, relying on more bleeding-edge Magnum features. I.e., vanilla Habitat will be fully capable of rendering batch-optimized glTFs, but to produce them one will need to use a different branch.

How Has This Been Tested

There's a test file, and look at the pretty images!

mosra · 2022-06-30T19:43:13Z

Everything seems to build (even for WebGL, yay!) and the test passes including CUDA interop.

However on the CI I get a segfault on exit. Doesn't happen locally with my RTX eGPU nor with Mesa & Intel/AMD. Will try investigating with ASan tomorrow, hopefully I didn't hit any cursed driver path :] It was me being stupid, forgot to call cudaGraphicsUnregisterResource() and looks like the 470.* drivers the CI has didn't like it, while on 510.* it was no problem.

You can't. You're extremely unhelpful here.

No GL error anymore!!

Heh, it just worked, even though I never explicitly tested this.

eundersander

I left some comments. Looking good. I'm curious if/how we can keep this code Magnum-only (free of Habitat-specific stuff). See my comment below.

eundersander · 2022-06-30T19:35:36Z

src/esp/gfx/BatchRenderer.h

+      const Magnum::Vector2i& tileCount);
+
+  struct State;
+  Corrade::Containers::Pointer<State> state;


Can you explain this paradigm? You want to keep the state members out of the header to reduce compile dependencies and build times?

Yes, reducing build times. Here in case of the configuration class it's possibly taken a bit too far and wouldn't need to be there as the state is tiny, but I have this so ingrained in my muscle memory that I used it everywhere.

For the BatchRenderer classes I definitely want to keep it this way because otherwise the header gets way too heavy for the minimal interface it exposes, especially when the alternative Vulkan backend gets added. For the BatchRendererConfiguration I can remove that if you feel it's silly -- let me know.

(Related: it's been a while since I worked with the Habitat codebase last time, and I was surprised by how long everything now takes to build. I may open a few PRs soon to clean some of that up, because it shouldn't be that an incremental build of the gfx library takes more time than rebuilding all of Magnum. I'm not used to so much waiting 😅)

eundersander · 2022-06-30T19:40:57Z

src/esp/gfx/BatchRenderer.h

+  //  scene name for the whole scene??
+  // TODO document this may not return consecutive IDs in case the added name
+  //  is multiple meshes together
+  std::size_t add(Magnum::UnsignedInt sceneId,


add is maybe too concise? This is for adding a "render object instance" or whatever you want to call that. One day we may also have addLight, right?

Yep, that makes sense, especially since the API is not really typed to differentiate between lights, meshes or else.

I took this originally from the new SceneConverter APIs, where it's also just add(), be it a mesh, an image, a material or a scene. You can see that in the generateTestData() test case. I'm still a bit on the fence about it, whether that will bite back in the future, but there it's at least clearly distinguished by the argument types.

Renamed to addMeshHierarchy().

eundersander · 2022-06-30T19:42:38Z

src/esp/gfx/BatchRenderer.h

+                  const Magnum::Matrix4& transformation);
+  void clear(Magnum::UnsignedInt sceneId);
+
+  Magnum::Matrix4& camera(Magnum::UnsignedInt sceneId);


Just to be clear, the user can modify this any time, even immediately after calling draw? It's not like the driver would be in the middle of copying this memory to GPU after calling draw, right?

Good point, this needs to be documented well.

The draw() function does a synchronous copy of all the data to the driver (and then the GL driver probably does an async copy to the GPU or whatever else it needs to do), so when it returns, it's safe to start modifying everything again. The only place that touches data used by the GPU is the colorImage() / cudaColorBufferDevicePointer() APIs, but those implicitly stall until the GPU has finished rendering, so it's also safe.

eundersander · 2022-06-30T19:44:16Z

src/tests/BatchRendererTest.cpp

+
+  renderer.draw();
+  Mn::Image2D color = renderer.colorImage();
+  MAGNUM_VERIFY_NO_GL_ERROR();


In practice, do we expect user code to also do this check for GL errors?

No :) Error checking is expensive, potentially causing driver stalls, and if the code causes GL errors, it's broken.

But I customarily do this in all test code because it helps me catch errors easier. In this case this check uncovered a NVidia-specific bug causing wrong camera matrix to be used, which I subsequently fixed in faad5b2.

eundersander · 2022-06-30T19:47:41Z

src/cmake/dependencies.cmake

+  # GltfSceneConverter and KtxImageConverter needed by BatchRendererTest,
+  # GltfSceneConverter is not in magnum-plugins master yet
+  #set(MAGNUM_WITH_GLTFSCENECONVERTER ON CACHE BOOL "" FORCE)
+  set(MAGNUM_WITH_KTXIMAGECONVERTER ON CACHE BOOL "" FORCE)


We don't expect Habitat users to be doing KTX compression, except in offline tools. (I know we also imagine a load-time preprocessing use case that could invoke this, but I hope to discourage users from this mode of operation.)

Any concern about including this by default in the Habitat runtime, in terms of binary size bloat or whatever? An alternative would be a build switch that is off by default.

Er, actually, the reason I put it here is because find_package(MagnumPlugins OPTIONAL_COMPONENTS KtxImageConverter) causes a CMake error otherwise 🔥 Which I have to fix.

But this particular plugin (unlike e.g. Basis, or OpenEXR) is a tiny self-contained 400-line code with no external dependencies that builds fast, so it felt like a good enough tradeoff until I fix the CMake modules.

Will add a TODO there, at least.

TODO added.

eundersander · 2022-06-30T19:49:21Z

src/esp/gfx/BatchRenderer.cpp

+
+/* std::hash specialization to be able to use Corrade::String in unordered_map
+ */
+// TODO put directly into Corrade, it's duplicated ON SEVEN PLACES ALREADY!!!


This is now implemented directly in Corrade, will remove it from here in the next batch of updates.

eundersander · 2022-06-30T19:56:41Z

src/esp/gfx/BatchRenderer.cpp

+// This source code is licensed under the MIT license found in the
+// LICENSE file in the root directory of this source tree.
+
+#include "BatchRenderer.h"


I like how this class doesn't depend on any Habitat code. I'd like to keep it that way.

Have you considered putting this in Magnum instead of here?

If not 1, what about some code organization that discourages including Habitat code here? E.g. a new CMake library (instead of gfx) with a name that conveys "just Magnum code" and doesn't link against Habitat libs.

No, the use case is too narrow and restricted to be useful for any other Magnum users except Habitat itself. Plus, as it's quite high-level compared to for example the multi-draw-enabled shaders, having it here means faster iteration times and easier experiments and perf tuning than the complex cycle of implementing, testing, documenting and upstreaming a feature in Magnum and then using it here. That said, what's here will tend to get smaller over time, as I integrate the lower-level bits of it into Magnum -- such as hierarchic transformation calculation, builtin mesh view support and such.

I wouldn't discourage that, of course in moderation, not like pulling the whole Habitat JSON config parsing in here :) But -- as we discussed several times before -- I can imagine the Habitat "client" and the renderer sharing certain data such as bounding volume annotations for both culling and physics or debug visualizers, and then this would pull in common helpers to retrieve such data consistently on both sides. Such data could also be duplicated in Habitat-specific structures, but to me that feels like wasteful extra work if it would only be to have the renderer not rely on any Habitat code.

Thinking further, I intend to make this batch renderer replace / take over everything that's in the gfx library, and then it'll be as you say -- a standalone "just Magnum code" library that shares very little with the rest of Habitat.

What I'm specifically trying to avoid/discourage: some hacky Habitat-specific renderer "features" like toggling lights, debug-drawing, etc. We want this to be reusable with other simulators in the future. I guess we can leave this here in gfx and then just try to be vigilant.

I think both toggling lights and debug drawing is a general-enough feature that would cause no problem here. If it's made in a way that doesn't negatively affect perf in scenarios that don't need them, of course.

What I would see highly problematic is dragging Habitat config JSON parsing into here, for example. But as long as I'm in charge of this code, that won't happen ;)

Agree about no Habitat config JSON parsing here.

But poking this a bit more with a concrete example: Imagine we call LightSetup.h getDefaultLights in this file. LightSetup.h includes SceneGraph.h, which includes esp/sensor/VisualSensor.h and SceneNode.h. At this point, all this Habitat code is now hopelessly intertwined and it would become hard to use this batch renderer with another simulator (I'm thinking upcoming batch Mujoco specifically; I recently discussed this with the Google team maintaining Mujoco in an email thread.).

So how should all this actually work? LightSetup or other Habitat code should call into this file, e.g. calling BatchRenderer::addLight.

This is my motivation to separate this from gfx lib: to cut off access to those headers and thus force this one-directional dependency relationship.

Ah, now I see what you mean ... especially given the current state of "everything includes everything else" 😅

Yep, making it a separate library would cut off the access, at least until someone needs to do a permanent temporary hack that links everything else to it again. But then a cunterexample -- wouldn't it make sense to eventually use Habitat's own logging routines in here? For Python-aware assertions, filtering and all that. Which is in the core library, which however unfortunately somehow pulls in the JSON headers as well, as far as I can see, and we're back at the start :/ I'm afraid the only viable long-term solution is to start enforcing header/dependency hygiene across the whole project...

But you have a point, and this is an opportunity to make the dependencies clean from the start. Unlike the rest of gfx I don't need any physics or Eigen integration here, so should I put this into a new batched_gfx library instead? Ideas for a fancier / shorter name?

But then a counterexample -- wouldn't it make sense to eventually use Habitat's own logging routines in here?

Agree this is tough. When we get to that point (adding logging), we can either (1) add the dependency on core and hope for the best, (2) do non-Habitat logging in here (to keep this truly isolated), or (3) clean up and separate core into more reasonable pieces, and only depend on the minimal set of pieces.

at least until someone needs to do a permanent temporary hack that links everything else to it again

Haha, exactly. We should have a giant scary comment in the CMake file about not adding dependencies. Maybe there's even a clever unit test that can be written (out of scope for this PR).

eundersander · 2022-06-30T20:04:37Z

src/esp/gfx/BatchRenderer.h

+  //  is multiple meshes together
+  std::size_t add(Magnum::UnsignedInt sceneId,
+                  Corrade::Containers::StringView name);
+  std::size_t add(Magnum::UnsignedInt sceneId,


We've also used the term "node template", i.e. the name you give here is the name of a node in the composite gltf, and it may have a hierarchy with multiple meshes. What's the status here? Does this return an integer representing the entire instantiated node? And we use that integer to index into transformations below?

Yes, that's what this does. (The TODO comments may be confusing, there I was just noting down ideas for how to solve naming when multiple files get involved.)

As a proof, look at the meshHierarchy() test case -- it corresponds to this portion of the glTF file, and this is the rendered output:

What this doesn't currently handle is deeper hierarchies, as I felt that was non-essential if you have full control over the input data and don't need to modify transformations inside the hierarchy. But I expect this to become a use case when generic input files are allowed, and it's prepared for that.

eundersander · 2022-06-30T20:26:35Z

While I could design it with only advanced NVidia-only (Vulkan) features in mind, I felt that being compatible with VR devices and the web is an important property. Thus the multi-draw is implemented in a way that can run on WebGL as well.

Respectfully disagree! It seems to me our best chance at making this simple and ultra-fast is to not over-generalize to other platforms. We don't need a batch renderer for VR or web.

mosra · 2022-07-01T10:53:38Z

It seems to me our best chance at making this simple and ultra-fast is to not over-generalize to other platforms.

Don't worry, I picked the right tradeoffs ;) As far as I know, texture arrays are never slower than descriptor arrays (or however the feature is called, to be able to pick among a divergent set of textures in the shader), and it only puts certain requirements on input data. Which is a solved problem now (including the texture repeat we talked about, that'll be done by the shader), and is done in a preprocessing step that has to be done anyway to make the meshes fast to draw.

That's the core of the renderer design and the optimized data layout, which I want to have as compatible as possible. But on top of that there will be functionality like compute-based culling, GPU self-feeding, virtual memory, mesh shaders and whatnot, which will be opt-in on platforms that can do it. Platforms that can't will use just the base and thus will be slower.

We don't need a batch renderer for VR or web.

We do -- I don't want to keep maintaining the classic renderer once this gets feature parity with it, I really don't. Ending up in a state where we finally have a replacement renderer but can't use it on certain platforms because its core design didn't count on that would be bad for the project. That said, I'm not really "wasting time" explicitly testing the batch renderer for WebGL or anything, I only make sure to design in a way that doesn't make it impossible to run there.

Much clearer this way.

mosra · 2022-07-15T14:42:57Z

The remaining minor TODOs should be fixed now. @eundersander @aclegg3 let me know if there's anything else :)

aclegg3

Looks like a good WIP for this feature. Thanks for the clear comments and separation from the rest of the codebase. Nicely parceled.

Skylion007

Looks good to me.

eundersander · 2022-07-19T17:38:05Z

src/esp/gfx_batch/CMakeLists.txt

+  target_link_libraries(gfx_batch PUBLIC ${CUDART_LIBRARY})
+endif()
+
+target_link_libraries(


Is this where the naive Habitat programmer would add core and other stuff? Can we add a scary comment here to discourage this? You could even link to the relevant thread in the PR.

I'm actually using core right now, somehow -- see the TODO on line 13 😅

Not sure if it's a good idea to attempt to move the configure.h header out of core as part of this PR, risking breaking too many things and delaying the merge even further. And it also might be a bit controversial change, so I'll open a dedicated PR with just that.

(Note added.)

The library doesn't depend on anything else in Habitat, which makes it nicely standalone. Eventually there will be dependencies in the other direction, though.

mosra · 2022-07-19T17:53:30Z

Renamed, esp::gfx::BatchRenderer -> esp::gfx_batch::Renderer and similarly for the test and test files.

Will wait on the CIs to ensure I didn't forget anything, but apart from that I think it's ready to get merged.

mosra requested review from jturner65, aclegg3 and eundersander June 30, 2022 16:32

facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Jun 30, 2022

mosra force-pushed the magnum-batch-renderer branch 6 times, most recently from 7c465b0 to 06088f4 Compare June 30, 2022 19:32

mosra added 5 commits June 30, 2022 21:51

Magnum batch renderer prototype.

137a3ce

Clang Format, you really think you can format this better than me??

46c7f99

You can't. You're extremely unhelpful here.

Take NVidia 256-byte UBO alignment into account.

faad5b2

No GL error anymore!!

Test also CUDA interop.

8d27af9

Heh, it just worked, even though I never explicitly tested this.

Fix build with external Magnum.

63f4e24

mosra force-pushed the magnum-batch-renderer branch from 06088f4 to 63f4e24 Compare June 30, 2022 19:51

eundersander reviewed Jun 30, 2022

View reviewed changes

Fix crash on exit -- forgot to unregister CUDA resources.

221b6c5

mosra force-pushed the magnum-batch-renderer branch from c8aeff4 to d78b46b Compare July 15, 2022 10:42

mosra added 2 commits July 15, 2022 12:43

Add a CMake tech debt TODO.

e2d326e

gfx: rename BatchRenderer::add() to addMeshHierarchy().

2dcc8a9

Much clearer this way.

mosra force-pushed the magnum-batch-renderer branch from d78b46b to 2dcc8a9 Compare July 15, 2022 10:43

aclegg3 approved these changes Jul 19, 2022

View reviewed changes

Skylion007 approved these changes Jul 19, 2022

View reviewed changes

eundersander reviewed Jul 19, 2022

View reviewed changes

Move the batch renderer to a dedicated gfx_batch library.

fa53f05

The library doesn't depend on anything else in Habitat, which makes it nicely standalone. Eventually there will be dependencies in the other direction, though.

More robust CUDA device detection in the test.

200de31

mosra force-pushed the magnum-batch-renderer branch from eadc5a3 to 04ee8eb Compare July 19, 2022 17:45

Add a note about restrictions for gfx_batch dependencies.

5686884

mosra force-pushed the magnum-batch-renderer branch from 04ee8eb to 5686884 Compare July 19, 2022 17:47

mosra merged commit 0c896aa into main Jul 20, 2022

mosra deleted the magnum-batch-renderer branch July 20, 2022 17:00

mosra mentioned this pull request Sep 28, 2022

Multi-mesh, multi-texture and multi-file support in the batch renderer #1874

Merged

13 tasks

Magnum batch renderer prototype #1798

Magnum batch renderer prototype #1798

Conversation

mosra commented Jun 30, 2022 • edited Loading

Motivation and Context

What this does

What this doesn't have

How Has This Been Tested

mosra commented Jun 30, 2022 • edited Loading

eundersander left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mosra Jul 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eundersander Jul 19, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eundersander Jul 19, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eundersander commented Jun 30, 2022

mosra commented Jul 1, 2022

mosra commented Jul 15, 2022

aclegg3 left a comment

Choose a reason for hiding this comment

Skylion007 left a comment

Choose a reason for hiding this comment

eundersander Jul 19, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mosra commented Jul 19, 2022

mosra commented Jun 30, 2022 •

edited

Loading

mosra commented Jun 30, 2022 •

edited

Loading

mosra Jul 1, 2022 •

edited

Loading

eundersander Jul 19, 2022 •

edited

Loading

eundersander Jul 19, 2022 •

edited

Loading

eundersander Jul 19, 2022 •

edited

Loading