Move adding DynamicUniformIndex to Extract #5037
I like this. One question - instead of having a default value of 0, wouldn’t it be better to make it an Option and skip drawing the thing if its index was never initialised?
This adds a branch in the middle of the render stage, which I'm hesitant to bloat even more given how heavy it already is, and it's assured to be written to during prepare too. It also makes the component bigger, which diminishes the performance gains we see here. Perhaps under a
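The size argument can be checked in isolation. The following is a minimal sketch, assuming the index is stored as a plain u32 (suggested by the default of 0 in the diff); `PlainIndex` and `OptionalIndex` are hypothetical stand-ins, not Bevy types.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical stand-in for the component as written in this PR:
// a bare u32 index plus a zero-sized marker.
#[allow(dead_code)]
pub struct PlainIndex<C> {
    index: u32,
    marker: PhantomData<C>,
}

// Hypothetical variant that wraps the index in an Option, as suggested
// in the review question.
#[allow(dead_code)]
pub struct OptionalIndex<C> {
    index: Option<u32>,
    marker: PhantomData<C>,
}

/// Returns the size in bytes of (plain component, Option-wrapped component).
pub fn index_sizes() -> (usize, usize) {
    (size_of::<PlainIndex<()>>(), size_of::<OptionalIndex<()>>())
}
```

Because u32 has no spare bit pattern, `Option<u32>` needs a separate discriminant and the component doubles from 4 to 8 bytes. A niche type such as `NonZeroU32` would stay at 4 bytes, but it cannot represent index 0 directly.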
impl<C: Component> Clone for DynamicUniformIndex<C> {
    fn clone(&self) -> Self {
        Self {
            index: self.index,
            marker: PhantomData,
        }
    }
}
Is the manual Clone necessary here? It doesn't relax the C: Component bound and Copy is still derived.
A derived Clone adds a T: Clone bound for every type parameter, even one that only appears inside PhantomData<T>, and the same holds for a derived Default. The manual impls provide Clone and Default regardless of what T is.
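The difference between derived and manual impls can be illustrated without Bevy. This is a minimal sketch: `DerivedIndex`, `ManualIndex`, `NotClone` and `clone_manual` are hypothetical names, and the `C: Component` bound from the PR is dropped so the example compiles with the standard library alone.

```rust
use std::marker::PhantomData;

// A marker type that is deliberately neither Clone nor Default.
pub struct NotClone;

// Derived impls: the derive macros add `C: Clone` and `C: Default` bounds,
// even though C only appears inside PhantomData.
#[derive(Clone, Default)]
pub struct DerivedIndex<C> {
    pub index: u32,
    marker: PhantomData<C>,
}

// Manual impls: no bounds on C at all.
pub struct ManualIndex<C> {
    pub index: u32,
    marker: PhantomData<C>,
}

impl<C> Clone for ManualIndex<C> {
    fn clone(&self) -> Self {
        Self {
            index: self.index,
            marker: PhantomData,
        }
    }
}

impl<C> Default for ManualIndex<C> {
    fn default() -> Self {
        Self {
            index: 0,
            marker: PhantomData,
        }
    }
}

/// Clones a ManualIndex whose type parameter is not Clone.
/// The same call on DerivedIndex<NotClone> would not compile.
pub fn clone_manual() -> u32 {
    let original: ManualIndex<NotClone> = ManualIndex {
        index: 7,
        marker: PhantomData,
    };
    original.clone().index
}
```

`DerivedIndex<u8>` can still be cloned because `u8` satisfies the generated bounds; only type parameters that lack `Clone` or `Default` are affected.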
impl<C: Component> Default for DynamicUniformIndex<C> {
    fn default() -> Self {
        Self {
            index: 0,
            marker: PhantomData,
        }
    }
}
This default impl also could be replaced by a derive.
Wanted to just note that if we merge #4902, we can avoid the secondary copy inside the extraction commands by adding a EDIT: Tried this, the I also tested to see if we could defer the
Could you test and profile it to see if it does make a practical performance difference? It would be a win for correctness in case something doesn’t actually ever get set.
This again becomes a question of whether the model matrix is used/desired to be used elsewhere in the render schedule such that not calculating it upfront incurs calculation of it multiple times. If it just moves time from one place to another with no other overall performance benefits then the only pro is that it makes the extract stage shorter. That would be enough but only if we think no one ever needs the model matrix. I wonder if TAA would need it for motion vectors or how that works…
Tried this with a
Good to know that panic incurs that performance hit. I was thinking that a missing index would somehow cause that entity not to be drawn by propagating up an error or something. But if we don’t already have error returns from draw functions then maybe it’s not worth it. I’m just kind of expecting it to be easy enough to make code where some entities never have their dynamic index updated and then they will be drawn using whatever the transform is for index 0. I suppose another way to handle it would be to make that model matrix produce vertices containing NaNs in the clip position and then it will be dropped, but that feels like a hack where it would be better to just not draw the thing.
Having seen the perf hit, I tried the opposite and changed
Sounds reasonable to do it in a separate PR. If you don't intend to do that straight away, could you add a TODO comment?
     render_device: Res<RenderDevice>,
     render_queue: Res<RenderQueue>,
     mut component_uniforms: ResMut<ComponentUniforms<C>>,
-    components: Query<(Entity, &C)>,
+    mut components: Query<(&C, &mut DynamicUniformIndex<C>)>,
Doesn't this break the UniformComponentPlugin in the general case?
This now assumes that DynamicUniformIndex is added in the extract step, but that isn't the case for something using, say, ExtractComponentPlugin.
We aren't currently using this anywhere else, but given that this is intended to be a generalized (and user facing) abstraction, I think we should discuss ways to make this "fool proof".
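The failure mode described here can be simulated in plain Rust. This is only a sketch under stated assumptions: the `Entity` struct, its `Option` fields and the `prepare` function are hypothetical stand-ins for ECS storage and for a query over (&C, &mut DynamicUniformIndex<C>); no Bevy API is used.

```rust
// Plain-Rust simulation of the concern; these are not Bevy types.
pub struct Entity {
    // Stands in for the component C; None means the entity lacks it.
    pub component: Option<f32>,
    // Stands in for DynamicUniformIndex<C>; None means it was never inserted,
    // for example because a different extraction path was used.
    pub uniform_index: Option<u32>,
}

/// Mimics a query for (&C, &mut DynamicUniformIndex<C>): only entities that
/// have BOTH parts are visited. Returns how many entities were written.
pub fn prepare(entities: &mut [Entity], buffer: &mut Vec<f32>) -> usize {
    let mut visited = 0;
    for e in entities.iter_mut() {
        if let (Some(value), Some(index)) = (e.component, e.uniform_index.as_mut()) {
            // Record where this entity's data lands, then write the data.
            *index = buffer.len() as u32;
            buffer.push(value);
            visited += 1;
        }
    }
    visited
}
```

An entity that has the component but no index is skipped without any error, so its data never reaches the buffer. That silent skip is the behaviour the comment asks to guard against.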
I've dropped the ball on this PR, but thinking on this a bit more, I think it makes a lot of sense to take an approach where we keep indices as components while directly writing extracted components to their target staging buffers.
Indices are small: 4-8 bytes typically. Compare this with the equivalent MeshUniform, which is 132 bytes currently. If we are going to heavily leverage commands for rendering, we should be minimizing the number of large copies that are being performed. I'd much rather we copy heavy components once and then just shuffle the indices around.
If we still need the intermediate data during Prepare or Queue, we can always refer back to the buffer in memory. It's less ergonomic, but alleviates the heaviest parts of running the Render World right now.
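The "copy once, shuffle indices" idea can be sketched as follows. This is an illustration under assumptions, not the renderer's implementation: `BigUniform` and `StagingBuffer` are hypothetical names, the 132-byte size is taken from the comment above, and GPU alignment requirements for dynamic offsets are ignored.

```rust
/// Size in bytes quoted for MeshUniform in the comment above.
pub const UNIFORM_SIZE: usize = 132;

/// Hypothetical stand-in for a large per-entity uniform.
#[derive(Clone, Copy)]
pub struct BigUniform {
    pub data: [u8; UNIFORM_SIZE],
}

/// Hypothetical CPU-side staging buffer. Real dynamic uniform buffers also
/// pad each entry to the device's offset alignment, which is omitted here.
#[derive(Default)]
pub struct StagingBuffer {
    bytes: Vec<u8>,
}

impl StagingBuffer {
    /// Copies the uniform into the buffer once and returns its byte offset,
    /// which is the small value that gets stored and moved around afterwards.
    pub fn push(&mut self, uniform: &BigUniform) -> u32 {
        let offset = self.bytes.len() as u32;
        self.bytes.extend_from_slice(&uniform.data);
        offset
    }

    /// Refers back to the staged data through its offset.
    pub fn get(&self, offset: u32) -> &[u8] {
        let start = offset as usize;
        &self.bytes[start..start + UNIFORM_SIZE]
    }
}
```

Each entity then carries a 4-byte offset instead of a 132-byte payload, and later stages that still need the data call something like `get` on the buffer rather than reading a component.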
Closing this as the renderer is already moving in a non-direct ECS storage direction, and the introduction of the instancing and batching changes makes this difficult to merge.
Objective
prepare_uniform_components's commands must be run with exclusive access to the render world and can take quite a bit of time for components on lots of entities, particularly with archetypes with many big components. This duplicates work that is already being done in Extract.
Solution
- Implement Default on DynamicUniformIndex.
- Move adding DynamicUniformIndex into Extract instead of Prepare.
- Change prepare_uniform_components to query for &mut DynamicUniformIndex instead of using commands.
Performance
This was tested against the many_cubes stress test. Here are the respective timing changes:
Changelog
TODO
Migration Guide
TODO