Tracing's overhead may be too high for profiling CPU-bound operations #4892

james7132 · 2022-06-01T20:33:19Z

Bevy version

0.7 and current head (cc4062e)

What you did

Attempted to profile many_foxes with trace_tracy enabled. Commented out spans for system_overhead and system_commands.

What went wrong

Profiled measurements for each stage show significant changes. Here's a comparison of the Update stage measurements:

There are only 10-20 systems in that stage, and the overhead scales with the number of systems. This overhead should be much lower, and it makes it difficult to make perf decisions on a macro level.


hymm · 2022-06-01T20:41:38Z

I've been wanting to put certain sets of spans under different features for this reason. At the very least separate it into the system/overhead spans and everything else. We'd probably want to have them enabled under the trace feature, but be able to separate them out too.
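A minimal sketch of that idea, using only the standard library: spans are grouped into categories, and each category is compiled in only when its Cargo feature is enabled. The `trace_system_overhead` feature name, the `SpanCategory` enum, and the `record_span` helper are hypothetical and exist only for illustration; `trace` is the feature mentioned above. How Bevy would actually gate these spans is not decided in this thread.

```rust
/// Hypothetical grouping of spans so each group can be toggled separately.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SpanCategory {
    /// Per-system bookkeeping spans such as system_overhead / system_commands.
    SystemOverhead,
    /// Everything else.
    General,
}

/// Resolved at compile time: `cfg!` expands to a `true`/`false` literal,
/// so a disabled category costs nothing at runtime after optimization.
fn category_enabled(category: SpanCategory) -> bool {
    match category {
        // Assumed feature name. A Cargo manifest could make `trace`
        // imply it so that both are on by default when tracing.
        SpanCategory::SystemOverhead => cfg!(feature = "trace_system_overhead"),
        SpanCategory::General => cfg!(feature = "trace"),
    }
}

/// Stand-in for opening a span: records the name only if the category is on.
fn record_span(out: &mut Vec<&'static str>, category: SpanCategory, name: &'static str) {
    if category_enabled(category) {
        out.push(name);
    }
}
```

With neither feature enabled, calling `record_span` for either category leaves `out` empty, which is the "closer to the real framerate" configuration.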

hymm · 2022-06-02T04:02:35Z

I should mention that a big portion of the difference between the two histograms above is due to applying system commands. This is mostly just a constant offset since most systems don't apply commands every frame. The issue is that it shows up in the trace as a bunch of spans.

This check is very lightweight and quick in actual code since it's a tight loop just checking that the buffers are empty. Without tracing enabled, applying these commands probably takes less than a microsecond, but it shows up in the above image as 90us.

I'm not sure how much of an issue this is, as it shows up as a constant offset for most frames, but we should probably supply a way of disabling these spans if people want to see something closer to the real framerate.

james7132 · 2022-06-02T08:52:03Z

For apply_buffers we could move it into the actual SystemParam apply_buffer implementations where it has non-zero work. Only downside is that zero/empty frames are not included in the statistics.

However, the other side in prepare systems is both a non-trivial real cost (scheduling onto the task pool) and a tight span cost.

One potential cause might be that the tracing macros allocate memory, or that the subscriber/visitor model of tracing is not a trivial operation; either would have a non-trivial runtime cost. See nagisa/rust_tracy_client#26, which includes a tracy_client-level non-allocating span! macro; the only downside is that it only supports static strings.

hymm · 2022-06-02T17:28:02Z

> For apply_buffers we could move it into the actual SystemParam apply_buffer implementations where it has non-zero work.

We'd have to get the name of the originating system down into the SystemParam somehow. Might be doable. People have wanted the name there for other reasons like error reporting. See #2949

> non-allocating span! macro

Would we have to bypass the tracing crate to use this?

davidbarsky · 2022-06-07T16:44:09Z

Notes from Discord: the span macro does not allocate, which means that the overhead isn't in tracing or tracing-core, but rather in tracing-subscriber. tracing-subscriber's Registry and Extensions (which tracing-tracy uses to store profiling data) have room to improve performance, but a decent amount of the overhead you're experiencing comes from tracing-subscriber::Registry trying to be modular and flexible in accepting data for arbitrary user-defined Layers.

For a profiling use-case, it might be worthwhile to implement your own subscriber (replacing the Registry but using the underlying sharded-slab data structure without boxing values into extensions.)

@hawkw has some ideas to improve tracing_subscriber::Registry, where instead of using a HashMap for extensions, Layer types would be required to pre-declare extension types they'd be storing in the Registry and use arrays to store extensions. However, this will be a tracing-subscriber 0.4 change, which is not currently scheduled.
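To make the difference concrete, here is a rough standard-library sketch contrasting the two storage strategies: a type-keyed `HashMap` that accepts any type at runtime, and a layout where types are declared up front and looked up by array index. All names (`MapExtensions`, `SlotLayout`, `Slot`, `SlotExtensions`) are hypothetical; this is not tracing-subscriber's code or the proposed 0.4 design. It still boxes values and only illustrates replacing the hash lookup with an index.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::marker::PhantomData;

/// Flexible storage: any type can be stored at runtime, at the cost of
/// hashing a `TypeId` on every insert and lookup.
#[derive(Default)]
struct MapExtensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl MapExtensions {
    fn insert<T: Any>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: Any>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

/// Typed handle to a slot that was declared up front.
struct Slot<T> {
    index: usize,
    _marker: PhantomData<T>,
}

/// Layers declare their extension types once, before any span exists.
#[derive(Default)]
struct SlotLayout {
    count: usize,
}

impl SlotLayout {
    fn declare<T: Any>(&mut self) -> Slot<T> {
        let slot = Slot { index: self.count, _marker: PhantomData };
        self.count += 1;
        slot
    }

    /// Per-span storage sized to the declared slots.
    fn new_extensions(&self) -> SlotExtensions {
        SlotExtensions { slots: (0..self.count).map(|_| None).collect() }
    }
}

/// Pre-declared storage: lookups are a plain index, no hashing.
struct SlotExtensions {
    slots: Vec<Option<Box<dyn Any>>>,
}

impl SlotExtensions {
    fn insert<T: Any>(&mut self, slot: &Slot<T>, value: T) {
        self.slots[slot.index] = Some(Box::new(value));
    }

    fn get<T: Any>(&self, slot: &Slot<T>) -> Option<&T> {
        self.slots[slot.index]
            .as_ref()
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}
```

The trade-off is the one described above: the map version lets arbitrary layers attach arbitrary data, while the slot version requires every layer to announce its types before spans are created.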

# Objective

A separate `tracing` span for running a system's commands is created, even if the system doesn't have commands. This adds extra measuring overhead (see #4892) where it's not needed.

## Solution

Move the span into `ParallelCommandState` and `CommandQueue`'s `SystemParamState::apply`. To get the right metadata for the span, an additional `&SystemMeta` parameter was added to `SystemParamState::apply`.

---

## Changelog

Added: `SystemMeta::name`

Changed: Systems without `Commands` and `ParallelCommands` will no longer show a "system_commands" span when profiling.

Changed: `SystemParamState::apply` now takes a `&SystemMeta` parameter in addition to the provided `&mut World`.
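The shape of that change might look roughly like the following standard-library sketch. `World`, `SystemMeta`, `CommandQueue`, and `SpanGuard` here are simplified stand-ins rather than Bevy's actual types, and the span is simulated with a counter instead of the `tracing` crate. The early return for an empty queue is an extra assumption that follows the "non-zero work" suggestion earlier in the thread; the PR description itself does not state it.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the profiler: counts spans opened on this thread.
    static SPANS_OPENED: Cell<usize> = Cell::new(0);
}

fn spans_opened() -> usize {
    SPANS_OPENED.with(|count| count.get())
}

/// Hypothetical span guard. A real build would enter a `tracing` span here.
struct SpanGuard;

impl SpanGuard {
    fn enter(_span_name: &'static str, _system_name: &'static str) -> Self {
        SPANS_OPENED.with(|count| count.set(count.get() + 1));
        SpanGuard
    }
}

/// Simplified stand-in for the ECS world.
#[derive(Default)]
struct World {
    value: i32,
}

/// Simplified stand-in carrying the originating system's name.
struct SystemMeta {
    name: &'static str,
}

/// Simplified stand-in for a per-system command buffer.
#[derive(Default)]
struct CommandQueue {
    commands: Vec<Box<dyn FnOnce(&mut World)>>,
}

impl CommandQueue {
    fn push(&mut self, command: impl FnOnce(&mut World) + 'static) {
        self.commands.push(Box::new(command));
    }

    /// The span lives inside `apply`, so systems that have no command
    /// queue never create one. The system name comes from `system_meta`.
    fn apply(&mut self, system_meta: &SystemMeta, world: &mut World) {
        if self.commands.is_empty() {
            // Assumption: skip the span entirely when there is no work.
            return;
        }
        let _span = SpanGuard::enter("system_commands", system_meta.name);
        for command in self.commands.drain(..) {
            command(&mut *world);
        }
    }
}
```

Applying an empty queue leaves `spans_opened()` unchanged, while applying a queue with one pushed command increments it by one and runs the command against the world.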


alice-i-cecile · 2023-09-13T19:10:52Z

I'm calling this fixed by #9390. If there's still ongoing problems, please reopen this issue :)

james7132 added the C-Bug (An unexpected or incorrect behavior) and A-Diagnostics (Logging, crash handling, error reporting and performance analysis) labels Jun 1, 2022

This was referenced Jun 11, 2022

Low/Minimal Allocation version of tracing_tracy nagisa/rust_tracy_client#42

Closed

[Merged by Bors] - bevy_log: upgrade to tracing-tracy 0.10.0 #4991

Closed

This was referenced Jul 19, 2022

internal: Simplify logger rust-lang/rust-analyzer#12631

Closed

subscriber: rewrite Registry to use static arrays tokio-rs/tracing#2230

Open

james7132 mentioned this issue Dec 9, 2022

[Merged by Bors] - Move system_commands spans into apply_buffers #6900

Closed

james7132 mentioned this issue Feb 17, 2023

Opprotunistically prune run condition spans #7729

Open

hymm mentioned this issue Aug 9, 2023

Cache System Tracing Spans #9390

Merged

alice-i-cecile closed this as completed Sep 13, 2023

james7132 mentioned this issue Jan 1, 2024

Fix tracing spans for system #11169

Closed
