Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing's overhead may be too high for profiling CPU-bound operations #4892

Closed
james7132 opened this issue Jun 1, 2022 · 6 comments
Closed
Labels
A-Diagnostics Logging, crash handling, error reporting and performance analysis C-Bug An unexpected or incorrect behavior

Comments

@james7132
Copy link
Member

Bevy version

0.7 and current head (cc4062e)

What you did

Attempted to profile many_foxes with trace_tracy enabled. Commented out spans for system_overhead and system_commands.

What went wrong

Profiled measurements for each stage show significant changes. Here's a comparison of the Update stage measurements:

image

There's only 10-20 systems in that stage, and the overhead scales with the number of systems. This overhead should be much lower, and it makes it difficult to make perf decisions on a macro level.

@james7132 james7132 added C-Bug An unexpected or incorrect behavior A-Diagnostics Logging, crash handling, error reporting and performance analysis labels Jun 1, 2022
@hymm
Copy link
Contributor

hymm commented Jun 1, 2022

I've been wanting to put certain sets of spans under different features for this reason. At the very least separate it into the system/overhead spans and everything else. We'd probably want to have them enabled under the trace feature, but be able to separate them out too.

@hymm
Copy link
Contributor

hymm commented Jun 2, 2022

I should mention that a big portion of the difference between the two histograms above is due to the applying system commands. This is mostly just a constant offset since most systems don't apply commands every frame. The issue is that it shows up in the trace as a bunch of spans.

image

This check is very light quick in actual code since it's a tight loop just checking that the buffers are empty. Without tracing enable applying these commands is probably less than a microsecond, but shows up in the above image as 90us.

I'm not sure how much of an issue this is as it shows up a constant offset for most frames, but we should probably supply a way of disabling these spans if people want to see something closer to the real framerate.

@james7132
Copy link
Member Author

For apply_buffers we could move it into the actual SystemParam apply_buffer implementations where it has non-zero work. Only downside is that zero/empty frames are not included in the statistics.

However, the other side in prepare systems is both a non-trivial real cost (scheduling onto the task pool) and a tight span cost.

One potential cause of this might be that the tracing macros allocate memory, or the subscriber/visitor model of tracing may not be a trivial operation, which have a non-trivial runtime cost. See nagisa/rust_tracy_client#26, which includes a tracy_client level non-allocating span! macro, only downside is that it only supports static strings.

@hymm
Copy link
Contributor

hymm commented Jun 2, 2022

For apply_buffers we could move it into the actual SystemParam apply_buffer implementations where it has non-zero work.

We'd have to get the name of the originating system down into the SystemParam somehow. Might be doable. People have wanted the name there for other reasons like error reporting. See #2949

non-allocating span! macro

Would we have to bypass the tracing crate to use this?

@davidbarsky
Copy link

Notes from Discord: the span macro does not allocate, which means that the overhead isn't tracing or tracing-core, but rather, tracing-subscriber. tracing-subscriber's Registry and Extensions (which are used by tracing-tracy) to store profiling have room to improve performance, but a decent amount of the overhead you're experiencing comes from tracing-subscriber::Registry trying to be modular and flexible in accepting data for arbitrary user-defined Layers.

For a profiling use-case, it might be worthwhile to implement your own subscriber (replacing the Registry but using the underlying sharded-slab data structure without boxing values into extensions.)

@hawkw has some ideas to improve tracing_subscriber::Registry, where instead of using a HashMap for extensions, Layer types would be required to pre-declare extension types they'd be storing in the Registry and use arrays to store extensions. However, this will be a tracing-subscriber 0.4 change, which is not currently scheduled.

bors bot pushed a commit that referenced this issue Dec 11, 2022
# Objective
A separate `tracing` span for running a system's commands is created, even if the system doesn't have commands. This is adding extra measuring overhead (see #4892) where it's not needed.

## Solution
Move the span into `ParallelCommandState` and `CommandQueue`'s `SystemParamState::apply`. To get the right metadata for the span, a additional `&SystemMeta` parameter was added to `SystemParamState::apply`.

---

## Changelog
Added: `SystemMeta::name`
Changed: Systems without `Commands` and  `ParallelCommands` will no longer show a "system_commands" span when profiling.
Changed: `SystemParamState::apply` now takes a `&SystemMeta` parameter in addition to the provided `&mut World`.
alradish pushed a commit to alradish/bevy that referenced this issue Jan 22, 2023
# Objective
A separate `tracing` span for running a system's commands is created, even if the system doesn't have commands. This is adding extra measuring overhead (see bevyengine#4892) where it's not needed.

## Solution
Move the span into `ParallelCommandState` and `CommandQueue`'s `SystemParamState::apply`. To get the right metadata for the span, a additional `&SystemMeta` parameter was added to `SystemParamState::apply`.

---

## Changelog
Added: `SystemMeta::name`
Changed: Systems without `Commands` and  `ParallelCommands` will no longer show a "system_commands" span when profiling.
Changed: `SystemParamState::apply` now takes a `&SystemMeta` parameter in addition to the provided `&mut World`.
ItsDoot pushed a commit to ItsDoot/bevy that referenced this issue Feb 1, 2023
# Objective
A separate `tracing` span for running a system's commands is created, even if the system doesn't have commands. This is adding extra measuring overhead (see bevyengine#4892) where it's not needed.

## Solution
Move the span into `ParallelCommandState` and `CommandQueue`'s `SystemParamState::apply`. To get the right metadata for the span, a additional `&SystemMeta` parameter was added to `SystemParamState::apply`.

---

## Changelog
Added: `SystemMeta::name`
Changed: Systems without `Commands` and  `ParallelCommands` will no longer show a "system_commands" span when profiling.
Changed: `SystemParamState::apply` now takes a `&SystemMeta` parameter in addition to the provided `&mut World`.
@alice-i-cecile
Copy link
Member

I'm calling this fixed by #9390. If there's still ongoing problems, please reopen this issue :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Diagnostics Logging, crash handling, error reporting and performance analysis C-Bug An unexpected or incorrect behavior
Projects
None yet
Development

No branches or pull requests

4 participants