Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tracing spans to schedules, stages, systems #789

Merged
merged 3 commits into from
Nov 11, 2020

Conversation

superdump
Copy link
Contributor

@superdump superdump commented Nov 4, 2020

The tracing crate allows for gathering data on how much time is spent within a scope, as well as enabling structured logging. tracing has a structure similar to the log crate where instrumentation using spans or events (events being structured logs within a hierarchy of spans, so much more advanced and useful logs) are made using the tracing crate and then there are many other crates that can subscribe to the events to output data in a format useful for analysis and/or visualisation.

If you want to learn more I highly recommend watching this video: https://www.youtube.com/watch?v=JjItsfqFIdo

My motivation was focused on profiling, as I wanted to know how much time was being spent in each system in order to understand where game loop time was being spent. I have been a bit slow in making this PR as I wanted to find some useful ways of analysing and visualising the trace data.

A few good options:

  • tracing-flame - captures a trace to a file that can then be used with the inferno-flamegraph crate to generate flamegraphs (aggregate time spent within spans) or flamecharts (a timeline of spans for the duration of the capture). These are good but I found clicking around between views in an SVG to be a bit cumbersome.
  • tracing-chrome - captures a trace to a file that can be opened in Chrome/Chromium by visiting chrome://tracing and loading the file. This shows a timeline of events that are quick to navigate using the tools (pan / zoom / highlight range / etc). The dependency on Chrome doesn't feel great, but this looks potentially useful and standalone/possible to integrate: https://github.com/catapult-project/catapult/tree/master/tracing
  • tracing-tracy - streams the trace to the Tracy frame profiler. This is advanced real-time profiling designed for use with games. I had stability problems on macOS but perhaps on Linux / Windows that are better supported, this is the best option. It can display the timeline of events, statistical aggregates of how much time is being spent in spans, log messages, everything. Here's a screenshot:

Screenshot 2020-10-22 at 13 36 08

Unfortunately, it seems the default behaviour of all of these three crates is to not include any of the additional fields added to spans in the visualisation. tracing spans must have a static str name and only additional fields can be dynamic. This means that with this current implementation, you can only visualise that a schedule/stage/system has run, and not see what it is. I have made a modification to tracing-chrome here that includes the additional fields in the span's name for the visualisation to be able to see what span is what and similarly for tracing-tracy here.

To make use of tracing-chrome in my project, I did the following:

  • Cargo.toml
[features]
default = [ "trace" ]
trace = [ "tracing", "tracing-chrome", "tracing-subscriber", "bevy/trace" ]

[dependencies]
tracing = { version = "0.1.21", optional = true }
tracing-chrome = { git = "https://github.com/superdump/tracing-chrome", branch = "name.fields", optional = true }
tracing-subscriber = { version = "0.2.15", optional = true }
  • src/trace.rs
use tracing_chrome::ChromeLayerBuilder;
use tracing_subscriber::{fmt, prelude::*, registry::Registry, EnvFilter};

pub fn setup_global_subscriber() -> impl Drop {
    let fmt_layer = fmt::Layer::default();
    let filter_layer = EnvFilter::try_from_default_env()
        .or_else(|_| EnvFilter::try_new("error"))
        .unwrap();

    let (chrome_layer, _guard) = ChromeLayerBuilder::new().build();

    let subscriber = Registry::default()
        .with(filter_layer)
        .with(fmt_layer)
        .with(chrome_layer);

    tracing::subscriber::set_global_default(subscriber).expect("Could not set global default");
    _guard
}
  • src/main.rs
#[cfg(feature = "trace")]
use crate::trace::setup_global_subscriber;
#[cfg(feature = "trace")]
use tracing::{info, trace_span};

fn main() {
    #[cfg(feature = "trace")]
    let _guard = setup_global_subscriber();

    #[cfg(feature = "trace")]
    let minkraft_span = trace_span!("minkraft");
    #[cfg(feature = "trace")]
    let _minkraft_guard = minkraft_span.enter();

...

    info!("Running {}!", env!("CARGO_PKG_NAME").to_string());
    app_builder.run();

You can also enable bevy/wgpu_trace but be warned, it is very chatty to the point of impacting performance so I would suggest not using it, or using it sparingly or when you need to.

Note also that in the example code, I added an EnvFilter. This allows filtering by setting RUST_LOG=trace,wgpu=warn similar to how the env_logger crate works with log, but in addition it allows filtering of log events and spans based on some more advanced rules. See here for more information.

For sure some of this information should go into some documentation somewhere and I will add that to the PR once we've decided whether this is wanted and/or in what form.

@superdump
Copy link
Contributor Author

I just added an example. You can try it out with RUST_LOG=info cargo run --example tracing --features trace. Press Escape or close the window. In your terminal you should see output like:
Screenshot 2020-11-04 at 13 11 10

And if you open the file named similar to trace-1604491821.json in chrome://tracing you will see some graphs like:
Screenshot 2020-11-04 at 13 04 49

Zooming in near the beginning by holding down the left mouse button and using the mouse scroll wheel I can then see the startup schedule and the 'a_system' startup system from the example:
Screenshot 2020-11-04 at 13 05 23

Or looking further along, I can see the frame loop and the foo_bar_baz system:
Screenshot 2020-11-04 at 13 05 43

Note that the example is currently using my branch of tracing-chrome and so I guess it's not desirable to merge as-is but I thought it would be useful for the time being so people can get a feel for it. We can just revert the commit or if I can get the tracing-chrome changes upstreamed, I guess update to that.

@superdump
Copy link
Contributor Author

Opened a couple of PRs to see if the modifications to add the fields to the span name are upstreamable:

@karroffel karroffel added C-Feature A new feature, making something new possible C-Performance A change motivated by improving speed, memory usage or compile times labels Nov 4, 2020
@cart
Copy link
Member

cart commented Nov 11, 2020

Ok I really dig this 😄

I'll play around with this a bit before merging, but it seems like a reasonable thing to add (especially given that its optional).

My big open questions / followups (that don't need to be resolved in this pr):

  • Bevy needs a good logging setup thats enabled by default. My primary motivator for that is displaying warnings and non-panic-ing errors. Tracy tracing seems like a good fit for that, but I'm a little curious about how expensive it is for "normal non-timing related logs" relative to log + env_logger
  • I'm hoping the instrumentation added in this pr can replace the "profiler" code we have already. This would mean that we'd need to have something consume the system traces and produce the relevant Diagnostics
  • Ultimately I'm picturing the Bevy Editor having its own tracy tracing visualizer

@cart
Copy link
Member

cart commented Nov 11, 2020

Alrighty I adapted this to the latest changes on master and reverted back to the crates.io version of tracing-chrome because we can't publish git dependencies.

@cart
Copy link
Member

cart commented Nov 11, 2020

Also made the example's defaults exclude wgpu and include info so users don't need to set that manually

@cart cart merged commit a266578 into bevyengine:master Nov 11, 2020
@cart cart mentioned this pull request Nov 11, 2020
@superdump
Copy link
Contributor Author

I agree on all points. I think it would be good to start with a simple subscriber that probably could hook into the diagnostics to register measurements when spans are received. I do wish print diagnostics were ordered somehow though. Either descending by time spent or alphabetically by diagnostics name.

And yeah it would be super nice to have it built in to the editor later. A fun thing in the YouTube video was changing the filters at run time so you can refocus the events and spans to profile just what you want. Should help to enable exploratory profiling while minimising performance impact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-Feature A new feature, making something new possible C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants