Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subscriber: make Registry::enter/exit much faster #1058

Merged
merged 6 commits into from
Oct 21, 2020

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Oct 21, 2020

Motivation

We've tried very hard to make sure that entering and exiting spans is
lightweight...in the tracing-core core machinery. Unfortunately, we
haven't done any benchmarking of how subscriber implementations actually
handle enter/exit events. It turns out that in tracing-subscriber's
Registry, there's actually significant overhead for entering a span:
calling span.enter() may take as long as 69 ns (on my machine).

Solution

I've written some microbenchmarks for entering and exiting enabled spans
using tracing-subscriber::fmt, comparing them with the overhead of
calling enter on an enabled span. Based on this, I've made some
performance improvements. These optimizations include:

  • Removing the HashSet that's used for tracking previously entered
    span IDs, in favor of linear search. Span stacks probably never deep
    enough for a hashmap to be faster than iterating over a couple of
    vec indices.
  • Preallocating the vec used for the span stack to hold at least 64
    elements. This means we'll never have a lag spike from reallocating,
    as I think it'll almost never be deeper than 64 IDs.
  • Only cloning/dropping an ID's ref count for the first ID in the stack.

This makes entering and exiting enabled spans significantly faster:
image

It would be nice to continue optimizing this, but this might be about
the best it gets given the other requirements that we're now making
assertions about.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw added crate/subscriber Related to the `tracing-subscriber` crate kind/perf Performance improvements labels Oct 21, 2020
@hawkw hawkw requested a review from a team as a code owner October 21, 2020 22:59
@hawkw
Copy link
Member Author

hawkw commented Oct 21, 2020

I'll backport this to v0.1.x when it lands.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how tiny this change is and how large the impact is.

@hawkw hawkw merged commit 67340e9 into master Oct 21, 2020
hawkw added a commit that referenced this pull request Oct 21, 2020
## Motivation

We've tried very hard to make sure that entering and exiting spans is
lightweight...in the `tracing-core` core machinery. Unfortunately, we
haven't done any benchmarking of how subscriber implementations actually
handle enter/exit events. It turns out that in `tracing-subscriber`'s
`Registry`, there's actually significant overhead for entering a span:
calling `span.enter()` may take as long as 69 ns (on my machine).

## Solution

I've written some microbenchmarks for entering and exiting enabled spans
using `tracing-subscriber::fmt`, comparing them with the overhead of
calling `enter` on an enabled span. Based on this, I've made some
performance improvements. These optimizations include:

- Removing the `HashSet` that's used for tracking previously entered
  span IDs, in favor of linear search. Span stacks probably never deep
  enough for a hashmap to be faster than iterating over a couple of
  vec indices.
- Preallocating the vec used for the span stack to hold at least 64
  elements. This means we'll never have a lag spike from reallocating,
  as I think it'll almost never be deeper than 64 IDs.
- Only cloning/dropping an ID's ref count for the _first_ ID in the stack.

This makes entering and exiting enabled spans significantly faster:
![image](https://user-images.githubusercontent.com/2796466/96798681-3fc85000-13b6-11eb-9e85-7602d918ee09.png)

It would be nice to continue optimizing this, but this might be about
the best it gets given the other requirements that we're now making
assertions about.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
…spatch-as-owned-tokio-rs#455

* upstream/master: (34 commits)
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
  opentelemetry: prepare for 0.8.0 release (tokio-rs#1036)
  docs: add favicon for extra pretty docs (tokio-rs#1033)
  subscriber: fix `reload` ergonomics (tokio-rs#1035)
  chore(deps): update crossbeam-channel requirement from 0.4.2 to 0.5.0 (tokio-rs#1031)
  opentelemetry: Assign default ids if missing (tokio-rs#1027)
  chore: remove deprecated add-path from CI (tokio-rs#1026)
  attributes:  fix `#[instrument(err)]` in case of early returns (tokio-rs#1006)
  core: remove mandatory liballoc dependency with no-std (tokio-rs#1017)
  chore(deps): update cfg-if requirement from 0.1.10 to 1.0.0 (tokio-rs#1023)
  subscriber: warn if trying to enable a statically disabled level (tokio-rs#990)
  subscriber: use macros for module declarations (tokio-rs#1009)
  chore: remove `stdlib.rs` (tokio-rs#1008)
  core: fix linked list tests reusing `Registration`s (tokio-rs#1016)
  subscriber: support dash in target names (tokio-rs#1012)
  ...
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
…spatch-as-ref-tokio-rs#455

* upstream/master:
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
* upstream/master:
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
* dp-target-is-cow:
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  Use impl Into<Cow<'a, str>
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
  No need for extra lifetime
  Add constructor for dynamic data
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  Feature gate usage of Cow in Metadata
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
hawkw added a commit that referenced this pull request Oct 22, 2020
This backports #1058 to v0.1.x. This was already approved on master.

## Motivation

We've tried very hard to make sure that entering and exiting spans is
lightweight...in the `tracing-core` core machinery. Unfortunately, we
haven't done any benchmarking of how subscriber implementations actually
handle enter/exit events. It turns out that in `tracing-subscriber`'s
`Registry`, there's actually significant overhead for entering a span:
calling `span.enter()` may take as long as 69 ns (on my machine).

## Solution

I've written some microbenchmarks for entering and exiting enabled spans
using `tracing-subscriber::fmt`, comparing them with the overhead of
calling `enter` on an enabled span. Based on this, I've made some
performance improvements. These optimizations include:

- Removing the `HashSet` that's used for tracking previously entered
  span IDs, in favor of linear search. Span stacks probably never deep
  enough for a hashmap to be faster than iterating over a couple of
  vec indices.
- Preallocating the vec used for the span stack to hold at least 64
  elements. This means we'll never have a lag spike from reallocating,
  as I think it'll almost never be deeper than 64 IDs.
- Only cloning/dropping an ID's ref count for the _first_ ID in the stack.

This makes entering and exiting enabled spans significantly faster:
![image](https://user-images.githubusercontent.com/2796466/96798681-3fc85000-13b6-11eb-9e85-7602d918ee09.png)

It would be nice to continue optimizing this, but this might be about
the best it gets given the other requirements that we're now making
assertions about.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added a commit that referenced this pull request Oct 22, 2020
Fixed

- **registry**: Fixed `Registry::new` allocating an excessively large
  amount of memory, most of which would never be used ([#1064])

Changed

- **registry**: Improved `new_span` performance by reusing `HashMap`
  allocations for `Extensions` ([#1064])
- **registry**: Significantly improved the performance of
  `Registry::enter` and `Registry::exit` ([#1058])

[#1064]: #1064
[#1058]: #1058
hawkw added a commit that referenced this pull request Oct 22, 2020
Fixed

- **registry**: Fixed `Registry::new` allocating an excessively large
  amount of memory, most of which would never be used ([#1064])

Changed

- **registry**: Improved `new_span` performance by reusing `HashMap`
  allocations for `Extensions` ([#1064])
- **registry**: Significantly improved the performance of
  `Registry::enter` and `Registry::exit` ([#1058])

[#1064]: #1064
[#1058]: #1058
hawkw added a commit to linkerd/linkerd2-proxy that referenced this pull request Oct 22, 2020
This picks up the following upstream changes:

* tokio-rs/tracing#1064, which fixes a bug where creating a subscriber
  would allocate a large amount of memory that was not actually used

* tokio-rs/tracing#1058, which significantly reduces the overhead of
  entering and exiting spans

Together, these changes should improve proxy performance when tracing is
enabled.
olix0r pushed a commit to linkerd/linkerd2-proxy that referenced this pull request Oct 23, 2020
This picks up the following upstream changes:

* tokio-rs/tracing#1064, which fixes a bug where creating a subscriber
  would allocate a large amount of memory that was not actually used

* tokio-rs/tracing#1058, which significantly reduces the overhead of
  entering and exiting spans

Together, these changes should improve proxy performance when tracing is
enabled.
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024
Fixed

- **registry**: Fixed `Registry::new` allocating an excessively large
  amount of memory, most of which would never be used ([tokio-rs#1064])

Changed

- **registry**: Improved `new_span` performance by reusing `HashMap`
  allocations for `Extensions` ([tokio-rs#1064])
- **registry**: Significantly improved the performance of
  `Registry::enter` and `Registry::exit` ([tokio-rs#1058])

[tokio-rs#1064]: tokio-rs#1064
[tokio-rs#1058]: tokio-rs#1058
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/subscriber Related to the `tracing-subscriber` crate kind/perf Performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants