Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use synchronized instead of reentrant lock in explicit bucket histogram #6309

Merged

Conversation

jack-berg
Copy link
Member

Was doing some analysis and noticed a small amount of allocations occurring when recording to explicit bucket histogram.

Tracked it down and determine that ReentrantLock actually performs some small amount of allocations for synchronization. See the screenshot below from my profiler. Replaced it with the synchronized keyword and the allocations went away. Also noticed a modest increase in performance.

Screenshot 2024-03-19 at 2 35 04 PM

HistogramBenchmark

Before

Benchmark                                                            (aggregation)                (valueGen)  Mode  Cnt      Score       Error   Units
HistogramBenchmark.aggregate_10Threads                     EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10  36837.809 ± 14387.103   ns/op
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate       EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.005 ±     0.001  MB/sec
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate.norm  EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.020 ±     0.007    B/op
HistogramBenchmark.aggregate_10Threads:gc.count            EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_10Threads                     EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10  45622.392 ±  4692.903   ns/op
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate       EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.005 ±     0.001  MB/sec
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate.norm  EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.026 ±     0.002    B/op
HistogramBenchmark.aggregate_10Threads:gc.count            EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_10Threads                     EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  47397.833 ±  3943.018   ns/op
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate       EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.005 ±     0.001  MB/sec
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate.norm  EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.026 ±     0.002    B/op
HistogramBenchmark.aggregate_10Threads:gc.count            EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_1Threads                      EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10  26015.096 ±   896.893   ns/op
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.001 ±     0.001  MB/sec
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.014 ±     0.001    B/op
HistogramBenchmark.aggregate_1Threads:gc.count             EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_1Threads                      EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10  34910.477 ±  1182.348   ns/op
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.001 ±     0.001  MB/sec
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.019 ±     0.001    B/op
HistogramBenchmark.aggregate_1Threads:gc.count             EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_1Threads                      EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  36557.168 ±   427.892   ns/op
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.001 ±     0.001  MB/sec
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.020 ±     0.001    B/op
HistogramBenchmark.aggregate_1Threads:gc.count             EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_5Threads                      EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10  27350.339 ±  2304.589   ns/op
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.003 ±     0.001  MB/sec
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.016 ±     0.002    B/op
HistogramBenchmark.aggregate_5Threads:gc.count             EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_5Threads                      EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10  34618.320 ±   595.322   ns/op
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.003 ±     0.001  MB/sec
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.020 ±     0.001    B/op
HistogramBenchmark.aggregate_5Threads:gc.count             EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10        ≈ 0              counts
HistogramBenchmark.aggregate_5Threads                      EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  37097.321 ±   534.516   ns/op
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.003 ±     0.001  MB/sec
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.021 ±     0.001    B/op
HistogramBenchmark.aggregate_5Threads:gc.count             EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10        ≈ 0              counts

After

Benchmark                                                            (aggregation)                (valueGen)  Mode  Cnt      Score     Error   Units
HistogramBenchmark.aggregate_10Threads                     EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10  30270.564 ± 927.032   ns/op
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate       EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.005 ±   0.001  MB/sec
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate.norm  EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.017 ±   0.001    B/op
HistogramBenchmark.aggregate_10Threads:gc.count            EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_10Threads                     EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10  38952.976 ± 970.539   ns/op
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate       EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.005 ±   0.001  MB/sec
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate.norm  EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.022 ±   0.001    B/op
HistogramBenchmark.aggregate_10Threads:gc.count            EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_10Threads                     EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  38415.200 ± 983.709   ns/op
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate       EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.005 ±   0.001  MB/sec
HistogramBenchmark.aggregate_10Threads:gc.alloc.rate.norm  EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.022 ±   0.001    B/op
HistogramBenchmark.aggregate_10Threads:gc.count            EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_1Threads                      EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10  25116.150 ± 299.397   ns/op
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.001 ±   0.001  MB/sec
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.014 ±   0.001    B/op
HistogramBenchmark.aggregate_1Threads:gc.count             EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_1Threads                      EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10  32274.696 ± 744.840   ns/op
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.001 ±   0.001  MB/sec
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.017 ±   0.001    B/op
HistogramBenchmark.aggregate_1Threads:gc.count             EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_1Threads                      EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  31970.603 ± 807.993   ns/op
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.001 ±   0.001  MB/sec
HistogramBenchmark.aggregate_1Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.017 ±   0.001    B/op
HistogramBenchmark.aggregate_1Threads:gc.count             EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_5Threads                      EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10  23826.553 ± 350.842   ns/op
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.003 ±   0.001  MB/sec
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10      0.014 ±   0.001    B/op
HistogramBenchmark.aggregate_5Threads:gc.count             EXPLICIT_DEFAULT_BUCKET   FIXED_BUCKET_BOUNDARIES  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_5Threads                      EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10  29492.544 ± 556.613   ns/op
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.003 ±   0.001  MB/sec
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10      0.017 ±   0.001    B/op
HistogramBenchmark.aggregate_5Threads:gc.count             EXPLICIT_DEFAULT_BUCKET  UNIFORM_RANDOM_WITHIN_2K  avgt   10        ≈ 0            counts
HistogramBenchmark.aggregate_5Threads                      EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  31819.090 ± 562.777   ns/op
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate        EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.003 ±   0.001  MB/sec
HistogramBenchmark.aggregate_5Threads:gc.alloc.rate.norm   EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10      0.018 ±   0.001    B/op
HistogramBenchmark.aggregate_5Threads:gc.count             EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10        ≈ 0            counts

Notice the small improvement in gc.alloc.rate.norm and the larger-than-expected improvement in ns/op.

@jack-berg jack-berg requested a review from a team March 19, 2024 19:38
Copy link

codecov bot commented Mar 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.02%. Comparing base (6e3d087) to head (aab7e89).
Report is 3 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6309      +/-   ##
============================================
- Coverage     91.04%   91.02%   -0.02%     
+ Complexity     5726     5725       -1     
============================================
  Files           625      625              
  Lines         16744    16743       -1     
  Branches       1713     1713              
============================================
- Hits          15244    15240       -4     
- Misses         1006     1008       +2     
- Partials        494      495       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, even though Handle is an inner class, instances of it are available through the createHandle() public method of the containing class. As such, a misbehaving user of this class could also synchronized(handle) and potentially form a deadlock.

I think it should be both safer and accomplish the same goals if you were to create a private final Object lock and synchronize on that instead of marking the instance methods synchronized. Make sense?

@jack-berg
Copy link
Member Author

Meh.. I understand that calling code can also synchronize on the intrinsic lock, but: 1. This is internal code. 2. Doing so should always be done with caution.

We have examples of using both the intrinsic lock and a dedicated explicit lock in internal code. Preferring one or the other is splitting hairs IMO. Let's see if any other @open-telemetry/java-approvers have a preference. If so, should probably add it to best practices.

@jkwatson
Copy link
Contributor

I'm with Jason on this one. It seems like a simple enough change to have an internal Object to lock on, and it will prevent having to figure out what's going on if someone decides to foot-gun themselves and synchronize on our thing.

@jack-berg
Copy link
Member Author

Updated to use a dedicated lock, added this advice to the best practices guide

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being flexible!

@jack-berg jack-berg merged commit 1dde255 into open-telemetry:main Apr 3, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants