Concurrency improvements to RemoteControlledSampler #609

yborovikov · 2019-03-29T18:19:48Z

This removes synchronization from RemoteControlledSampler and makes RateLimiter thread-safe.

Which problem is this PR solving?

As described in #608 Synchronization in RemoteControlledSampler was causing severe performance degradation in multi-threaded applications.

RateLimiter.checkBalance() implementation was not thread-safe and didn't correctly initialize lastTick, causing non-deterministic behavior for the first invocation of .checkBalance().

Short description of the changes

Synchronization removed from RemoteControlledSampler.

Internal state of RateLimiter is now stored in AtomicLongs, with .checkBalance() modified to use an approach similar to AtomicLong.getAndAccumulate() - to prevent internal state corruption in concurrent flows.

Fixes #608

This removes synchronization from RemoteControlledSampler and makes RateLimiter thread-safe. Signed-off-by: Yegor Borovikov <yegor@uber.com>

Signed-off-by: Yegor Borovikov <yegor@uber.com>

codecov · 2019-03-29T18:27:21Z

Codecov Report

Merging #609 into master will increase coverage by 0.38%.
The diff coverage is 90.32%.

@@             Coverage Diff              @@
##             master     #609      +/-   ##
============================================
+ Coverage     89.27%   89.65%   +0.38%     
- Complexity      544      545       +1     
============================================
  Files            68       68              
  Lines          1958     1953       -5     
  Branches        251      253       +2     
============================================
+ Hits           1748     1751       +3     
+ Misses          133      125       -8     
  Partials         77       77

Impacted Files	Coverage Δ	Complexity Δ
...ing/internal/samplers/RemoteControlledSampler.java	`91.25% <88.88%> (+1.71%)`	`16 <5> (-1)`	⬇️
...a/io/jaegertracing/internal/utils/RateLimiter.java	`94.73% <92.3%> (-5.27%)`	`5 <3> (ø)`
...egertracing/internal/reporters/RemoteReporter.java	`84.88% <0%> (+2.32%)`	`7% <0%> (ø)`	⬇️
...gertracing/internal/reporters/LoggingReporter.java	`90.9% <0%> (+9.09%)`	`5% <0%> (+1%)`	⬆️
...rtracing/internal/reporters/CompositeReporter.java	`100% <0%> (+28.57%)`	`7% <0%> (+1%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20ee4e8...709062f. Read the comment docs.

yurishkuro · 2019-03-30T19:43:01Z

This is a tricky change, I haven't spent enough time reviewing it yet. High level impressions though:

The rate limiter is being made thread-safe by this PR, we should reflect it in the class javadoc.
The way it is being made thread-safe (via atomic longs) is tricky, I had a hard time following whether it introduces more race conditions. It would be useful to see a benchmark of whether atomic longs are actually saving time vs. just synchronizing the rate limiter update methods.
The change to the remote sampler makes a major assumption that the update methods are thread-safe on every possible sampler implementation. It's not obvious at all if that's the case. Also, even if update() methods are thread-safe, it seems like the equals() method now becomes racy.

I would like to see documentation added to the expectation of thread-safety on different samplers (this was not a requirement previously). Alternatively, we should consider if pushing thread-safety to individual samplers is the right approach. The alternative could be a "copy on write" implementation of the top-level adaptiveSampler.update() method where each individual sampler is asked to update() itself to new parameters, and it internally decides whether to return this (in case of no changes) or a new instance. However, this approach still has race conditions for rate limiting samplers which might be concurrently asked isSampled() and update their internal state.

yborovikov · 2019-03-31T21:04:41Z

Thank you for reviewing the change!

The rate limiter is being made thread-safe by this PR, we should reflect it in the class javadoc.

Are there any classes in Jaeger that are not expected to be thread-safe? If so, I'd suggest marking those as such, not the other way around. Using @javax.annotation.concurrent.[Not]ThreadSafe might be a good way to express these contracts.

The way it is being made thread-safe (via atomic longs) is tricky, I had a hard time following whether it introduces more race conditions. It would be useful to see a benchmark of whether atomic longs are actually saving time vs. just synchronizing the rate limiter update methods.

Most of the trickery comes from the need to convert double arithmetic to integer-based. I'd suggest changing the API to use integer quotas. Given it's an internal implementation class and the only use case is .checkCredit(1.0) that should be easy. But not strictly necessary.

Another bit than might feel tricky is a standard optimistic locking over CAS pattern, very similar to Java 8+'s AtomicLong.accumulateAndGet() implementation (as mentioned in the PR).

The exising (and added concurrent) unit tests confirm the correcness of the behavior is preserved.

Concurrent performance wise, rule of thumb would be for volatile to be typically faster than synchronized - fairly understandable when one takes the respective granularity of happens-before guarantees. There's plenty of articles and benchmarks on the topic, here's one of the first ones - from 2004 by Brian Goetz.

Of course, any specific use case can be unique (and the real comparisons should be derived from observing the real-life, production behavior). Anyway, here's some JMH results I got from benchmarking synchronized and atomic implementations of RateLimiter (8 threads, ~10% time spent in .checkCredit()):

Benchmark                             Mode  Cnt         Score        Error  Units
SynchronizedRateLimiter.baseline     thrpt  100  17299664.806 ± 250785.751  ops/s
SynchronizedRateLimiter.checkCredit  thrpt  100   4743227.226 ± 141156.036  ops/s
AtomicRateLimiter.checkCredit        thrpt  100  13088194.369 ± 154981.298  ops/s

.baseline is original (no locking, not thread-safe) implementation - for comparison.
This is, obviously, still a synthetic benchmark - and, personally, I expect way more of a difference in a synchronized .checkCredit() performance in an efficient concurrent application.

The change to the remote sampler makes a major assumption that the update methods are thread-safe on every possible sampler implementation. It's not obvious at all if that's the case. Also, even if update() methods are thread-safe, it seems like the equals() method now becomes racy.

I checked every implementation in the project (that's how I came to fixing RateLimiter in this PR). There are some improvements that can be made (e.g. switching PerOperationSampler from syncronized to using ConcurrentHashMap) but overall it looks clean. Adding concurrent behavior tests sounds like a reasonable next step.

Same applies to .equals() - granted, I do not fully understand the logic/purpose behind some of its implementations. Anyway, .equals() on a mutable object is not guaranteed to return the same result for the same argument over time.

I would like to see documentation added to the expectation of thread-safety on different samplers (this was not a requirement previously).

I believe it was - given that many of the samplers (e.g. RateLimitingSampler) are instantiated by Configuration and are directly wired into Tracer - there seems to be no expectation that every sampler will be wrapped by a RemoteControlledSampler (to guarantee thread-safety?).

Alternatively, we should consider if pushing thread-safety to individual samplers is the right approach.

Yes, absolutely.

The alternative could be a "copy on write" implementation of the top-level adaptiveSampler.update() method where each individual sampler is asked to update() itself to new parameters, and it internally decides whether to return this (in case of no changes) or a new instance. However, this approach still has race conditions for rate limiting samplers which might be concurrently asked isSampled() and update their internal state.

Thanks again for your review, comments, and questions!

jpkrohling · 2019-04-01T14:23:15Z

LGTM, but I wonder if this is sufficient to fix the behavior you were seeing as part of #608. Could you share a memory profile similar to what you have in the original issue? This would help evaluate the gains from this PR.

Additionally, would you be able to publish the JMH tests in a public repository? This would be very useful for us in the future.

yborovikov · 2019-04-02T00:54:00Z

I updated RateLimiter with a simpler implementation that uses a single AtomicLong to track spent (and wasted - due to exceeding max balance) credits (in nano ticks). Hopefully, it's easier to comprehend, too.

This approach yields performance on par with the baseline (or, possibly, better - due to less floating point operations and main memory writes):

Benchmark                             Mode  Cnt         Score        Error  Units
SynchronizedRateLimiter.baseline     thrpt  100  16462034.465 ± 402504.035  ops/s
SynchronizedRateLimiter.checkCredit  thrpt  100   4744500.578 ±  88917.828  ops/s
AtomicRateLimiter.checkCredit        thrpt  100  13292796.354 ± 313919.827  ops/s
AtomicRateLimiter1.checkCredit       thrpt  100  17130924.009 ± 238918.453  ops/s

yborovikov · 2019-04-02T01:10:22Z

LGTM, but I wonder if this is sufficient to fix the behavior you were seeing as part of #608. Could you share a memory profile similar to what you have in the original issue? This would help evaluate the gains from this PR.

I updated our internal code to work around #608 by moving all (or most of) the sampling to a single thread. This resulted in much healthier threads:

Arguably, getting rid of the synchronized bottleneck should produce a similar effect (except that PerOperationSampler is still heavily synchronized...) - but I haven't tried it in a real (production) environment.

Additionally, would you be able to publish the JMH tests in a public repository? This would be very useful for us in the future.

Let me see if I can remove internal build scaffolding from my JMH project - this should be fairly easy.

Signed-off-by: Yegor Borovikov <yegor@uber.com>

yborovikov · 2019-04-02T21:39:18Z

I ran the RateLimiter benchmark (with 100% time spent in .checkCredit()) on a 32(logical)-core Linux server for different numbers of threads.
Here's the resulting graph (vertical axis: ops/s, log scale; horizontal axis: number of threads, log scale):

baseline (green) is the original (not thread-safe) implementation.
_synchronized (blue) is the original with synchronization added.
atomic (orange) is using a single AtomicLong to store the state (this PR).

I'm fairly satisfied with the state of this PR and don't have any further changes planned.

Please take some time to review the code and let me know if something needs to be fixed.

jpkrohling

Looks good to me. I was a bit uneasy at first with the RemoteControllerSampler, but given that the TimerTask is the only place changing the sampler, volatile does look appropriate.

jaeger-core/src/main/java/io/jaegertracing/internal/samplers/RemoteControlledSampler.java

jaeger-core/src/test/java/io/jaegertracing/internal/utils/RateLimiterTest.java

Signed-off-by: Yegor Borovikov <yegor@uber.com>

jpkrohling · 2019-04-04T08:57:34Z

@yurishkuro are there any remaining concerns about this one?

pavolloffay · 2019-04-29T11:01:38Z

I have talked to @jpkrohling that we should merge this PR. It's been a long time since it was approved.

william-tran · 2020-07-27T15:28:57Z

@yborovikov thanks for this.

There are some improvements that can be made (e.g. switching PerOperationSampler from syncronized to using ConcurrentHashMap) but overall it looks clean. Adding concurrent behavior tests sounds like a reasonable next step.

I was looking at the PerOperationSampler code and the public synchronized SamplingStatus sample() popped out at me. This PR fixed application wide contention on calls to RemoteControlledSampler.sample, but given PerOperationSampler's synchronized sample method, there is contention per operation. Did you end up seeing this as a result?

yborovikov · 2020-07-27T18:21:51Z

Due to the workaround on our side I lost the ability to "reproduce" the contention in production environment at that time and didn't (need to) run any further performance tests.
The fix to PerOperationSampler seemed to be obvious (yet exact gains would depend on concrete customer app patterns).

- Add volatile keyword to underlying samplers so updated references can be made available to other threads. See jaegertracing#609 for similar work. Fixes jaegertracing#807 Signed-off-by: Will Tran <will@autonomic.ai>

- Use a ConcurrentHashMap for operationNameToSampler so that any number of concurrent calls to sample can be made and safely modify it. - Add volatile to any fields that can be changed by the update method to ensure visibility of changes to other threads. - Retain instances of GuaranteedThroughputSampler to preserve their rate limit balances across updates when parameters don't change improving on jaegertracing/jaeger#1729 See jaegertracing#609 for similar work. Fixes jaegertracing#807 Signed-off-by: Will Tran <will@autonomic.ai>

Yegor Borovikov added 2 commits March 29, 2019 11:01

Fix #608 - multithreaded performance/synchronization issues

dd23215

This removes synchronization from RemoteControlledSampler and makes RateLimiter thread-safe. Signed-off-by: Yegor Borovikov <yegor@uber.com>

Revert RemoteControlledSamplerTest changes

5662114

Signed-off-by: Yegor Borovikov <yegor@uber.com>

yborovikov changed the title ~~Fix 608~~ Fix #608 Mar 29, 2019

yurishkuro requested a review from vprithvi March 30, 2019 19:31

jpkrohling changed the title ~~Fix #608~~ Concurrency improvements to RemoveControlledSampler Apr 1, 2019

Further optimize RateLimiter.checkCredit()

8474129

Signed-off-by: Yegor Borovikov <yegor@uber.com>

yborovikov changed the title ~~Concurrency improvements to RemoveControlledSampler~~ Concurrency improvements to RemoteControlledSampler Apr 2, 2019

Move clock.currentNanoTicks() inside optimistic retry loop

6f323c4

Signed-off-by: Yegor Borovikov <yegor@uber.com>

jpkrohling approved these changes Apr 3, 2019

View reviewed changes

jaeger-core/src/main/java/io/jaegertracing/internal/samplers/RemoteControlledSampler.java Outdated Show resolved Hide resolved

jaeger-core/src/test/java/io/jaegertracing/internal/utils/RateLimiterTest.java Outdated Show resolved Hide resolved

Log update failure; add timer health test

c8b347b

Signed-off-by: Yegor Borovikov <yegor@uber.com>

jpkrohling approved these changes Apr 4, 2019

View reviewed changes

pavolloffay mentioned this pull request Apr 26, 2019

Release 0.35.0 based on OpenTracing 0.32.0 #614

Closed

Merge branch 'master' into fix-608

709062f

pavolloffay merged commit b2a0e59 into jaegertracing:master Apr 29, 2019

yborovikov deleted the fix-608 branch April 29, 2019 14:38

pavolloffay mentioned this pull request May 10, 2019

Automatically generate changelog #629

Closed

william-tran mentioned this pull request Jul 29, 2020

Support delayed sampling #719

Closed

william-tran mentioned this pull request Jun 15, 2021

Drop Java 6 Compatibility #511

Closed

william-tran mentioned this pull request Oct 13, 2021

.sample() methods of PerOperationSampler and GuaranteedThroughputSampler are synchronized #807

Closed

william-tran mentioned this pull request Oct 14, 2021

Remove all remaining synchronized implementations of Sampler.sample #809

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrency improvements to RemoteControlledSampler #609

Concurrency improvements to RemoteControlledSampler #609

yborovikov commented Mar 29, 2019 •

edited by jpkrohling

Loading

codecov bot commented Mar 29, 2019 •

edited

Loading

yurishkuro commented Mar 30, 2019

yborovikov commented Mar 31, 2019

jpkrohling commented Apr 1, 2019

yborovikov commented Apr 2, 2019

yborovikov commented Apr 2, 2019

yborovikov commented Apr 2, 2019

jpkrohling left a comment

jpkrohling commented Apr 4, 2019

pavolloffay commented Apr 29, 2019

william-tran commented Jul 27, 2020

yborovikov commented Jul 27, 2020

Concurrency improvements to RemoteControlledSampler #609

Concurrency improvements to RemoteControlledSampler #609

Conversation

yborovikov commented Mar 29, 2019 • edited by jpkrohling Loading

Which problem is this PR solving?

Short description of the changes

codecov bot commented Mar 29, 2019 • edited Loading

Codecov Report

yurishkuro commented Mar 30, 2019

yborovikov commented Mar 31, 2019

jpkrohling commented Apr 1, 2019

yborovikov commented Apr 2, 2019

yborovikov commented Apr 2, 2019

yborovikov commented Apr 2, 2019

jpkrohling left a comment

Choose a reason for hiding this comment

jpkrohling commented Apr 4, 2019

pavolloffay commented Apr 29, 2019

william-tran commented Jul 27, 2020

yborovikov commented Jul 27, 2020

yborovikov commented Mar 29, 2019 •

edited by jpkrohling

Loading

codecov bot commented Mar 29, 2019 •

edited

Loading