This repository has been archived by the owner on Jul 1, 2022. It is now read-only.

Concurrency improvements to RemoteControlledSampler #609

Merged
merged 6 commits into master from fix-608 on Apr 29, 2019

Conversation

@yborovikov (Contributor) commented on Mar 29, 2019

This removes synchronization from RemoteControlledSampler and makes RateLimiter thread-safe.

Which problem is this PR solving?

As described in #608, synchronization in RemoteControlledSampler was causing severe performance degradation in multi-threaded applications.

The RateLimiter.checkBalance() implementation was not thread-safe and didn't correctly initialize lastTick, causing non-deterministic behavior on the first invocation of .checkBalance().

Short description of the changes

Synchronization removed from RemoteControlledSampler.

The internal state of RateLimiter is now stored in AtomicLongs, with .checkBalance() modified to use an approach similar to AtomicLong.getAndAccumulate(), in order to prevent internal state corruption in concurrent flows (see the illustrative sketch below).

Fixes #608
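A minimal sketch of the CAS-based approach described above; the field names and unit conversion here are illustrative assumptions, not the actual RateLimiter internals.

import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; the real RateLimiter uses different fields and method names.
class CasRateLimiterSketch {
  private final double creditsPerNano;       // assumed conversion factor
  private final long maxBalanceNanos;        // max balance expressed in "nano-credits"
  private final AtomicLong lastTickNanos;    // eagerly initialized, unlike the old lastTick bug
  private final AtomicLong balanceNanos;

  CasRateLimiterSketch(double creditsPerSecond, double maxBalance) {
    this.creditsPerNano = creditsPerSecond / 1e9;
    this.maxBalanceNanos = (long) (maxBalance / creditsPerNano);
    this.lastTickNanos = new AtomicLong(System.nanoTime());
    this.balanceNanos = new AtomicLong(maxBalanceNanos);
  }

  boolean checkCredit(double itemCost) {
    long costNanos = (long) (itemCost / creditsPerNano);
    // Claim the time elapsed since the last observed tick; each thread claims its own slice.
    long now = System.nanoTime();
    long accrued = now - lastTickNanos.getAndSet(now);
    while (true) {
      long current = balanceNanos.get();
      long refilled = Math.min(current + accrued, maxBalanceNanos);
      boolean affordable = refilled >= costNanos;
      long next = affordable ? refilled - costNanos : refilled;
      // Optimistic CAS loop, in the spirit of AtomicLong.getAndAccumulate():
      // if another thread changed the balance concurrently, re-read and retry.
      if (balanceNanos.compareAndSet(current, next)) {
        return affordable;
      }
    }
  }
}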

Yegor Borovikov added 2 commits March 29, 2019 11:01
This removes synchronization from RemoteControlledSampler
and makes RateLimiter thread-safe.

Signed-off-by: Yegor Borovikov <yegor@uber.com>
Signed-off-by: Yegor Borovikov <yegor@uber.com>
@codecov bot commented on Mar 29, 2019

Codecov Report

Merging #609 into master will increase coverage by 0.38%.
The diff coverage is 90.32%.


@@             Coverage Diff              @@
##             master     #609      +/-   ##
============================================
+ Coverage     89.27%   89.65%   +0.38%     
- Complexity      544      545       +1     
============================================
  Files            68       68              
  Lines          1958     1953       -5     
  Branches        251      253       +2     
============================================
+ Hits           1748     1751       +3     
+ Misses          133      125       -8     
  Partials         77       77
Impacted Files Coverage Δ Complexity Δ
...ing/internal/samplers/RemoteControlledSampler.java 91.25% <88.88%> (+1.71%) 16 <5> (-1) ⬇️
...a/io/jaegertracing/internal/utils/RateLimiter.java 94.73% <92.3%> (-5.27%) 5 <3> (ø)
...egertracing/internal/reporters/RemoteReporter.java 84.88% <0%> (+2.32%) 7% <0%> (ø) ⬇️
...gertracing/internal/reporters/LoggingReporter.java 90.9% <0%> (+9.09%) 5% <0%> (+1%) ⬆️
...rtracing/internal/reporters/CompositeReporter.java 100% <0%> (+28.57%) 7% <0%> (+1%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 20ee4e8...709062f.

@yborovikov changed the title from "Fix 608" to "Fix #608" on Mar 29, 2019
@yurishkuro requested a review from vprithvi on March 30, 2019 at 19:31
@yurishkuro (Member)

This is a tricky change, I haven't spent enough time reviewing it yet. High level impressions though:

  • The rate limiter is being made thread-safe by this PR, we should reflect it in the class javadoc.
  • The way it is being made thread-safe (via atomic longs) is tricky, I had a hard time following whether it introduces more race conditions. It would be useful to see a benchmark of whether atomic longs are actually saving time vs. just synchronizing the rate limiter update methods.
  • The change to the remote sampler makes a major assumption that the update methods are thread-safe on every possible sampler implementation. It's not obvious at all if that's the case. Also, even if update() methods are thread-safe, it seems like the equals() method now becomes racy.

I would like to see documentation added to the expectation of thread-safety on different samplers (this was not a requirement previously). Alternatively, we should consider if pushing thread-safety to individual samplers is the right approach. The alternative could be a "copy on write" implementation of the top-level adaptiveSampler.update() method where each individual sampler is asked to update() itself to new parameters, and it internally decides whether to return this (in case of no changes) or a new instance. However, this approach still has race conditions for rate limiting samplers which might be concurrently asked isSampled() and update their internal state.
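For illustration, a rough sketch of that copy-on-write idea; the interface and class names below are hypothetical, not the Jaeger API.

// Hypothetical sketch of the "copy on write" alternative; not code from this PR.
interface CowSampler {
  boolean isSampled(long traceId, String operation);

  /** Returns {@code this} if nothing changed, or a new immutable instance otherwise. */
  CowSampler update(double newParam);
}

final class CowProbabilisticSampler implements CowSampler {
  private final double rate;

  CowProbabilisticSampler(double rate) {
    this.rate = rate;
  }

  @Override
  public boolean isSampled(long traceId, String operation) {
    // Toy decision logic, only here to make the sketch self-contained.
    return Math.abs(traceId % 1_000_000) < rate * 1_000_000;
  }

  @Override
  public CowSampler update(double newRate) {
    // Copy on write: allocate a replacement only when the parameters actually changed,
    // keeping the current instance immutable and safe for concurrent isSampled() calls.
    return newRate == rate ? this : new CowProbabilisticSampler(newRate);
  }
}

The updating thread would then publish whatever update() returns through a volatile or atomic reference, so reader threads only ever see fully constructed instances; as noted above, this still leaves a race for rate-limiting samplers whose internal counters mutate on every isSampled() call.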

@yborovikov (Contributor, Author)

Thank you for reviewing the change!

  • The rate limiter is being made thread-safe by this PR, we should reflect it in the class javadoc.

Are there any classes in Jaeger that are not expected to be thread-safe? If so, I'd suggest marking those as such, not the other way around. Using @javax.annotation.concurrent.[Not]ThreadSafe might be a good way to express these contracts.
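For example, assuming the JSR-305/jcip annotations jar is on the classpath (class names below are purely illustrative):

import javax.annotation.concurrent.NotThreadSafe;
import javax.annotation.concurrent.ThreadSafe;

// The point is documenting the concurrency contract, not the classes themselves.
@ThreadSafe
class AtomicBackedRateLimiter {
  // all mutable state held in atomics/volatiles
}

@NotThreadSafe
class SingleThreadedCalibrator {
  // callers must provide external synchronization
}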

  • The way it is being made thread-safe (via atomic longs) is tricky, I had a hard time following whether it introduces more race conditions. It would be useful to see a benchmark of whether atomic longs are actually saving time vs. just synchronizing the rate limiter update methods.

Most of the trickery comes from the need to convert double arithmetic to integer-based. I'd suggest changing the API to use integer quotas; given that it's an internal implementation class and the only use case is .checkCredit(1.0), that should be easy, but it's not strictly necessary.

Another bit that might feel tricky is the standard optimistic-locking-over-CAS pattern, very similar to the Java 8+ AtomicLong.accumulateAndGet() implementation (as mentioned in the PR description).

The existing unit tests (and the added concurrent ones) confirm that the correctness of the behavior is preserved.

Performance-wise under concurrency, the rule of thumb is that volatile is typically faster than synchronized, which is understandable given the respective granularity of their happens-before guarantees. There are plenty of articles and benchmarks on the topic; one of the earliest is Brian Goetz's from 2004.

Of course, every specific use case is unique (and the real comparison should come from observing real-life production behavior). Anyway, here are some JMH results I got from benchmarking the synchronized and atomic implementations of RateLimiter (8 threads, ~10% of time spent in .checkCredit()):

Benchmark                             Mode  Cnt         Score        Error  Units
SynchronizedRateLimiter.baseline     thrpt  100  17299664.806 ± 250785.751  ops/s
SynchronizedRateLimiter.checkCredit  thrpt  100   4743227.226 ± 141156.036  ops/s
AtomicRateLimiter.checkCredit        thrpt  100  13088194.369 ± 154981.298  ops/s

.baseline is the original (no locking, not thread-safe) implementation, included for comparison.
This is obviously still a synthetic benchmark; personally, I expect the gap for a synchronized .checkCredit() to be even larger in an efficient, highly concurrent application.
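The benchmark project itself was not published with this PR, so the following is only a guess at the shape of such a JMH harness; the RateLimiter constructor arguments are an assumption.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.infra.Blackhole;

import io.jaegertracing.internal.utils.RateLimiter;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Threads(8)
public class RateLimiterBenchmark {

  private RateLimiter limiter;

  @Setup
  public void setUp() {
    // Assumed constructor: credits per second and max balance.
    limiter = new RateLimiter(1000.0, 1000.0);
  }

  @Benchmark
  public void checkCredit(Blackhole bh) {
    // The operation under contention; the runs quoted above also mixed in
    // other work so only ~10% of the time was spent here.
    bh.consume(limiter.checkCredit(1.0));
  }
}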

  • The change to the remote sampler makes a major assumption that the update methods are thread-safe on every possible sampler implementation. It's not obvious at all if that's the case. Also, even if update() methods are thread-safe, it seems like the equals() method now becomes racy.

I checked every implementation in the project (that's how I came to fixing RateLimiter in this PR). There are some improvements that can be made (e.g. switching PerOperationSampler from synchronized to using ConcurrentHashMap), but overall it looks clean. Adding concurrent behavior tests sounds like a reasonable next step.

The same applies to .equals(); granted, I do not fully understand the logic/purpose behind some of its implementations. In any case, .equals() on a mutable object is not guaranteed to return the same result for the same argument over time.
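A hedged sketch of the ConcurrentHashMap direction mentioned above; the types and fields are simplified stand-ins, not the real PerOperationSampler.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;

// Simplified stand-in showing only the locking change, not the real sampler logic.
class PerOperationSamplerSketch {
  private final ConcurrentMap<String, ProbSampler> samplers = new ConcurrentHashMap<>();
  private volatile double defaultRate;

  PerOperationSamplerSketch(double defaultRate) {
    this.defaultRate = defaultRate;
  }

  boolean isSampled(String operation) {
    // computeIfAbsent is atomic per key, so no method-level synchronized is needed.
    ProbSampler sampler = samplers.computeIfAbsent(operation, op -> new ProbSampler(defaultRate));
    return sampler.isSampled();
  }

  void update(double newDefaultRate) {
    // A plain volatile write suffices to publish the new default to reader threads.
    this.defaultRate = newDefaultRate;
  }

  private static final class ProbSampler {
    private final double rate;
    ProbSampler(double rate) { this.rate = rate; }
    boolean isSampled() { return ThreadLocalRandom.current().nextDouble() < rate; }
  }
}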

I would like to see documentation added to the expectation of thread-safety on different samplers (this was not a requirement previously).

I believe it was: given that many of the samplers (e.g. RateLimitingSampler) are instantiated by Configuration and wired directly into Tracer, there seems to be no expectation that every sampler will be wrapped by a RemoteControlledSampler (to guarantee thread-safety).

Alternatively, we should consider if pushing thread-safety to individual samplers is the right approach.

Yes, absolutely.

The alternative could be a "copy on write" implementation of the top-level adaptiveSampler.update() method where each individual sampler is asked to update() itself to new parameters, and it internally decides whether to return this (in case of no changes) or a new instance. However, this approach still has race conditions for rate limiting samplers which might be concurrently asked isSampled() and update their internal state.

Thanks again for your review, comments, and questions!

@jpkrohling (Collaborator)

LGTM, but I wonder if this is sufficient to fix the behavior you were seeing as part of #608. Could you share a memory profile similar to what you have in the original issue? This would help evaluate the gains from this PR.

Additionally, would you be able to publish the JMH tests in a public repository? This would be very useful for us in the future.

@jpkrohling changed the title from "Fix #608" to "Concurrency improvements to RemoveControlledSampler" on Apr 1, 2019
@yborovikov (Contributor, Author)

I updated RateLimiter with a simpler implementation that uses a single AtomicLong to track spent (and wasted, due to exceeding the max balance) credits, in nano ticks. Hopefully, it's easier to comprehend, too.

This approach yields performance on par with the baseline (or possibly better, due to fewer floating-point operations and main-memory writes):

Benchmark                             Mode  Cnt         Score        Error  Units
SynchronizedRateLimiter.baseline     thrpt  100  16462034.465 ± 402504.035  ops/s
SynchronizedRateLimiter.checkCredit  thrpt  100   4744500.578 ±  88917.828  ops/s
AtomicRateLimiter.checkCredit        thrpt  100  13292796.354 ± 313919.827  ops/s
AtomicRateLimiter1.checkCredit       thrpt  100  17130924.009 ± 238918.453  ops/s
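Roughly, the single-AtomicLong approach could look like the sketch below; the names and arithmetic are illustrative, not the implementation merged here.

import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: the only mutable state is the nano-timestamp up to which
// credit has already been spent (or forfeited when the max balance was exceeded).
class SingleAtomicRateLimiterSketch {
  private final double nanosPerCredit;
  private final long maxBalanceNanos;
  private final AtomicLong spentUpToNanos;

  SingleAtomicRateLimiterSketch(double creditsPerSecond, double maxBalance) {
    this.nanosPerCredit = 1e9 / creditsPerSecond;
    this.maxBalanceNanos = (long) (maxBalance * nanosPerCredit);
    this.spentUpToNanos = new AtomicLong(System.nanoTime());
  }

  boolean checkCredit(double itemCost) {
    long costNanos = (long) (itemCost * nanosPerCredit);
    while (true) {
      long now = System.nanoTime();
      long spent = spentUpToNanos.get();
      // Credit older than maxBalanceNanos is "wasted": it can never be spent.
      long effectiveSpent = Math.max(spent, now - maxBalanceNanos);
      if (now - effectiveSpent < costNanos) {
        return false; // not enough accumulated time to cover the cost
      }
      // A single CAS moves the spent marker forward; a losing thread simply retries.
      if (spentUpToNanos.compareAndSet(spent, effectiveSpent + costNanos)) {
        return true;
      }
    }
  }
}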

@yborovikov (Contributor, Author)

LGTM, but I wonder if this is sufficient to fix the behavior you were seeing as part of #608. Could you share a memory profile similar to what you have in the original issue? This would help evaluate the gains from this PR.

I updated our internal code to work around #608 by moving all (or most of) the sampling to a single thread. This resulted in much healthier threads:
[screenshot: sample-fixed]
Arguably, getting rid of the synchronized bottleneck should produce a similar effect (except that PerOperationSampler is still heavily synchronized...), but I haven't tried it in a real (production) environment.

Additionally, would you be able to publish the JMH tests in a public repository? This would be very useful for us in the future.

Let me see if I can remove internal build scaffolding from my JMH project - this should be fairly easy.

Signed-off-by: Yegor Borovikov <yegor@uber.com>
@yborovikov changed the title from "Concurrency improvements to RemoveControlledSampler" to "Concurrency improvements to RemoteControlledSampler" on Apr 2, 2019
Signed-off-by: Yegor Borovikov <yegor@uber.com>
@yborovikov (Contributor, Author)

I ran the RateLimiter benchmark (with 100% time spent in .checkCredit()) on a 32(logical)-core Linux server for different numbers of threads.
Here's the resulting graph (vertical axis: ops/s, log scale; horizontal axis: number of threads, log scale):
[graph: 32-core-linux-benchmark]

  • baseline (green) is the original (not thread-safe) implementation.
  • _synchronized (blue) is the original with synchronization added.
  • atomic (orange) is using a single AtomicLong to store the state (this PR).

I'm fairly satisfied with the state of this PR and don't have any further changes planned.

Please take some time to review the code and let me know if something needs to be fixed.

@jpkrohling (Collaborator) left a comment


Looks good to me. I was a bit uneasy at first with the RemoteControlledSampler, but given that the TimerTask is the only place changing the sampler, volatile does look appropriate.
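In other words, something along these lines; this is a simplified illustration of the single-writer/volatile pattern, not the actual RemoteControlledSampler, which has more moving parts (polling, metrics, etc.).

class VolatileDelegateSketch {
  interface Sampler {
    boolean isSampled(long traceId, String operation);
  }

  // Read concurrently by application threads; written only by the polling TimerTask.
  private volatile Sampler delegate;

  VolatileDelegateSketch(Sampler initial) {
    this.delegate = initial;
  }

  boolean isSampled(long traceId, String operation) {
    // Lock-free: a single volatile read of the current delegate.
    return delegate.isSampled(traceId, operation);
  }

  // Invoked only from the TimerTask after fetching a new strategy.
  void updateSampler(Sampler updated) {
    if (!updated.equals(delegate)) {
      delegate = updated; // the volatile write publishes the new sampler to all threads
    }
  }
}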

Signed-off-by: Yegor Borovikov <yegor@uber.com>
@jpkrohling (Collaborator)

@yurishkuro are there any remaining concerns about this one?

@pavolloffay (Member)

I have talked to @jpkrohling, and we agreed that we should merge this PR. It's been a long time since it was approved.

@pavolloffay merged commit b2a0e59 into jaegertracing:master on Apr 29, 2019
@yborovikov deleted the fix-608 branch on April 29, 2019 at 14:38
@william-tran

@yborovikov thanks for this.

There are some improvements that can be made (e.g. switching PerOperationSampler from synchronized to using ConcurrentHashMap), but overall it looks clean. Adding concurrent behavior tests sounds like a reasonable next step.

I was looking at the PerOperationSampler code and the public synchronized SamplingStatus sample() popped out at me. This PR fixed application-wide contention on calls to RemoteControlledSampler.sample, but given PerOperationSampler's synchronized sample method, there is still contention per operation. Did you end up seeing this as a result?

@yborovikov (Contributor, Author)

Due to the workaround on our side, I lost the ability to "reproduce" the contention in the production environment at that time, and didn't (need to) run any further performance tests.
The fix to PerOperationSampler seemed obvious (though the exact gains would depend on concrete customer app patterns).

william-tran pushed a commit to autonomic-ai/jaeger-client-java that referenced this pull request Oct 14, 2021
- Add volatile keyword to underlying samplers so updated references can
be made available to other threads.

See jaegertracing#609 for
similar work.

Fixes jaegertracing#807

Signed-off-by: Will Tran <will@autonomic.ai>
william-tran pushed a commit to autonomic-ai/jaeger-client-java that referenced this pull request Oct 14, 2021
- Use a ConcurrentHashMap for operationNameToSampler so that any number
of concurrent calls to sample can be made and safely modify it.
- Add volatile to any fields that can be changed by the update method to
ensure visibility of changes to other threads.
- Retain instances of GuaranteedThroughputSampler to preserve their rate
limit balances across updates when parameters don't change, improving on
jaegertracing/jaeger#1729

See jaegertracing#609 for
similar work.

Fixes jaegertracing#807

Signed-off-by: Will Tran <will@autonomic.ai>
Successfully merging this pull request may close: RemoteControlledSampler.sample() is a bottleneck (#608)
5 participants