Concurrency improvements to RemoteControlledSampler #609
This removes synchronization from RemoteControlledSampler and makes RateLimiter thread-safe. Signed-off-by: Yegor Borovikov <yegor@uber.com>
Codecov Report
@@             Coverage Diff              @@
##             master     #609      +/-   ##
============================================
+ Coverage     89.27%   89.65%   +0.38%
- Complexity      544      545       +1
============================================
  Files            68       68
  Lines          1958     1953       -5
  Branches        251      253       +2
============================================
+ Hits           1748     1751       +3
+ Misses          133      125       -8
  Partials         77       77
This is a tricky change, and I haven't spent enough time reviewing it yet. High-level impressions though:
I would like to see documentation added on the thread-safety expectations for the different samplers (this was not a requirement previously). Alternatively, we should consider whether pushing thread-safety down to individual samplers is the right approach. The alternative could be a "copy on write" implementation of the top-level adaptiveSampler.update() method where each individual sampler is asked to update() itself to new parameters, and it internally decides whether to return
Thank you for reviewing the change!
Are there any classes in Jaeger that are not expected to be thread-safe? If so, I'd suggest marking those as such, not the other way around. Using
Most of the trickery comes from the need to convert double arithmetic to integer-based arithmetic. I'd suggest changing the API to use integer quotas. Given it's an internal implementation class and the only use case is
Another bit that might feel tricky is a standard optimistic locking over CAS pattern, very similar to Java 8+'s
The existing (and added concurrent) unit tests confirm that the correctness of the behavior is preserved.
Concurrent performance-wise, the rule of thumb would be for
Of course, any specific use case can be unique (and the real comparisons should be derived from observing the real-life, production behavior).
Anyway, here are some JMH results I got from benchmarking synchronized and atomic implementations of
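The optimistic locking over CAS pattern mentioned above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the class name `CreditPool`, its methods, and the use of whole-number credits are assumptions made for the example. The loop reads a snapshot, computes the new value, and publishes it with `compareAndSet` only if no other thread wrote in between; otherwise it retries.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical example class: a pool of integer credits spent with a CAS retry loop. */
class CreditPool {
  private final AtomicLong balance;

  CreditPool(long initialCredits) {
    this.balance = new AtomicLong(initialCredits);
  }

  /** Tries to spend cost credits; returns false and leaves state untouched if too few remain. */
  boolean trySpend(long cost) {
    while (true) {
      long current = balance.get();   // 1. read a snapshot
      long updated = current - cost;  // 2. compute the new value from the snapshot
      if (updated < 0) {
        return false;                 // not enough credits, nothing to write
      }
      if (balance.compareAndSet(current, updated)) {
        return true;                  // 3. published: nobody else wrote in between
      }
      // Another thread changed the balance first; retry with a fresh snapshot.
    }
  }

  long remaining() {
    return balance.get();
  }
}
```

Because the state is a single long, no thread ever blocks: a thread that loses the race simply repeats a few cheap operations.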
I checked every implementation in the project (that's how I came to fixing Same applies to
I believe it was - given that many of the samplers (e.g.
Yes, absolutely.
Thanks again for your review, comments, and questions!
LGTM, but I wonder if this is sufficient to fix the behavior you were seeing as part of #608. Could you share a memory profile similar to what you have in the original issue? This would help evaluate the gains from this PR. Additionally, would you be able to publish the JMH tests in a public repository? This would be very useful for us in the future.
I updated This approach yields performance on par with the baseline (or, possibly, better, due to fewer floating point operations and main memory writes):
I updated our internal code to work around #608 by moving all (or most of) the sampling to a single thread. This resulted in much healthier threads:
Let me see if I can remove internal build scaffolding from my JMH project - this should be fairly easy.
Looks good to me. I was a bit uneasy at first with the RemoteControlledSampler, but given that the TimerTask is the only place changing the sampler, volatile does look appropriate.
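The reasoning in this comment (one writer, many readers, so volatile is enough) can be sketched as below. This is an illustrative sketch only: `SketchSampler`, `VolatileDelegatingSampler`, `refresh()` and `startPolling()` are hypothetical names, not the Jaeger API. The point is that when a single timer thread replaces the reference and all other threads only read it, a volatile field gives the required visibility without any lock.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.function.Supplier;

/** Hypothetical stand-in for a sampler; not the Jaeger Sampler interface. */
interface SketchSampler {
  boolean sample(String operation);
}

/** Sketch: many reader threads, one writer (the timer thread), no locks. */
class VolatileDelegatingSampler implements SketchSampler {
  // volatile makes a reference written by the timer thread visible
  // to every thread that reads the field afterwards.
  private volatile SketchSampler delegate;
  private final Supplier<SketchSampler> source;

  VolatileDelegatingSampler(SketchSampler initial, Supplier<SketchSampler> source) {
    this.delegate = initial;
    this.source = source;
  }

  @Override
  public boolean sample(String operation) {
    return delegate.sample(operation); // one volatile read, no synchronization
  }

  /** Intended to be called only from the timer thread, so writers never compete. */
  void refresh() {
    delegate = source.get();
  }

  /** Starts a daemon timer that calls refresh() every periodMillis milliseconds. */
  Timer startPolling(long periodMillis) {
    Timer timer = new Timer(true);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        refresh();
      }
    }, periodMillis, periodMillis);
    return timer;
  }
}
```

If more than one thread could call `refresh()` with values derived from the current delegate, volatile alone would no longer be sufficient and a CAS or lock would be needed.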
jaeger-core/src/main/java/io/jaegertracing/internal/samplers/RemoteControlledSampler.java
jaeger-core/src/test/java/io/jaegertracing/internal/utils/RateLimiterTest.java
@yurishkuro are there any remaining concerns about this one?
I have talked to @jpkrohling, and we think this PR should be merged. It's been a long time since it was approved.
@yborovikov thanks for this.
I was looking at the PerOperationSampler code and the |
Due to the workaround on our side, I lost the ability to "reproduce" the contention in a production environment at that time and didn't (need to) run any further performance tests.
- Add volatile keyword to underlying samplers so updated references can be made available to other threads.
See jaegertracing#609 for similar work.
Fixes jaegertracing#807
Signed-off-by: Will Tran <will@autonomic.ai>

- Use a ConcurrentHashMap for operationNameToSampler so that any number of concurrent calls to sample can be made and safely modify it.
- Add volatile to any fields that can be changed by the update method to ensure visibility of changes to other threads.
- Retain instances of GuaranteedThroughputSampler to preserve their rate limit balances across updates when parameters don't change, improving on jaegertracing/jaeger#1729
See jaegertracing#609 for similar work.
Fixes jaegertracing#807
Signed-off-by: Will Tran <will@autonomic.ai>
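The ideas in this commit message (a concurrent map filled lazily from `sample`, volatile fields changed by `update`, and retaining instances whose parameters did not change) might look roughly like the sketch below. `OperationState` and `PerOperationRegistry` are hypothetical names standing in for the real sampler classes; the actual change may differ in detail.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-operation state; stands in for a sampler with a rate-limit balance. */
class OperationState {
  final double samplingRate;

  OperationState(double samplingRate) {
    this.samplingRate = samplingRate;
  }
}

/** Sketch of a registry that is safe for concurrent lookups and periodic updates. */
class PerOperationRegistry {
  private final ConcurrentHashMap<String, OperationState> byOperation = new ConcurrentHashMap<>();
  // volatile so a default changed by update() is visible to sampling threads.
  private volatile double defaultRate;

  PerOperationRegistry(double defaultRate) {
    this.defaultRate = defaultRate;
  }

  /** Safe under concurrent calls: computeIfAbsent creates at most one entry per key. */
  OperationState stateFor(String operation) {
    return byOperation.computeIfAbsent(operation, op -> new OperationState(defaultRate));
  }

  /** Replaces an entry only when its rate changed, so unchanged entries keep their state. */
  void update(double newDefaultRate, Map<String, Double> newRates) {
    defaultRate = newDefaultRate;
    for (Map.Entry<String, Double> entry : newRates.entrySet()) {
      final double rate = entry.getValue();
      byOperation.compute(entry.getKey(), (op, existing) ->
          existing != null && existing.samplingRate == rate
              ? existing                   // same parameters: retain the instance
              : new OperationState(rate)); // new or changed: create a fresh one
    }
  }
}
```

Retaining the existing instance matters when it carries accumulated state such as a rate-limit balance, which would otherwise be reset on every update.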
This removes synchronization from RemoteControlledSampler and makes RateLimiter thread-safe.
Which problem is this PR solving?
As described in #608:
- Synchronization in RemoteControlledSampler was causing severe performance degradation in multi-threaded applications.
- RateLimiter.checkBalance() implementation was not thread-safe and didn't correctly initialize lastTick, causing non-deterministic behavior for the first invocation of .checkBalance().
Short description of the changes
- Synchronization removed from RemoteControlledSampler.
- Internal state of RateLimiter is now stored in AtomicLongs, with .checkBalance() modified to use an approach similar to AtomicLong.getAndAccumulate() - to prevent internal state corruption in concurrent flows.
Fixes #608
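A lock-free, time-based rate limiter along the lines described above could be sketched as follows. This is not the implementation from this PR: `SketchRateLimiter`, `checkCredit`, the injectable `LongSupplier` clock, and the choice to fold all state into a single AtomicLong (the PR description mentions several) are assumptions made to keep the example short. The balance is expressed in nanoseconds of accumulated time, and the initial state is set explicitly in the constructor so the first call behaves deterministically.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a thread-safe rate limiter backed by one AtomicLong. */
class SketchRateLimiter {
  private final long nanosPerCredit;   // time needed to earn one credit
  private final long maxBalanceNanos;  // cap on accumulated credit, in nanoseconds
  private final LongSupplier clock;    // e.g. System::nanoTime
  // "debit" = clock reading minus current balance; the balance grows as the clock advances.
  private final AtomicLong debit;

  SketchRateLimiter(double creditsPerSecond, double maxBalance, LongSupplier clock) {
    this.nanosPerCredit = (long) (1_000_000_000L / creditsPerSecond);
    this.maxBalanceNanos = (long) (maxBalance * nanosPerCredit);
    this.clock = clock;
    // Start with a full balance so the first call has a well-defined outcome.
    this.debit = new AtomicLong(clock.getAsLong() - maxBalanceNanos);
  }

  /** Returns true and deducts the cost if enough credit has accumulated. */
  boolean checkCredit(double itemCost) {
    long costNanos = (long) (itemCost * nanosPerCredit);
    while (true) {
      long currentDebit = debit.get();
      long now = clock.getAsLong();
      long balance = Math.min(now - currentDebit, maxBalanceNanos) - costNanos;
      if (balance < 0) {
        return false; // not enough credit; state is left unchanged
      }
      if (debit.compareAndSet(currentDebit, now - balance)) {
        return true;  // no concurrent update happened since the read
      }
      // Lost the race to another thread; retry with fresh values.
    }
  }
}
```

For example, `new SketchRateLimiter(2.0, 2.0, System::nanoTime)` would allow a burst of two items and then roughly two items per second. Working in integer nanoseconds avoids the floating point updates that make a compare-and-set approach awkward.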