CAS and volatile approach to fix delta concurrency bug #5976
...c/main/java/io/opentelemetry/sdk/metrics/internal/state/DefaultSynchronousMetricStorage.java
Codecov Report: All modified and coverable lines are covered by tests ✅ ... and 2 files with indirect coverage changes.
very nice 👍
I think better comments and naming could make it easier to follow but that could be done in a follow-up.
@jack-berg I wrote that in my original comment design :)
But it doesn't matter, since it's flawed as you wrote. Very nice catch! I hadn't considered the option that the thread reading the … The currently proposed approach of an odd/even AtomicInteger has 2 issues:
I have a suggestion:
This solves the scenario in which you obtain a read lock on a "finished-write" holder (the write lock has been unlocked). WDYT?
I think there are improvements that could be made to the approach for this, but we're out of time. The release intended for Friday is 3 days late, and going through another round of development / performance testing / review would likely push it at least another day (based on the schedule I have). Our options are essentially: merge this PR or revert the original fix #5932 (there's no point including code which is both blocking and not correct all the time). I'm going to push some minor updates to improve the naming / comments, merge this PR, and cut the release. I'm happy to continue iterating on this for the December release.
ugh, I mixed up
Record threads loop at most 2 times, but the collect thread could loop more times, since its loop waits for record threads to finish and decrement before continuing.
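To make those loop counts concrete, here is a minimal sketch of one way an odd/even `AtomicInteger` can work. It is only an illustration of the general technique, not the exact proposal discussed in this thread, and it assumes (not confirmed here) that each record operation adds 2 while the collect thread adds 1 after swapping in a new holder, so an odd count tells record threads to back off and re-read the volatile holder. `Holder`, `Storage`, `record`, `collectDelta`, and `runWorkload` are hypothetical simplifications of `AggregatorHolder` and `DefaultSynchronousMetricStorage`, a `LongAdder` stands in for the real aggregators, and a single collect thread is assumed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

public class OddEvenStorageSketch {

  // Hypothetical stand-in for AggregatorHolder.
  static final class Holder {
    final LongAdder sum = new LongAdder(); // stand-in for the real aggregators
    // Each in-flight record adds 2; a started collect adds 1 (making it odd).
    final AtomicInteger state = new AtomicInteger(0);
  }

  public static final class Storage {
    private volatile Holder holder = new Holder();

    public void record(long value) {
      Holder h;
      while (true) {
        h = holder; // volatile read
        if (h.state.addAndGet(2) % 2 == 0) {
          break; // even: no collect has started on this holder yet
        }
        // Odd: a collect owns this holder. Undo our +2 and re-read the field,
        // which now points at the holder the collect thread installed.
        h.state.addAndGet(-2);
      }
      try {
        h.sum.add(value);
      } finally {
        h.state.addAndGet(-2); // signal that this record is done
      }
    }

    // Assumes a single collect thread.
    public long collectDelta() {
      Holder old = holder;
      holder = new Holder(); // new records go to the fresh holder
      int state = old.state.addAndGet(1); // odd: tell late records to back off
      while (state > 1) { // > 1 means records are still in flight (or backing off)
        Thread.onSpinWait();
        state = old.state.get();
      }
      return old.sum.sum();
    }
  }

  // Records from several threads while a collector drains deltas concurrently.
  public static long runWorkload(int threads, int perThread) {
    Storage storage = new Storage();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          storage.record(1);
        }
      });
      workers[i].start();
    }
    long total = 0;
    boolean anyAlive = true;
    while (anyAlive) {
      total += storage.collectDelta();
      anyAlive = false;
      for (Thread w : workers) {
        anyAlive |= w.isAlive();
      }
    }
    // Pick up anything recorded after the last collect inside the loop.
    return total + storage.collectDelta();
  }

  public static void main(String[] args) {
    System.out.println(runWorkload(4, 100_000)); // 400000 if no writes were lost
  }
}
```

The record loop usually exits on its first or second pass: after backing off, the re-read returns the holder the collect thread just installed, whose count is even unless yet another collect has already started on it. The collect loop has no such bound, because it spins until every in-flight record on the old holder has decremented.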
oh I mixed up
After merging #5932, @trask pointed out that the approach isn't actually non-blocking, because it doesn't use `readLock().tryLock()`, and the sequence below can cause the call to `readLock().lock()` to block:

I wrote a test and verified that this is indeed correct, so back to the drawing board. I adjusted the code to repeatedly try to read the volatile `AggregatorHolder`, with a `tryLock` as the loop condition. Something like:

But after running the unit test I wrote to verify correctness 100 times, this occasionally fails. The problem with this code is that it's subject to (essentially) the same sequence outlined above: a record thread's call to `readLock().tryLock()` can succeed after a collect thread has already unlocked its `writeLock()`, causing lost writes.

After thinking about this more, I came up with a solution where I use an `AtomicInteger` to track the number of outstanding record operations, incrementing and decrementing as they start and complete. The collect thread does a CAS loop that waits for 0 outstanding record operations and sets the `AtomicInteger` to -1, which acts as a signal for record threads to re-read the volatile `AggregatorHolder`.

The performance of this is better than the `ReadWriteLock` solution, and it appears to be correct in all cases (I re-ran the unit test 100 times without failure).
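A minimal sketch of the counter-plus-CAS approach described above, assuming a single collect thread. The description only specifies the collect side; the record-side CAS (registering only while the count is non-negative) is an assumption about how record threads observe the -1 signal. `Holder`, `Storage`, `acquireForRecord`, `collectDelta`, and `runWorkload` are hypothetical names, and a `LongAdder` stands in for the real aggregators, so this is not the code merged in this PR.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

public class CasDeltaStorageSketch {

  // Hypothetical stand-in for AggregatorHolder.
  static final class Holder {
    final LongAdder sum = new LongAdder(); // stand-in for the real aggregators
    // Number of outstanding record operations; -1 means a collect has closed it.
    final AtomicInteger activeRecords = new AtomicInteger(0);
  }

  public static final class Storage {
    private volatile Holder holder = new Holder();

    public void record(long value) {
      Holder h = acquireForRecord();
      try {
        h.sum.add(value);
      } finally {
        h.activeRecords.decrementAndGet(); // this record is complete
      }
    }

    private Holder acquireForRecord() {
      while (true) {
        Holder h = holder; // volatile read
        int active = h.activeRecords.get();
        if (active < 0) {
          continue; // closed by collect: re-read the volatile holder field
        }
        if (h.activeRecords.compareAndSet(active, active + 1)) {
          return h; // registered as an outstanding record on h
        }
        // Lost a race with another record or with collect; try again.
      }
    }

    // Assumes a single collect thread.
    public long collectDelta() {
      Holder old = holder;
      holder = new Holder(); // new records go to the fresh holder
      // CAS loop: wait for 0 outstanding records, then close the holder with -1.
      while (!old.activeRecords.compareAndSet(0, -1)) {
        Thread.onSpinWait();
      }
      return old.sum.sum();
    }
  }

  // Records from several threads while a collector drains deltas concurrently.
  public static long runWorkload(int threads, int perThread) {
    Storage storage = new Storage();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          storage.record(1);
        }
      });
      workers[i].start();
    }
    long total = 0;
    boolean anyAlive = true;
    while (anyAlive) {
      total += storage.collectDelta();
      anyAlive = false;
      for (Thread w : workers) {
        anyAlive |= w.isAlive();
      }
    }
    // Pick up anything recorded after the last collect inside the loop.
    return total + storage.collectDelta();
  }

  public static void main(String[] args) {
    System.out.println(runWorkload(4, 100_000)); // 400000 if no writes were lost
  }
}
```

Because the collect thread writes the new holder to the volatile field before its CAS to -1, a record thread that reads -1 is guaranteed to see the new holder on its next volatile read. Unlike `readLock().tryLock()` on a holder whose write lock has already been released, a record thread can never register on a holder after the collect thread has closed it, so no write lands in a delta that has already been read.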