fix: Fix deadlock in the transaction code. #956
Merged
The deadlock happened during the BRPOP flow, where `shard_data.local_data` is accessed from both the coordinator thread and the shard threads. Originally, `shard_data.local_data` was not designed for concurrent access, so I used the ARMED bit to deduplicate callback runs for each shard. The problem is that with BRPOP, `ExecuteAsync` could do `|= ARMED` while, in parallel, `NotifySuspended` applied `|= AWAKED` in the shard thread. Both are non-atomic read-modify-write operations on the same word, so one could overwrite the other's update.
Therefore, I have now completely separated the shard-local `local_data` mask from the `is_armed` boolean. Moreover, since `is_armed` is now an atomic, I increased the size of `PerShardData` to 64 bytes to avoid false sharing of cache lines between `PerShardData` objects.
Fixes #945