fix cluster: migration traverse bug #4279
Merged
The bug:
The dash table traverse function assumes that the callback does not preempt: it applies the callback to each slot of the logical bucket.
In our flow the registered callback, WriteBucket, can preempt. After it preempts, the assertion that is checked for each slot while iterating over the bucket is triggered:
F20241208 21:35:59.052592 21717 init.cc:25] [../src/core/dash_internal.h:1587]: assert(BucketIndex(hfun(bucket->key[slot])) == bid) failed!
The bug was discovered in the following regression run:
https://github.com/dragonflydb/dragonfly/actions/runs/12225241118/job/34098992760
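The failure mode can be illustrated with a toy model. This is only a sketch and not Dragonfly's DashTable: `Bucket`, `HomeBucket`, `TraversePerSlot` and the modulo "hash" are hypothetical names made up for the example, and the callback's direct write to the table stands in for whatever another fiber might change while the real callback is suspended.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Toy model (hypothetical, not Dragonfly code): every occupied slot of
// bucket `bid` must hold a key whose home bucket is `bid`.
constexpr std::size_t kSlots = 4;
constexpr std::size_t kBuckets = 2;

struct Bucket {
  std::optional<int> key[kSlots];
};

using Table = std::vector<Bucket>;

// Stand-in for BucketIndex(hfun(key)); keys are assumed non-negative.
std::size_t HomeBucket(int key) {
  return static_cast<std::size_t>(key) % kBuckets;
}

// Per-slot traversal: re-checks the invariant before every callback call.
// Returns false where the real code would hit the failing assert.
bool TraversePerSlot(Table& t, std::size_t bid,
                     const std::function<void(int)>& cb) {
  assert(bid < t.size());
  for (std::size_t slot = 0; slot < kSlots; ++slot) {
    if (!t[bid].key[slot]) continue;
    if (HomeBucket(*t[bid].key[slot]) != bid) return false;
    cb(*t[bid].key[slot]);
  }
  return true;
}

// The callback changes a later slot of the same bucket, simulating a table
// mutation that happens while the callback is preempted.
bool RunWithPreemptingCallback() {
  Table t(kBuckets);
  t[0].key[0] = 2;
  t[0].key[1] = 4;
  return TraversePerSlot(t, 0, [&t](int) { t[0].key[1] = 3; });
}

// A callback that never yields leaves the bucket untouched.
bool RunWithNonPreemptingCallback() {
  Table t(kBuckets);
  t[0].key[0] = 2;
  t[0].key[1] = 4;
  return TraversePerSlot(t, 0, [](int) {});
}
```

In this model the traversal is only safe as long as nothing changes the bucket between two callback invocations, which is exactly the assumption a preempting callback breaks.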
The fix:
We recently introduced a new dash table function, TraveseBucket, that iterates over physical buckets and applies the callback to each bucket only once. It is now used in snapshotting to support big value serialization, which can preempt. It also improves the flow, since the callback is applied once per bucket rather than multiple times, once for each bucket slot.
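A minimal sketch of the once-per-bucket idea, under the same caveat: `PhysicalBucket`, `VisitBucketsOnce` and `SerializeAllBuckets` are hypothetical names for illustration and do not reflect the actual dash table signatures. The point is that the traversal never reads a bucket's slots itself, so a callback that yields midway cannot invalidate any per-slot state held by the traversal.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Toy model (hypothetical, not Dragonfly code).
constexpr std::size_t kSlotsPerBucket = 4;

struct PhysicalBucket {
  std::optional<int> key[kSlotsPerBucket];
};

// Invokes cb exactly once per physical bucket and returns the number of
// invocations. The traversal does not inspect slots before or after cb.
std::size_t VisitBucketsOnce(std::vector<PhysicalBucket>& table,
                             const std::function<void(PhysicalBucket&)>& cb) {
  std::size_t calls = 0;
  for (std::size_t i = 0; i < table.size(); ++i) {
    cb(table[i]);
    ++calls;
  }
  return calls;
}

struct Result {
  std::size_t calls;
  std::vector<int> serialized;
};

// The callback serializes the whole bucket in one go. Clearing the bucket
// afterwards stands in for changes made while the callback is preempted.
Result SerializeAllBuckets() {
  std::vector<PhysicalBucket> table(2);
  table[0].key[0] = 10;
  table[0].key[2] = 11;
  table[1].key[1] = 12;

  Result r{0, {}};
  r.calls = VisitBucketsOnce(table, [&r](PhysicalBucket& b) {
    for (auto& k : b.key) {
      if (k) r.serialized.push_back(*k);
    }
    for (auto& k : b.key) k.reset();
  });
  return r;
}
```

Here three keys spread over two buckets cost two callback invocations instead of three, and the mutation inside the callback has no effect on the traversal.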