fix: safe batch write #838
Branch updated from 2df5955 to e472c68.
Were you able to test the issue Berachain ran into here?
It should be possible to prove the theory in this PR as to the cause of
@@ -1055,6 +1051,8 @@ func (tree *MutableTree) saveNewNodes(version int64) error {
}

node._hash(version)
newNodes = append(newNodes, node)
Why is this append moved below the recursive node key creation call now? I don't see how this should affect anything, since we don't do anything with newNodes until the entire call stack returns anyway.
It is for the reverse ordering of newNodes, so that the root is stored last.
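To illustrate the reply, here is a minimal Go sketch assuming a simplified node type; node and collectNewNodes are hypothetical names, not IAVL's real saveNewNodes. Nonces are handed out in pre-order, so the root gets nonce 1, but each node is appended only after both of its subtrees have been visited (post-order), which puts the root at the end of newNodes.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for an IAVL node: only the
// fields needed to show traversal order are modelled.
type node struct {
	name        string
	nonce       uint32
	left, right *node
}

// collectNewNodes assigns nonces in pre-order (the root gets 1) but appends
// each node only after its subtrees are done (post-order), so the root is
// the last element of the returned slice.
func collectNewNodes(root *node) []*node {
	var newNodes []*node
	var nonce uint32
	var walk func(n *node)
	walk = func(n *node) {
		if n == nil {
			return
		}
		nonce++
		n.nonce = nonce // pre-order: parent is numbered before its children
		walk(n.left)
		walk(n.right)
		newNodes = append(newNodes, n) // post-order: parent is appended after its children
	}
	walk(root)
	return newNodes
}

func main() {
	leafA := &node{name: "a"}
	leafB := &node{name: "b"}
	root := &node{name: "root", left: leafA, right: leafB}
	for _, n := range collectNewNodes(root) {
		fmt.Println(n.name, n.nonce)
	}
}
```

Running it prints `a 2`, `b 3`, then `root 1`: the root keeps nonce 1 but is the last node handed to the batch.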
The thought is that a list of newNodes produced by a pre-order traversal might fail partway through writing, leaving the root in the DB but not the entire changeset?
Isn't the call to SaveRoot what identifies a tree as saved? Otherwise, how will an application load a tree at version N after a crash that saved only part of the batch?
We assume the node with nonce 1 is the root; SaveRoot is only used when the root is nil or there are no updates.
Ah, OK, I see that now. It makes sense, since the nonce is assigned by pre-order traversal.
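The "nonce 1 is the root" assumption can be sketched as a lookup: a version counts as saved only if its root key exists. The 12-byte big-endian (version, nonce) encoding and the nodeKey/hasVersion helpers below are illustrative assumptions, not IAVL's exact key layout. The point is that a loader only consults the root key, so the root has to be the last node that reaches the DB.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nodeKey is a hypothetical encoding of a (version, nonce) pair:
// 8 bytes of version followed by 4 bytes of nonce, both big-endian.
func nodeKey(version int64, nonce uint32) string {
	b := make([]byte, 12)
	binary.BigEndian.PutUint64(b[:8], uint64(version))
	binary.BigEndian.PutUint32(b[8:], nonce)
	return string(b)
}

// hasVersion treats a version as saved iff its root (nonce 1) is present,
// mirroring the assumption described in the thread.
func hasVersion(db map[string][]byte, version int64) bool {
	_, ok := db[nodeKey(version, 1)]
	return ok
}

func main() {
	db := map[string][]byte{}
	db[nodeKey(5, 2)] = []byte("leaf")
	fmt.Println(hasVersion(db, 5)) // false: children alone do not mark v5 as saved
	db[nodeKey(5, 1)] = []byte("root")
	fmt.Println(hasVersion(db, 5)) // true
}
```

If the root were written first and the rest of the batch lost, hasVersion would report the version as present even though its children are missing.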
mutable_tree.go (outdated)
@@ -733,6 +733,13 @@ func (tree *MutableTree) SaveVersion() ([]byte, int64, error) {
}

tree.logger.Debug("SAVE TREE", "version", version)
tree.ndb.resetLatestVersion(version)
If we update the nodeDB version before saving the nodes, might this expose a different kind of race condition?
saveFastNodes requires the updated version; let me refactor it.
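A hedged sketch of the refactor being proposed: pass the new version explicitly to the fast-storage step instead of reading it back from the node DB, and bump the node DB's latest version only after the batch write succeeds (the ordering CodeRabbit later describes for resetLatestVersion and Commit). The store, batch, and saveVersion names are hypothetical in-memory stand-ins, not IAVL's API.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical, in-memory stand-in for nodeDB.
type store struct {
	committed     map[string]string
	latestVersion int64
}

// batch buffers writes and applies them on Write, or fails if failWrite is set.
type batch struct {
	s         *store
	pending   map[string]string
	failWrite bool
}

func (b *batch) Set(k, v string) { b.pending[k] = v }

func (b *batch) Write() error {
	if b.failWrite {
		return errors.New("simulated write failure")
	}
	for k, v := range b.pending {
		b.s.committed[k] = v
	}
	return nil
}

// saveVersion receives the new version as a parameter rather than reading
// s.latestVersion, and advances latestVersion only after a successful Write.
func saveVersion(s *store, b *batch, version int64) error {
	b.Set("fast-storage-version", fmt.Sprint(version))
	if err := b.Write(); err != nil {
		return err // latestVersion is left unchanged when the write fails
	}
	s.latestVersion = version
	return nil
}

func main() {
	s := &store{committed: map[string]string{}, latestVersion: 1}
	err := saveVersion(s, &batch{s: s, pending: map[string]string{}, failWrite: true}, 2)
	fmt.Println(err, s.latestVersion) // simulated write failure 1
	err = saveVersion(s, &batch{s: s, pending: map[string]string{}}, 2)
	fmt.Println(err, s.latestVersion) // <nil> 2
}
```

Passing the version in also removes the need for the fast-storage helper to handle errors from looking up the latest version.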
The main cause of this issue is that batch writing can break partway through; I have no idea how to break the batch write in a test.
@elias-orijtech @odeke-em
@cool-develope I don't have any comments apart from what @kocubinski said about needing a test case to make sure this PR fixes what it claims.
That's what I'm struggling with now; could you help me add a test case?
Thanks @cool-develope and @tac0turtle for the ping; I am working on helping out with the test for this.
Admittedly, it's a bit hard to provide a thorough test case, given that we'd need to work with the internals of goleveldb batching. I think I'm OK merging this as-is; moving the creation of the batch from pre-order to post-order should make the entire SaveVersion batch behave like a transaction.
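One way to test the claim without reaching into goleveldb's batching, sketched under the assumption that a crash can be modelled as only a prefix of the writes reaching disk: enumerate every crash point and check that whenever the root is present, the whole changeset is. applyPrefix and rootImpliesComplete are hypothetical test helpers, not goleveldb or IAVL APIs.

```go
package main

import "fmt"

// applyPrefix simulates a crash partway through a non-atomic sequence of
// writes: only the first k keys reach the store.
func applyPrefix(order []string, k int) map[string]bool {
	db := map[string]bool{}
	for _, key := range order[:k] {
		db[key] = true
	}
	return db
}

// rootImpliesComplete reports whether, for every crash point, the presence
// of the root implies that every node of the version was written.
func rootImpliesComplete(order []string, root string) bool {
	for k := 0; k <= len(order); k++ {
		db := applyPrefix(order, k)
		if db[root] && len(db) != len(order) {
			return false
		}
	}
	return true
}

func main() {
	preOrder := []string{"root", "left", "right"}
	postOrder := []string{"left", "right", "root"}
	fmt.Println(rootImpliesComplete(preOrder, "root"))  // false
	fmt.Println(rootImpliesComplete(postOrder, "root")) // true
}
```

With the pre-order list, a crash after the first write leaves only the root on disk, which matches the "value missing" situation in the PR description; with the post-order list, the root only appears once every other node is already written.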
Walkthrough: The changes primarily focus on improving the handling of the fast storage version.
Review Status
Actionable comments generated: 0
Files selected for processing (5)
- migrate_test.go (1 hunks)
- mutable_tree.go (6 hunks)
- nodedb.go (2 hunks)
- nodedb_test.go (7 hunks)
- tree_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
- migrate_test.go
Additional comments (Suppressed): 15
nodedb.go (2)
- 247-253: The function setFastStorageVersionToBatch now takes an additional parameter latestVersion. This change improves the function's clarity and decouples it from the internal retrieval of the latest version. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.
- 262-267: The retrieval of the latestVersion has been moved outside of the setFastStorageVersionToBatch function. This change improves the separation of concerns, as the function no longer needs to handle errors related to retrieving the latest version. However, it's important to ensure that the latestVersion is correctly retrieved and validated before being passed to this function.

tree_test.go (1)
- 1783-1784: The changes in these lines indicate that the expected cache hit count has been increased from 0 to 1, and the expected cache miss count has been decreased from 1 to 0. This suggests that the cache behavior has been modified in the code under test. Please ensure that these changes are intentional and correctly reflect the new behavior of the cache. If the cache behavior has not been modified, these changes may indicate a problem in the test or the code under test.
mutable_tree.go (6)
- 618-625: The setFastStorageVersionToBatch function now takes an additional parameter latestVersion. This change improves code clarity and decouples the function from the internal retrieval of the latest version. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.
- 742-747: The new code introduces a condition to save new fast nodes only if skipFastStorageUpgrade is false. This is a good practice, as it prevents unnecessary operations when the fast storage upgrade is skipped.
- 775-779: The order of operations has been changed. The resetLatestVersion function is now called after the Commit function. This ensures that the latest version is reset only after the changes have been successfully committed, which improves data consistency.
- 793-800: The saveFastNodeVersion function now takes an additional parameter latestVersion. This change improves code clarity and decouples the function from the internal retrieval of the latest version. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.
- 1042-1046: The line newNodes = append(newNodes, node) has been removed from this location. This change seems to be part of a refactoring to append the new node to the newNodes slice after hashing it, as seen in the next hunk.
- 1059-1060: The line newNodes = append(newNodes, node) has been added here. This change is part of a refactoring to append the new node to the newNodes slice after hashing it. This ensures that the node is fully processed before being added to the slice, which improves the consistency of the data.

nodedb_test.go (6)
- 94-99: The call to setFastStorageVersionToBatch now passes the additional latestVersion argument. Ensure that all calls to this function throughout the codebase have been updated to match the new signature, and verify that latestVersion is correctly retrieved before it is passed.
- 119-122: The call to setFastStorageVersionToBatch now passes expectedFastCacheVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that expectedFastCacheVersion is the correct value to pass.
- 140-143: The call to setFastStorageVersionToBatch now passes 0 as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that 0 is the intended value.
- 152-155: The call to setFastStorageVersionToBatch now passes ndb.latestVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that ndb.latestVersion is correctly retrieved before it is passed.
- 166-169: The call to setFastStorageVersionToBatch now passes ndb.latestVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that ndb.latestVersion is correctly retrieved before it is passed.
- 180-188: The call to setFastStorageVersionToBatch now passes ndb.latestVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that ndb.latestVersion is correctly retrieved before it is passed.
Review Status
Actionable comments generated: 0
Files selected for processing (1)
- nodedb_test.go (8 hunks)
Additional comments (Suppressed): 7
nodedb_test.go (7)
- 94-99: The call to setFastStorageVersionToBatch now passes the additional latestVersion argument. Ensure that all calls to this function throughout the codebase have been updated to match the new signature, and verify that latestVersion is correctly retrieved before it is passed.
- 119-122: The call to setFastStorageVersionToBatch now passes expectedFastCacheVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that expectedFastCacheVersion is the correct value to pass.
- 140-143: The call to setFastStorageVersionToBatch now passes 0 as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that 0 is the intended value.
- 152-155: The call to setFastStorageVersionToBatch now passes ndb.latestVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that ndb.latestVersion is correctly retrieved before it is passed.
- 166-169: The call to setFastStorageVersionToBatch now passes ndb.latestVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that ndb.latestVersion is correctly retrieved before it is passed.
- 180-188: The call to setFastStorageVersionToBatch now passes ndb.latestVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that ndb.latestVersion is correctly retrieved before it is passed.
- 405-408: The call to setFastStorageVersionToBatch now passes ndb.latestVersion as the new argument. Ensure that all calls to this function have been updated to match the new signature, and verify that ndb.latestVersion is correctly retrieved before it is passed.
@Mergifyio backport release/v1.x.x
✅ Backports have been created
(cherry picked from commit fa35c63)
Context
We sometimes hit the "value missing" error. I think it is related to the recent work on the nodekey refactoring and the batch flusher:
- batch flusher: we used to have a single atomic batch write, but now there can be several batch writes, so some of the batch data can be lost when the app panics.
- nodekey refactoring: node nonces are assigned by pre-order traversal, and the new nodes were collected in the same order, so the root was written first.
As a result, there may be a situation where only the root node is stored, not the entire set of new nodes.
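The batch-flusher concern can be illustrated with a hypothetical size-bounded batch; this is not IAVL's actual flusher, and flusher and maxPending are invented names. Once a threshold of pending writes is reached they are flushed as their own write, so one logical SaveVersion batch becomes several physical writes, and with pre-order ordering the root lands in the first one.

```go
package main

import "fmt"

// flusher is a hypothetical size-bounded batch: once pending writes reach
// maxPending, they are flushed as a separate physical write.
type flusher struct {
	maxPending int
	pending    []string
	flushes    [][]string // each element models one separate physical write
}

func (f *flusher) Set(key string) {
	f.pending = append(f.pending, key)
	if len(f.pending) >= f.maxPending {
		f.flush()
	}
}

func (f *flusher) flush() {
	if len(f.pending) == 0 {
		return
	}
	f.flushes = append(f.flushes, f.pending)
	f.pending = nil
}

// Write flushes whatever is left. In this model a crash between two entries
// of flushes would keep the earlier writes and lose the later ones.
func (f *flusher) Write() { f.flush() }

func main() {
	f := &flusher{maxPending: 2}
	for _, k := range []string{"root", "left", "right"} { // pre-order
		f.Set(k)
	}
	f.Write()
	fmt.Println(len(f.flushes), f.flushes) // 2 [[root left] [right]]
}
```

Here the first physical write already contains the root, so losing the second write leaves a root whose right child is missing.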
Action
Refactored the order in which nodes are stored so that the root is saved last.
Summary by CodeRabbit
These changes are primarily under-the-hood improvements and should not affect the user experience directly. However, they lay the groundwork for more efficient data handling and future feature development.