Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: safe batch write #838

Merged
merged 7 commits into from
Oct 20, 2023
Merged

fix: safe batch write #838

merged 7 commits into from
Oct 20, 2023

Conversation

cool-develope
Copy link
Collaborator

@cool-develope cool-develope commented Sep 20, 2023

Context

We faced with value missing error sometimes. I think it is related to recent works of nodekey refactoring and batch flusher.

  • Before batch flusher, we have one atomic batch write but now there can be several batch writes, so some batch data can be away when the app is in panic.
  • Originally, we stored the root in the last order but now it is stored at first after nodekey refactoring

As a result, there may be a situation that has only a root node, not entire nodes storing.

Action

Refactored the storing nodes order to save the root at last

Summary by CodeRabbit

  • Refactor: Updated the handling of 'fast storage' in our system to improve code clarity and efficiency. This includes changes to how we save new nodes and manage versions in our 'MutableTree' and 'nodeDB' structures.
  • Test: Adjusted several test cases to align with the updated code structure and behavior. This includes changes to 'TestLegacyReferenceNode', 'TestNodeCacheStatisic', and several tests in 'nodedb_test.go'.

These changes are primarily under-the-hood improvements and should not affect the user experience directly. However, they lay the groundwork for more efficient data handling and future feature development.

@cool-develope cool-develope requested a review from a team as a code owner September 20, 2023 14:12
@cool-develope cool-develope changed the title fix the safe batch write fix: safe batch write Sep 20, 2023
@tac0turtle
Copy link
Member

were you able to test the issue berachain ran into here?

@kocubinski
Copy link
Member

It should be possible to prove the theory in this PR as to the cause of value missing error. Can you create a failing test case which is then fixed by this patch?

@@ -1055,6 +1051,8 @@ func (tree *MutableTree) saveNewNodes(version int64) error {
}

node._hash(version)
newNodes = append(newNodes, node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this append moved below the recursive node key create call now? I don't see how this should effect anything since we don't do anything with the newNodes until the entire call stack returns anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is for the reverse ordering of newNode to store the root at last

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought is that a pre-order traversal produced list of newNodes might fail part of the way through writing, leaving the root in the db but not the entire changeset?

Isn't the call the SaveRoot what identifies a tree as saved? Otherwise how will an application load a tree at version N after a crash which only saved part of the batch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we assume the node with nonce 1 as the root, SaveRoot is only used in case of nil or no updates

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok I see that now, makes sense since nonce is assigned by pre-order traversal.

mutable_tree.go Outdated
@@ -733,6 +733,13 @@ func (tree *MutableTree) SaveVersion() ([]byte, int64, error) {
}

tree.logger.Debug("SAVE TREE", "version", version)
tree.ndb.resetLatestVersion(version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move nodeDb version before saving the nodes might this expose a different kind of race condition?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saveFastNodes requires the updated version, let me refactor it

@cool-develope
Copy link
Collaborator Author

cool-develope commented Sep 25, 2023

It should be possible to prove the theory in this PR as to the cause of value missing error. Can you create a failing test case which is then fixed by this patch?

The main reason of this issue is due to batch writing breaks, I have no idea of how to break the batch writing in test.
To simulate the above issue, we should break the batch writing while commit, and re-load the tree. Do you have any idea?

@cool-develope
Copy link
Collaborator Author

@elias-orijtech @odeke-em
This is a PR that was mentioned before. I struggle to add test cases since the issue concerns low-level API (batch writing).
The failure case is triggered by loading the broken tree. To break the tree state, the app should panic while writing the batch.

@elias-orijtech
Copy link
Contributor

@cool-develope I don't have any comments apart from what @kocubinski said about needing a test case to make sure this PR fixes what it claims.

@cool-develope
Copy link
Collaborator Author

@cool-develope I don't have any comments apart from what @kocubinski said about needing a test case to make sure this PR fixes what it claims.

That's what I struggle now, could you help me add a test case?

@odeke-em
Copy link
Contributor

Thanks @cool-develope and @tac0turtle for the ping, I am working on helping out with the test for this.

@kocubinski
Copy link
Member

Admittedly, it's a bit hard to provide a thorough test case give we'd need to work with the internals of go leveldb batching. I think I'm OK merging this as-is; moving the creation of the batch from pre-order to post-order should move the entire SaveVersion batch to behave like a transaction.

@tac0turtle tac0turtle enabled auto-merge (squash) October 19, 2023 15:38
@coderabbitai
Copy link

coderabbitai bot commented Oct 19, 2023

Walkthrough

The changes primarily focus on improving the handling of the fast storage version in the MutableTree and nodeDB structures. The latest version is now passed as an argument to relevant methods, enhancing code clarity and decoupling. Test cases have been updated to reflect these changes.

Changes

File(s) Summary
migrate_test.go Updated legacyVersion variable in TestLegacyReferenceNode function.
mutable_tree.go Enhanced handling of fast storage version in MutableTree struct and its methods.
nodedb.go, nodedb_test.go Refactored setFastStorageVersionToBatch function in nodeDB struct to take latestVersion as an argument. Updated test cases accordingly.
tree_test.go Updated expected values in TestNodeCacheStatisic test case.

🐇💻

In the land of code, where logic is king,

Changes were made, improvements they bring.

Fast storage version, handled with care,

Tests updated too, fair and square.

Celebrate the changes, for they are fine,

In the world of code, where we rabbits shine. 🎉🥕


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • Mention @coderabbitai in any review comment for bot assistance.
  • Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

  • Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.
  • Additionally, enter @coderabbitai pause to halt the bot's ongoing reviews as any issue or review comment. To
    resume reviews, enter @coderabbitai resume as any issue or review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between 10c9f1e and c6035f9.
Files selected for processing (5)
  • migrate_test.go (1 hunks)
  • mutable_tree.go (6 hunks)
  • nodedb.go (2 hunks)
  • nodedb_test.go (7 hunks)
  • tree_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • migrate_test.go
Additional comments (Suppressed): 15
nodedb.go (2)
  • 247-253: The function setFastStorageVersionToBatch now takes an additional parameter latestVersion. This change improves the function's clarity and decouples it from the internal retrieval of the latest version. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 262-267: The retrieval of the latestVersion has been moved outside of the setFastStorageVersionToBatch function. This change improves the separation of concerns, as the function no longer needs to handle errors related to retrieving the latest version. However, it's important to ensure that the latestVersion is correctly retrieved and validated before being passed to this function.

tree_test.go (1)
  • 1783-1784: The changes in these lines indicate that the expected cache hit count has been increased from 0 to 1, and the expected cache miss count has been decreased from 1 to 0. This suggests that the cache behavior has been modified in the code under test. Please ensure that these changes are intentional and correctly reflect the new behavior of the cache. If the cache behavior has not been modified, these changes may indicate a problem in the test or the code under test.
mutable_tree.go (6)
  • 618-625: The setFastStorageVersionToBatch function now takes an additional parameter latestVersion. This change improves code clarity and decouples the function from the internal retrieval of the latest version. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 742-747: The new code introduces a condition to save new fast nodes only if skipFastStorageUpgrade is false. This is a good practice as it prevents unnecessary operations when fast storage upgrade is skipped.

  • 775-779: The order of operations has been changed. The resetLatestVersion function is now called after the Commit function. This change ensures that the latest version is reset only after the changes have been successfully committed, which improves the consistency of the data.

  • 793-800: The saveFastNodeVersion function now takes an additional parameter latestVersion. This change improves code clarity and decouples the function from the internal retrieval of the latest version. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 1042-1046: The line newNodes = append(newNodes, node) has been removed from this location. This change seems to be part of a refactoring to append the new node to the newNodes slice after hashing it, as seen in the next hunk.

  • 1059-1060: The line newNodes = append(newNodes, node) has been added here. This change is part of a refactoring to append the new node to the newNodes slice after hashing it. This ensures that the node is fully processed before being added to the slice, which improves the consistency of the data.

nodedb_test.go (6)
  • 94-99: The function setFastStorageVersionToBatch now takes an additional parameter latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the latestVersion is correctly retrieved and passed to the function.

  • 119-122: The function setFastStorageVersionToBatch now takes an additional parameter expectedFastCacheVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the expectedFastCacheVersion is correctly retrieved and passed to the function.

  • 140-143: The function setFastStorageVersionToBatch now takes an additional parameter 0. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the 0 is correctly retrieved and passed to the function.

  • 152-155: The function setFastStorageVersionToBatch now takes an additional parameter ndb.latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the ndb.latestVersion is correctly retrieved and passed to the function.

  • 166-169: The function setFastStorageVersionToBatch now takes an additional parameter ndb.latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the ndb.latestVersion is correctly retrieved and passed to the function.

  • 180-188: The function setFastStorageVersionToBatch now takes an additional parameter ndb.latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the ndb.latestVersion is correctly retrieved and passed to the function.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between c6035f9 and 57aa5ee.
Files selected for processing (1)
  • nodedb_test.go (8 hunks)
Additional comments (Suppressed): 7
nodedb_test.go (7)
  • 94-99: The setFastStorageVersionToBatch function now takes an additional parameter latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the latestVersion is correctly retrieved and passed to the function.

  • 119-122: The setFastStorageVersionToBatch function now takes an additional parameter expectedFastCacheVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the expectedFastCacheVersion is correctly retrieved and passed to the function.

  • 140-143: The setFastStorageVersionToBatch function now takes an additional parameter 0. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the 0 is correctly retrieved and passed to the function.

  • 152-155: The setFastStorageVersionToBatch function now takes an additional parameter ndb.latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the ndb.latestVersion is correctly retrieved and passed to the function.

  • 166-169: The setFastStorageVersionToBatch function now takes an additional parameter ndb.latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the ndb.latestVersion is correctly retrieved and passed to the function.

  • 180-188: The setFastStorageVersionToBatch function now takes an additional parameter ndb.latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the ndb.latestVersion is correctly retrieved and passed to the function.

  • 405-408: The setFastStorageVersionToBatch function now takes an additional parameter ndb.latestVersion. Ensure that all calls to this function throughout the codebase have been updated to match the new signature. Also, verify that the ndb.latestVersion is correctly retrieved and passed to the function.

@tac0turtle tac0turtle merged commit fa35c63 into master Oct 20, 2023
8 checks passed
@tac0turtle tac0turtle deleted the fix/safe_batch branch October 20, 2023 11:14
@tac0turtle
Copy link
Member

@Mergifyio backport release/v1.x.x

@mergify
Copy link
Contributor

mergify bot commented Oct 20, 2023

backport release/v1.x.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Oct 20, 2023
(cherry picked from commit fa35c63)
tac0turtle pushed a commit that referenced this pull request Oct 20, 2023
Co-authored-by: cool-developer <51834436+cool-develope@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants