Purges all bank snapshots after fastboot #35350
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. |
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #35350 +/- ##
=========================================
- Coverage 81.7% 81.7% -0.1%
=========================================
Files 834 834
Lines 224235 224236 +1
=========================================
- Hits 183390 183367 -23
- Misses 40845 40869 +24 |
@apfitzge Requesting your review since the PR is fastboot-related, and you've reviewed all the fastboot code. Also note that I intend to backport this PR, so please consider whether anything should be done differently now to make the backport easier.
@brooksprumo can you please link the pr that adds |
I wasn't originally intending to backport #35291. Instead I was planning on using the previous way to purge all bank snapshots: snapshot_utils::purge_old_bank_snapshots(&snapshot_config.bank_snapshots_dir, 0, None); Would backporting #35291 be preferred? My thought process was to backport the least amount of code possible.
so the backport pr from this pr into 1.18 would be manually adjusted so it compiles in 1.18? |
Yes, that's what I was thinking. |
ugh. another approach is to submit what will compile in 1.18 in THIS pr and then backport that, then in a follow on master pr, change this code to use Another alternative is don't backport anything, at least not to 1.17. any other opinions? Am I being too pedantic? |
Yeah, I considered this too. My thought was that the actual code change is a single line, so it will be very simple to inspect the backports for correctness; that they indeed are purging all bank snapshots. I can adopt this approach if it's preferred though.
Yes, that's true. However it feels dangerous to leave a known issue like this. One that requires manual intervention from a node operator in order for the node to startup. |
Done in 3e42432. I've changed this PR to use
Per offline discussions, we will not backport to v1.17; only v1.18. |
lgtm. ty.
lgtm - easy backport.
(cherry picked from commit bdc5cce)
Oops, thought I gave this one a ship-it; glad you didn't wait but no concerns from my end either + LGTM |
Thanks! For the merge to master I felt comfortable with two approvals. For the backport I'll wait for everyone's 😸
use fd_bs58 fix b58
Problem
Given the following scenario:
shrink has run (e.g. shrink proceeds up to slot 275)
Then the result is:
shrink, the account storage files on disk will (may) have changed, and reflect the account state as of slot 275
If the node has a script to auto-restart, it will enter a boot-crash loop indefinitely.
Summary of Changes
To break out of the boot-crash loop, purge all bank snapshots after loading from one. In the above scenario, this would cause the node to load from a snapshot archive instead, which is safe. If the node crashes after creating another bank snapshot, then fastboot will work properly again in that situation.
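The idea above can be sketched with the standard library alone. This is only an illustration, not the code in this PR: the helper names `purge_all_bank_snapshots` and `finish_fastboot` are hypothetical, and the sketch assumes each bank snapshot lives in its own subdirectory of the bank snapshots directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the real snapshot_utils API): removes every
/// subdirectory of `bank_snapshots_dir`, assuming one directory per bank
/// snapshot, and returns how many were removed.
fn purge_all_bank_snapshots(bank_snapshots_dir: &Path) -> io::Result<usize> {
    // Collect the paths first so we do not delete while iterating.
    let mut snapshot_dirs: Vec<PathBuf> = Vec::new();
    for entry in fs::read_dir(bank_snapshots_dir)? {
        let path = entry?.path();
        if path.is_dir() {
            snapshot_dirs.push(path);
        }
    }
    for dir in &snapshot_dirs {
        fs::remove_dir_all(dir)?;
    }
    Ok(snapshot_dirs.len())
}

/// Hypothetical startup step: purge only if this boot actually loaded from a
/// bank snapshot (fastboot). A later crash then has no stale bank snapshot to
/// pick up, so the next boot falls back to a snapshot archive.
fn finish_fastboot(
    bank_snapshots_dir: &Path,
    loaded_from_bank_snapshot: bool,
) -> io::Result<usize> {
    if loaded_from_bank_snapshot {
        purge_all_bank_snapshots(bank_snapshots_dir)
    } else {
        Ok(0)
    }
}
```

In this sketch the purge happens only after a successful fastboot, so a normal boot from an archive leaves the directory untouched. Bank snapshots created later in the same run are unaffected because they are written after the purge.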
Fixes #35190
More Info
This problem has been hit on v1.17, so I want a solution that can be backported to v1.17. The implemented solution is the least invasive, and safest, one I am aware of. A different solution to allow successfully reusing the bank snapshot from slot 200 will land in master and not be backported.