
Construct bank from snapshot dir #30171

Merged

Conversation

@xiangzhu70 xiangzhu70 commented Feb 7, 2023

Problem

To boot faster, we want to boot from a bank snapshot directory instead of an archive. This requires reading metadata from the snapshot directories, indicating the snapshot versions and completion statuses, to identify a good directory to boot from.

Summary of Changes

- Add the function bank_from_snapshot_dir() and all the supporting functions and data structures to construct a bank from a bank snapshot directory.
- Add the test function.

This is a split PR of #28745. It is the next step after #30099, which completes the bank snapshot directory.

Testing done:
On mainnet, the snapshot untar took 92.4s.
When the snapshot directory is used, the archive untar is skipped, so deserialization is reduced to 5.5s.

Fixes #

@xiangzhu70
Contributor Author

This will be rebased after PR #30099 is committed, so the reduced diff is ready for review.

Contributor

@apfitzge apfitzge left a comment


Largely happy with this.
I think we had it in the original PR (pre-split) description, but could you add what the time difference is between using snapshot from archive vs from dir?

Comment on lines 752 to 828
assert!(
next_append_vec_id.checked_sub(1).is_some(),
"subtraction underflow"
);
let max_append_vec_id = next_append_vec_id - 1;
Contributor

Is there some new case that requires we check this? If so, why not something like this?

Suggested change
assert!(
next_append_vec_id.checked_sub(1).is_some(),
"subtraction underflow"
);
let max_append_vec_id = next_append_vec_id - 1;
let max_append_vec_id = next_append_vec_id.checked_sub(1).unwrap();

I think based on the unwrap panic it should be obvious, no real need (imo) to specify "subtraction underflow".

Contributor Author

agreed. updated it.
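As a standalone sketch of the agreed-upon pattern (hypothetical function name and values, not the PR code):

```rust
// Hypothetical helper illustrating the pattern: checked_sub plus
// expect panics on underflow with a clear message, replacing the
// separate assert followed by a raw subtraction.
fn max_append_vec_id(next_append_vec_id: u32) -> u32 {
    next_append_vec_id
        .checked_sub(1)
        .expect("next_append_vec_id should be at least 1")
}

fn main() {
    assert_eq!(max_append_vec_id(5), 4);
    assert_eq!(max_append_vec_id(1), 0);
    println!("ok");
}
```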

.unwrap();

for account_path in account_paths {
for file in fs::read_dir(account_path).unwrap() {
Contributor

by the time we're here, we have already verified that these account paths exist, right?

Contributor Author

Yes, they went through the step that appends "run/", and the run subdirectories were created.

Contributor

Please avoid unwrap outside of tests. An expect with a message is an option, or, this function could return a Result and the caller could put a single expect on its invocation.
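A minimal sketch of the two options under discussion (hypothetical helper names, std only, not the PR code):

```rust
use std::{fs, io, path::Path};

// Option A: expect with a message at each call site.
fn file_names_or_panic(dir: &Path) -> Vec<String> {
    fs::read_dir(dir)
        .expect("account path should exist and be readable")
        .map(|entry| {
            entry
                .expect("failed to read dir entry")
                .file_name()
                .to_string_lossy()
                .into_owned()
        })
        .collect()
}

// Option B: return a Result and let the caller put a single expect
// (or `?`) on the invocation.
fn file_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    Ok(names)
}

fn main() {
    let dir = std::env::temp_dir();
    assert!(file_names(&dir).is_ok());
    let _ = file_names_or_panic(&dir);
    println!("ok");
}
```

Option B is what the thread converged on: the function returns a Result and the caller decides whether to propagate or expect.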

Contributor Author

removed all unwraps.

let _ = &self
.next_append_vec_id
.fetch_max(next_appendvec_id as u32, Ordering::Relaxed);
}
let (slot, slot_complete) = self.insert_slot_storage_file(path, filename);
Contributor

If we are from Dir, we call get_slot_and_append_vec_id above, and then call it again internally within insert_slot_storage_file.

We should just call it once above, regardless of snapshot_from, and pass the slot to insert_slot_storage_file instead of filename

Contributor Author

Agreed.
I now call get_slot_and_append_vec_id once, removed the duplicate call from insert_slot_storage_file, and just pass slot into it.
That made the filename parameter unnecessary, leaving insert_slot_storage_file with the same parameter list as insert_storage_file and almost nothing to do other than call insert_storage_file.
So, to simplify further, I removed the insert_slot_storage_file function.

&self.next_append_vec_id,
&self.num_collisions,
)?;
let storage_entry = match &self.snapshot_from {
Contributor

Comment out of scope for this PR.

We're checking snapshot_from for every entry - I wonder what the perf impact of storing generic fns for this would be, though I suspect minimal since I'd hope branch prediction is doing a good job here.

Contributor Author

@xiangzhu70 xiangzhu70 Mar 7, 2023

Ok, I can move "match &self.snapshot_from" out of the entry loop. I tried the following, defining two different closures and then just making the closure call inside the entry loop, but I got a compilation error.

        let process_entry = match &self.snapshot_from {
            SnapshotFrom::Archive => 
                |path: &Path,
                 current_len: usize,
                 old_appendvec_id: usize|
                 -> Result<Arc<AccountStorageEntry>, std::io::Error> {
                    remap_and_reconstruct_single_storage(
                        slot,
                        old_appendvec_id,
                        current_len,
                        path,
                        &self.next_append_vec_id,
                        &self.num_collisions,
                    )
                },

            SnapshotFrom::Dir => 
                |path: &Path,
                 current_len: usize,
                 old_appendvec_id: usize|
                -> Result<Arc<AccountStorageEntry>, std::io::Error> {
                    reconstruct_single_storage(&slot, path, current_len, old_appendvec_id as u32)
                },
        };

The compilation error is

error[E0308]: `match` arms have incompatible types
   --> runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:322:17
    |
305 |            let process_entry = match &self.snapshot_from {
    |                                ------------------------- `match` arms have incompatible types
306 |                SnapshotFrom::Archive => 
307 |                    |path: &Path,
    |  __________________-
    | | _________________|
    | ||
308 | ||                  current_len: usize,
309 | ||                  old_appendvec_id: usize|
310 | ||                  -> Result<Arc<AccountStorageEntry>, std::io::Error> {
    | ||____________________________________________________________________- the expected closure
...   |
318 | |                      )
319 | |                  },
    | |__________________- this is found to be of type `[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:307:17: 310:69]`
...
322 |  /                 |path: &Path,
323 |  |                  current_len: usize,
324 |  |                  old_appendvec_id: usize|
325 |  |                 -> Result<Arc<AccountStorageEntry>, std::io::Error> {
326 |  |                     reconstruct_single_storage(&slot, path, current_len, old_appendvec_id as u32)
327 |  |                 },
    |  |_________________^ expected closure, found a different closure
    |
    = note: expected closure `[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:307:17: 310:69]`
               found closure `[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:322:17: 325:68]`
    = note: no two closures, even if identical, have the same type
    = help: consider boxing your closure and/or using it as a trait object

The two closures have exactly the same input and return types, but it seems the compiler does not accept that. Should I box it or use a trait?

Contributor Author

Tried it, but somehow boxing doesn't help:

    = note: expected struct `Box<[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:307:17: 310:69]>`
               found struct `Box<[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:323:17: 326:69]>`
    = note: no two closures, even if identical, have the same type
    = help: consider boxing your closure and/or using it as a trait object
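For the record, boxing does resolve E0308 once the binding (or the function return type) is annotated with a trait-object type, so both match arms coerce to the same `Box<dyn Fn ...>`; `Box::new(closure)` alone keeps each concrete closure type. A standalone sketch with toy closures (not the PR code, which also captures `&self` fields and returns `Result`, adding lifetime complications):

```rust
// Without a trait-object annotation, each Box::new(..) keeps the
// concrete (and distinct) closure type, and the match arms disagree.
// The `Box<dyn Fn(usize) -> usize>` return type gives both arms a
// common type to coerce to.
fn make_process(from_archive: bool) -> Box<dyn Fn(usize) -> usize> {
    match from_archive {
        true => Box::new(|id| id + 100), // stand-in for the archive path
        false => Box::new(|id| id),      // stand-in for the dir path
    }
}

fn main() {
    let process = make_process(true);
    assert_eq!(process(1), 101);
    let process = make_process(false);
    assert_eq!(process(1), 1);
    println!("ok");
}
```

The boxing adds a heap allocation and dynamic dispatch, which is part of why deferring this (per the reviewer's "out of scope" call below) is reasonable.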

Contributor

Yeah, don't worry for now; like I said, it's out of scope for this PR. We can come back to this once we have it working. It's just a potential speedup, and I'm not even sure it will help in a meaningful way.

let account_paths = account_paths.to_vec();
let account_paths_set: HashSet<PathBuf> = HashSet::from_iter(account_paths.clone());

for dir_symlink in fs::read_dir(accounts_hardlinks).unwrap() {
Contributor

are these unwraps safe, or should we map err into SnapshotError::Io?

Contributor

Minimally please do not use unwrap() outside of tests. This applies for the whole impl in this function.

Contributor Author

changed to expect

Comment on lines 2548 to 2617
let slot_deltas: Vec<BankSlotDelta> = bincode::options()
.with_limit(MAX_SNAPSHOT_DATA_FILE_SIZE)
.with_fixint_encoding()
.allow_trailing_bytes()
.deserialize_from(stream)?;
Contributor

this feels like it's pulled/copied from somewhere else and we should probably wrap the deserialization in a function instead of duplicating the logic

Contributor Author

It is copied from the function immediately above -- fn rebuild_bank_from_unarchived_snapshots.

I think in the future we could always rebuild the bank from a snapshot dir, so the functions that build the bank directly from an archive could be removed. In that case, there will be no duplication.

Contributor

sure, but we haven't removed it yet so we do have duplicated code 😄

If we refactor or change the slot_delta (de)serialization before removing the above function, we have a higher chance of missing one of these than if we had a common function for doing it.

Contributor

This is still open/unresolved.

Contributor Author

OK, I can pull this out into a standalone function. Will codecov consider this new function untested and ask for a test function? Should I generate a status_cache file to test it?

  fn deserialize_status_cache(status_cache_path: &Path) -> Result<Vec<BankSlotDelta>> {
      deserialize_snapshot_data_file(&status_cache_path, |stream| {
          info!(
              "Rebuilding status cache from {}",
              status_cache_path.display()
          );
          let slot_deltas: Vec<BankSlotDelta> = bincode::options()
              .with_limit(MAX_SNAPSHOT_DATA_FILE_SIZE)
              .with_fixint_encoding()
              .allow_trailing_bytes()
              .deserialize_from(stream)?;
          Ok(slot_deltas)
      })
  }

Contributor Author

This function just does one thing: calling another function with the deserializer closure. Should the closure be factored out into a named function instead?

Contributor Author

Added fn deserialize_status_cache

Contributor

@brooksprumo brooksprumo left a comment

I didn't finish the review; I'll pick it up again tomorrow. Here are a few things I saw:

@brooksprumo brooksprumo self-requested a review March 6, 2023 23:51
@xiangzhu70
Contributor Author

could you add what the time difference is between using snapshot from archive vs from dir?

Sure. Added into the PR description.

@apfitzge
Contributor

apfitzge commented Mar 7, 2023

Sure. Added into the PR description.

Awesome thanks! 92s -> 5s is going to be pretty awesome 🚀
Not really part of this PR, but there are other aspects to rebuilding the bank/account system besides reconstructing storage, e.g. generating the index. Did you happen to grab total startup times for comparison (ignoring replay) using ledger tool?

@xiangzhu70
Contributor Author

Did you happen to grab total startup times for comparison (ignoring replay) using ledger tool?

That part takes about 90 sec. This PR does not touch that part of the code, so there is no change. I think we could possibly optimize it by building the index in the background while serving the needed requests first, but that's beyond the scope of this PR.


@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from db35360 to f4a0fc7 Compare March 10, 2023 01:27
@codecov

codecov bot commented Mar 10, 2023

Codecov Report

Merging #30171 (5ce205f) into master (2216647) will increase coverage by 0.0%.
The diff coverage is 92.3%.

@@           Coverage Diff            @@
##           master   #30171    +/-   ##
========================================
  Coverage    81.5%    81.5%            
========================================
  Files         723      723            
  Lines      203436   203603   +167     
========================================
+ Hits       165863   166021   +158     
- Misses      37573    37582     +9     

@brooksprumo brooksprumo self-requested a review March 10, 2023 13:45
@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from bf9a873 to 036e311 Compare March 13, 2023 20:49
@apfitzge apfitzge self-requested a review March 15, 2023 23:47
@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from d99c12f to 686154e Compare March 17, 2023 00:31
@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from 02cc217 to 3df5ea6 Compare March 18, 2023 03:30
Contributor

@apfitzge apfitzge left a comment

Nearly there; just some minor comments about propagating errors.

Comment on lines +1677 to +1695
full_snapshot_untar_us: measure_build_storage.as_us(),
incremental_snapshot_untar_us: 0,
Contributor

Don't do it in this PR, but we should rename this field. We're not untarring anything here; this is just rebuild time.

snapshot_info: &BankSnapshotInfo,
account_paths: &[PathBuf],
next_append_vec_id: Arc<AtomicAppendVecId>,
) -> Result<AccountStorageMap> {
Contributor

This introduced several ways to panic, but should be returning a result. Let's catch those expects by returning an Err instead.

Contributor Author

removed expect.

Contributor Author

missed this. working on it now.

Contributor Author

the existing examples all use unwrap for send().

Contributor Author

done.

Contributor

@apfitzge I'm OK with unwrap on send. I don't think we could recover/do anything useful if the channel goes down. Sort of like how we always unwrap locking mutexes. Wdyt?

@@ -2260,8 +2444,23 @@ fn bank_fields_from_snapshots(
})
}

fn deserialize_status_cache(status_cache_path: &Path) -> Result<Vec<BankSlotDelta>> {
Contributor

👍

Contributor

@brooksprumo brooksprumo left a comment

Looking good!

Another scenario I've been wondering about is w.r.t. subsequent incremental snapshot generation after starting up from a snapshot directory. Right now we use the snapshot archives to seed the background services, which allows them to correctly generate incremental snapshots after boot. I think we'll want tests that exercise this usage with the fastboot code.

Specifically something like this:

  1. Run a validator for a bit
  2. Ensure a bank snapshot has been taken
  3. Stop the validator and do a fastboot restart from the snapshot directory
  4. Ensure the background services can generate an incremental snapshot

I think this can happen in another PR; just needs to be in/tested before we enable/allow fastboot for the real validator. And curious, maybe you've already tested this out?

Comment on lines 276 to 283
if self.snapshot_from == SnapshotFrom::Dir {
// Keep track of the highest append_vec_id in the system, so the future append_vecs
// can be assigned a unique id. This is only needed when loading from a snapshot
// dir. When loading from a snapshot archive, the max of the appendvec IDs is
// updated in remap_append_vec_file()
self.next_append_vec_id
.fetch_max((append_vec_id + 1) as AppendVecId, Ordering::Relaxed);
}
Contributor

  1. Very helpful comment!
  2. I think this impl is fine for now. Maybe in the future we refactor out the fetch_max so we don't do an atomic RMW every iteration of the loop, and instead compute a local max over the whole loop and then do a simple atomic fetch_max at the end. (Not for this PR)
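A sketch of the suggested future refactor (hypothetical function names; a toy model of the loop, not the PR code):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Current shape: one atomic read-modify-write per storage entry.
fn track_per_entry(append_vec_ids: &[u32], next_append_vec_id: &AtomicU32) {
    for &id in append_vec_ids {
        next_append_vec_id.fetch_max(id + 1, Ordering::Relaxed);
    }
}

// Suggested shape: compute a local max over the whole loop, then do a
// single atomic fetch_max at the end.
fn track_once(append_vec_ids: &[u32], next_append_vec_id: &AtomicU32) {
    if let Some(max_id) = append_vec_ids.iter().copied().max() {
        next_append_vec_id.fetch_max(max_id + 1, Ordering::Relaxed);
    }
}

fn main() {
    let a = AtomicU32::new(0);
    let b = AtomicU32::new(0);
    track_per_entry(&[3, 7, 5], &a);
    track_once(&[3, 7, 5], &b);
    assert_eq!(a.load(Ordering::Relaxed), b.load(Ordering::Relaxed));
    assert_eq!(b.load(Ordering::Relaxed), 8);
    println!("ok");
}
```

Both shapes leave the atomic at the same value; the second just avoids contended RMWs inside the hot loop.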

@xiangzhu70
Contributor Author

xiangzhu70 commented Mar 21, 2023

4. Ensure the background services can generate an incremental snapshot

In the earliest implementation (PR #28745), the background service was working fine and generating archives after booting from the snapshot directory (it probably generates a full snapshot archive first).

Will check this again in the next PR, in which the arg snapshot_from_dir will be added and the background service will run with the initial bank constructed from a snapshot dir.

@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from f80ddad to a633130 Compare March 21, 2023 19:45
Contributor

@brooksprumo brooksprumo left a comment

lgtm

@xiangzhu70
Contributor Author

lgtm

Thanks a lot!
