
Construct bank from snapshot dir #30171

Merged

Conversation

@xiangzhu70 xiangzhu70 commented Feb 7, 2023

Problem

To boot faster, we want to boot from a bank snapshot directory instead of an archive. This requires reading metadata from the snapshot directories, indicating the snapshot versions and completion statuses, to identify a good directory to boot from.

Summary of Changes

- Add the function bank_from_snapshot_dir() and all the supporting functions and data structures to construct a bank from a bank snapshot directory.
- Add the test function.

This is a split PR of #28745. It is the next step after #30099, which completes the bank snapshot directory.

Testing done:
On mainnet, the snapshot untar took 92.4s.
When the snapshot directory is used, the archive untar is skipped, so deserialization is reduced to 5.5s.

Fixes #

@xiangzhu70
Contributor Author

This will be rebased after PR #30099 is committed, so the reduced diff is ready for review.

Contributor

@apfitzge apfitzge left a comment


Largely happy with this.
I think we had it in the original PR (pre-split) description, but could you add what the time difference is between using snapshot from archive vs from dir?

Comment on lines 752 to 828
assert!(
next_append_vec_id.checked_sub(1).is_some(),
"subtraction underflow"
);
let max_append_vec_id = next_append_vec_id - 1;
Contributor

Is there some new case that requires we check this? If so, why not something like this?

Suggested change
assert!(
next_append_vec_id.checked_sub(1).is_some(),
"subtraction underflow"
);
let max_append_vec_id = next_append_vec_id - 1;
let max_append_vec_id = next_append_vec_id.checked_sub(1).unwrap();

I think based on the unwrap panic it should be obvious, no real need (imo) to specify "subtraction underflow".

Contributor Author

agreed. updated it.
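As a standalone sketch of the agreed-upon pattern (hypothetical function name and values, not the PR code):

```rust
// Hypothetical helper illustrating the pattern: checked_sub plus
// expect panics on underflow with a clear message, replacing the
// separate assert followed by a raw subtraction.
fn max_append_vec_id(next_append_vec_id: u32) -> u32 {
    next_append_vec_id
        .checked_sub(1)
        .expect("next_append_vec_id should be at least 1")
}

fn main() {
    assert_eq!(max_append_vec_id(5), 4);
    assert_eq!(max_append_vec_id(1), 0);
    println!("ok");
}
```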

.unwrap();

for account_path in account_paths {
for file in fs::read_dir(account_path).unwrap() {
Contributor

by the time we're here, we have already verified that these account paths exist, right?

Contributor Author

Yes, they went through the step that appends "run/", and the run subdirectories were created.

Contributor

Please avoid unwrap outside of tests. An expect with a message is an option, or, this function could return a Result and the caller could put a single expect on its invocation.
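A minimal sketch of the two options under discussion (hypothetical helper names, std only, not the PR code):

```rust
use std::{fs, io, path::Path};

// Option A: expect with a message at each call site.
fn file_names_or_panic(dir: &Path) -> Vec<String> {
    fs::read_dir(dir)
        .expect("account path should exist and be readable")
        .map(|entry| {
            entry
                .expect("failed to read dir entry")
                .file_name()
                .to_string_lossy()
                .into_owned()
        })
        .collect()
}

// Option B: return a Result and let the caller put a single expect
// (or `?`) on the invocation.
fn file_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    Ok(names)
}

fn main() {
    let dir = std::env::temp_dir();
    assert!(file_names(&dir).is_ok());
    let _ = file_names_or_panic(&dir);
    println!("ok");
}
```

Option B is what the thread converged on: the function returns a Result and the caller decides whether to propagate or expect.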

Contributor Author

removed all unwraps.

let _ = &self
.next_append_vec_id
.fetch_max(next_appendvec_id as u32, Ordering::Relaxed);
}
let (slot, slot_complete) = self.insert_slot_storage_file(path, filename);
Contributor

If we are from Dir, we call get_slot_and_append_vec_id above, and then call it again internally within insert_slot_storage_file.

We should just call it once above, regardless of snapshot_from, and pass the slot to insert_slot_storage_file instead of filename

Contributor Author

Agreed.
I now call get_slot_and_append_vec_id once, removed the duplicate call from insert_slot_storage_file, and just pass slot into it.
That made the filename parameter unnecessary, leaving insert_slot_storage_file with the same parameter list as insert_storage_file and almost nothing to do other than call insert_storage_file.
So, to simplify further, I removed the insert_slot_storage_file function.

&self.next_append_vec_id,
&self.num_collisions,
)?;
let storage_entry = match &self.snapshot_from {
Contributor

Comment out of scope for this PR.

We're checking snapshot_from for every entry - I wonder what the perf impact of storing generic fns for this would be, though I suspect minimal since I'd hope branch prediction is doing a good job here.

Contributor Author

@xiangzhu70 xiangzhu70 Mar 7, 2023

Ok, I can move "match &self.snapshot_from" out of the entry loop. I tried the following, defining two different closures and then just making the closure call inside the entry loop, but I got a compilation error.

        let process_entry = match &self.snapshot_from {
            SnapshotFrom::Archive => 
                |path: &Path,
                 current_len: usize,
                 old_appendvec_id: usize|
                 -> Result<Arc<AccountStorageEntry>, std::io::Error> {
                    remap_and_reconstruct_single_storage(
                        slot,
                        old_appendvec_id,
                        current_len,
                        path,
                        &self.next_append_vec_id,
                        &self.num_collisions,
                    )
                },

            SnapshotFrom::Dir => 
                |path: &Path,
                 current_len: usize,
                 old_appendvec_id: usize|
                -> Result<Arc<AccountStorageEntry>, std::io::Error> {
                    reconstruct_single_storage(&slot, path, current_len, old_appendvec_id as u32)
                },
        };

The compilation error is

error[E0308]: `match` arms have incompatible types
   --> runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:322:17
    |
305 |            let process_entry = match &self.snapshot_from {
    |                                ------------------------- `match` arms have incompatible types
306 |                SnapshotFrom::Archive => 
307 |                    |path: &Path,
    |  __________________-
    | | _________________|
    | ||
308 | ||                  current_len: usize,
309 | ||                  old_appendvec_id: usize|
310 | ||                  -> Result<Arc<AccountStorageEntry>, std::io::Error> {
    | ||____________________________________________________________________- the expected closure
...   |
318 | |                      )
319 | |                  },
    | |__________________- this is found to be of type `[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:307:17: 310:69]`
...
322 |  /                 |path: &Path,
323 |  |                  current_len: usize,
324 |  |                  old_appendvec_id: usize|
325 |  |                 -> Result<Arc<AccountStorageEntry>, std::io::Error> {
326 |  |                     reconstruct_single_storage(&slot, path, current_len, old_appendvec_id as u32)
327 |  |                 },
    |  |_________________^ expected closure, found a different closure
    |
    = note: expected closure `[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:307:17: 310:69]`
               found closure `[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:322:17: 325:68]`
    = note: no two closures, even if identical, have the same type
    = help: consider boxing your closure and/or using it as a trait object

The two closures have exactly the same input and return types, but it seems the compiler does not accept that. Should I box it or use a trait?

Contributor Author

Tried it, but somehow boxing doesn't help:

    = note: expected struct `Box<[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:307:17: 310:69]>`
               found struct `Box<[closure@runtime/src/snapshot_utils/snapshot_storage_rebuilder.rs:323:17: 326:69]>`
    = note: no two closures, even if identical, have the same type
    = help: consider boxing your closure and/or using it as a trait object
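For the record, boxing does resolve E0308 once the binding (or the function return type) is annotated with a trait-object type, so both match arms coerce to the same `Box<dyn Fn ...>`; `Box::new(closure)` alone keeps each concrete closure type. A standalone sketch with toy closures (not the PR code, which also captures `&self` fields and returns `Result`, adding lifetime complications):

```rust
// Without a trait-object annotation, each Box::new(..) keeps the
// concrete (and distinct) closure type, and the match arms disagree.
// The `Box<dyn Fn(usize) -> usize>` return type gives both arms a
// common type to coerce to.
fn make_process(from_archive: bool) -> Box<dyn Fn(usize) -> usize> {
    match from_archive {
        true => Box::new(|id| id + 100), // stand-in for the archive path
        false => Box::new(|id| id),      // stand-in for the dir path
    }
}

fn main() {
    let process = make_process(true);
    assert_eq!(process(1), 101);
    let process = make_process(false);
    assert_eq!(process(1), 1);
    println!("ok");
}
```

The boxing adds a heap allocation and dynamic dispatch, which is part of why deferring this (per the reviewer's "out of scope" call below) is reasonable.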

Contributor

Yeah, don't worry for now; like I said, it's out of scope for this PR. We can come back to this once we have it working. It's just a potential speedup, and I'm not even sure it will help in a meaningful way.

let account_paths = account_paths.to_vec();
let account_paths_set: HashSet<PathBuf> = HashSet::from_iter(account_paths.clone());

for dir_symlink in fs::read_dir(accounts_hardlinks).unwrap() {
Contributor

are these unwraps safe, or should we map err into SnapshotError::Io?

Contributor

Minimally please do not use unwrap() outside of tests. This applies for the whole impl in this function.

Contributor Author

changed to expect

Comment on lines 2548 to 2617
let slot_deltas: Vec<BankSlotDelta> = bincode::options()
.with_limit(MAX_SNAPSHOT_DATA_FILE_SIZE)
.with_fixint_encoding()
.allow_trailing_bytes()
.deserialize_from(stream)?;
Contributor

this feels like it's pulled/copied from somewhere else and we should probably wrap the deserialization in a function instead of duplicating the logic

Contributor Author

It is copied from the function immediately above -- fn rebuild_bank_from_unarchived_snapshots.

I think in the future we could always rebuild the bank from a snapshot dir, so the functions that build the bank directly from an archive could be removed. In that case, there will be no duplication.

Contributor

sure, but we haven't removed it yet so we do have duplicated code 😄

If we refactor or change the slot_delta (de)serialization before removing the above function, we have a higher chance of missing one of these than if we had a common function for doing it.

Contributor

This is still open/unresolved.

Contributor Author

OK, I can pull this out into a standalone function. Will codecov consider this new function untested and ask for a test function? Should I generate a status_cache file to test it?

  fn deserialize_status_cache(status_cache_path: &Path) -> Result<Vec<BankSlotDelta>> {
      deserialize_snapshot_data_file(&status_cache_path, |stream| {
          info!(
              "Rebuilding status cache from {}",
              status_cache_path.display()
          );
          let slot_deltas: Vec<BankSlotDelta> = bincode::options()
              .with_limit(MAX_SNAPSHOT_DATA_FILE_SIZE)
              .with_fixint_encoding()
              .allow_trailing_bytes()
              .deserialize_from(stream)?;
          Ok(slot_deltas)
      })
  }

Contributor Author

This function just does one thing: calling another function with the deserializer closure. Should the closure be factored out into a named function instead?

Contributor Author

Added fn deserialize_status_cache

Contributor

@brooksprumo brooksprumo left a comment

I didn't finish the review; I'll pick it up again tomorrow. Here are a few things I saw:

@brooksprumo brooksprumo self-requested a review March 6, 2023 23:51
@xiangzhu70
Contributor Author

could you add what the time difference is between using snapshot from archive vs from dir?

Sure. Added into the PR description.

@apfitzge
Contributor

apfitzge commented Mar 7, 2023

Sure. Added into the PR description.

Awesome thanks! 92s -> 5s is going to be pretty awesome 🚀
Not really part of this PR, but there are other aspects to rebuilding the bank/account system besides reconstructing storage, e.g. generating the index. Did you happen to grab total startup times for comparison (ignoring replay) using ledger tool?

@xiangzhu70
Contributor Author

Did you happen to grab total startup times for comparison (ignoring replay) using ledger tool?

That part takes about 90 sec. This PR does not touch that part of the code, so there is no change. I think we could possibly optimize it by building the index in the background while serving the needed requests first, but that's beyond the scope of this PR.


@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from db35360 to f4a0fc7 Compare March 10, 2023 01:27
@codecov

codecov bot commented Mar 10, 2023

Codecov Report

Merging #30171 (5ce205f) into master (2216647) will increase coverage by 0.0%.
The diff coverage is 92.3%.

@@           Coverage Diff            @@
##           master   #30171    +/-   ##
========================================
  Coverage    81.5%    81.5%            
========================================
  Files         723      723            
  Lines      203436   203603   +167     
========================================
+ Hits       165863   166021   +158     
- Misses      37573    37582     +9     

@brooksprumo brooksprumo self-requested a review March 10, 2023 13:45
@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from bf9a873 to 036e311 Compare March 13, 2023 20:49
@apfitzge apfitzge self-requested a review March 15, 2023 23:47
@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from d99c12f to 686154e Compare March 17, 2023 00:31
@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from 02cc217 to 3df5ea6 Compare March 18, 2023 03:30
Contributor

@apfitzge apfitzge left a comment

Nearly there; just some minor comments about propagating errors.

Comment on lines +1677 to +1695
full_snapshot_untar_us: measure_build_storage.as_us(),
incremental_snapshot_untar_us: 0,
Contributor

Don't do it in this PR, but we should rename this field. We're not untarring anything here; this is just rebuild time.

snapshot_info: &BankSnapshotInfo,
account_paths: &[PathBuf],
next_append_vec_id: Arc<AtomicAppendVecId>,
) -> Result<AccountStorageMap> {
Contributor

This introduced several ways to panic, but should be returning a result. Let's catch those expects by returning an Err instead.

Contributor Author

removed expect.

Contributor Author

missed this. working on it now.

Contributor Author

the existing examples all use unwrap for send().

Contributor Author

done.

Contributor

@apfitzge I'm OK with unwrap on send. I don't think we could recover/do anything useful if the channel goes down. Sort of like how we always unwrap locking mutexes. Wdyt?

@@ -2260,8 +2444,23 @@ fn bank_fields_from_snapshots(
})
}

fn deserialize_status_cache(status_cache_path: &Path) -> Result<Vec<BankSlotDelta>> {
Contributor

👍

Contributor

@brooksprumo brooksprumo left a comment

Looking good!

Another scenario I've been wondering about is w.r.t. subsequent incremental snapshot generation after starting up from a snapshot directory. Right now we use the snapshot archives to seed the background services, which allows them to correctly generate incremental snapshots after boot. I think we'll want tests that exercise this usage with the fastboot code.

Specifically something like this:

  1. Run a validator for a bit
  2. Ensure a bank snapshot has been taken
  3. Stop the validator and do a fastboot restart from the snapshot directory
  4. Ensure the background services can generate an incremental snapshot

I think this can happen in another PR; just needs to be in/tested before we enable/allow fastboot for the real validator. And curious, maybe you've already tested this out?

Comment on lines 276 to 283
if self.snapshot_from == SnapshotFrom::Dir {
// Keep track of the highest append_vec_id in the system, so the future append_vecs
// can be assigned a unique id. This is only needed when loading from a snapshot
// dir. When loading from a snapshot archive, the max of the appendvec IDs is
// updated in remap_append_vec_file()
self.next_append_vec_id
.fetch_max((append_vec_id + 1) as AppendVecId, Ordering::Relaxed);
}
Contributor

  1. Very helpful comment!
  2. I think this impl is fine for now. Maybe in the future we refactor out the fetch_max so we don't do an atomic RMW every iteration of the loop, and instead compute a local max over the whole loop and then do a simple atomic fetch_max at the end. (Not for this PR)
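A sketch of the suggested future refactor (hypothetical function names; a toy model of the loop, not the PR code):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Current shape: one atomic read-modify-write per storage entry.
fn track_per_entry(append_vec_ids: &[u32], next_append_vec_id: &AtomicU32) {
    for &id in append_vec_ids {
        next_append_vec_id.fetch_max(id + 1, Ordering::Relaxed);
    }
}

// Suggested shape: compute a local max over the whole loop, then do a
// single atomic fetch_max at the end.
fn track_once(append_vec_ids: &[u32], next_append_vec_id: &AtomicU32) {
    if let Some(max_id) = append_vec_ids.iter().copied().max() {
        next_append_vec_id.fetch_max(max_id + 1, Ordering::Relaxed);
    }
}

fn main() {
    let a = AtomicU32::new(0);
    let b = AtomicU32::new(0);
    track_per_entry(&[3, 7, 5], &a);
    track_once(&[3, 7, 5], &b);
    assert_eq!(a.load(Ordering::Relaxed), b.load(Ordering::Relaxed));
    assert_eq!(b.load(Ordering::Relaxed), 8);
    println!("ok");
}
```

Both shapes leave the atomic at the same value; the second just avoids contended RMWs inside the hot loop.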

@xiangzhu70
Contributor Author

xiangzhu70 commented Mar 21, 2023

4. Ensure the background services can generate an incremental snapshot

In the earliest implementation (PR #28745), the background service was working fine and generating archives after booting from the snapshot directory (it probably generates a full snapshot archive first).

Will check this again in the next PR, in which the arg snapshot_from_dir will be added and the background service will run with the initial bank constructed from a snapshot dir.

@xiangzhu70 xiangzhu70 force-pushed the construct_bank_from_snapshot_dir branch from f80ddad to a633130 Compare March 21, 2023 19:45
Contributor

@brooksprumo brooksprumo left a comment

lgtm

@xiangzhu70
Contributor Author

lgtm

Thanks a lot!
