Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise memory usage while loading genesis #4269

Merged
merged 30 commits into from
May 7, 2021

Conversation

posvyatokum
Copy link
Member

  1. Adding option of streaming records to Genesis
  2. Applying genesis records in chunks (not only in case of streaming)

@posvyatokum
Copy link
Member Author

Probably should adjust the chunk size. With this patch, the node is starting in 15 minutes, without it -- in 8. As for memory, the new code consumes 3 GB (old code -- 14 GB), so there is room to go.

@posvyatokum posvyatokum linked an issue May 2, 2021 that may be closed by this pull request
Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some Rust-related comments (although most of them are "oh, that's actually harder than I thought, I see now why you do this the way you did").

One thing I am worrying a lot is that now processing records might fail with an io::Error and we do not reflect that in the types. We just unwrap inside.I feel that's a significant change of geneissis contract, and that's its better to expose the fact we can fail at the call site. But I see how this will make this already quite a big change larger. I don't necessary want to block on this, but it does seems that silencing IO errors that way increases the technical debt substantionaly.

core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
runtime/runtime/src/lib.rs Outdated Show resolved Hide resolved
runtime/runtime/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines 1551 to 1553
let mut genesis_state_applier =
GenesisStateApplier::new(tries, shard_id, validators, config);
genesis_state_applier.apply(genesis, shard_account_ids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, at least it seems that this dosen't need to be a stateful object, as we apply all the stuff in one go anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I've done what you wanted, it still looks weird. Also this part of code moved to runtime/runtime/src/genesis.rs

core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
runtime/runtime/src/lib.rs Outdated Show resolved Hide resolved
runtime/runtime/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the peak memory usage when neard loads testnet genesis after this change?

genesis.config.total_supply = get_initial_supply(&genesis.records.as_ref());
genesis
}

pub fn new_as_is(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the best name :)

@posvyatokum
Copy link
Member Author

What is the peak memory usage when neard loads testnet genesis after this change?

5 GB

}
}

fn stream_records_from_file(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is logically static method of RecordsProcessor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not really important): I'd say it's the other way around -- this function is the public API, and RecordsProcessor is it's private implementation detail. If it were a separate module (it doesn't have to be), it's best written as

// genesis_streaming.rs
pub fn stream_records_from_file(
    reader: impl Read,
    mut callback: impl FnMut(StateRecord),
) -> serde_json::Result<()> {
    let mut deserializer = serde_json::Deserializer::from_reader(reader);
    let records_processor = RecordsProcessor { sink: &mut callback };
    deserializer.deserialize_any(records_processor)
}

struct RecordsProcessor<F> { ... }

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left couple of comments, nothing blocking!

fn stream_records_with_callback(
&self,
callback: impl FnMut(StateRecord),
) -> std::io::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's idiomatic to use std::io; and use io::Result and io::Error

Comment on lines 429 to 430

pub fn for_each_record(&self, mut callback: impl FnMut(&StateRecord)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I am still not feeling completely comfortable with just .expecting errors here. This signature looks as if it's a pure in-memory function that can't fail. I suggest adding a comment here explaining that this actually does disk IO and can panic if the genesis file is removed.

On the over hand, we already have validate_genesis which handles errors via panics, so in that sense this doesn't make the code worse...

}
}

fn stream_records_from_file(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not really important): I'd say it's the other way around -- this function is the public API, and RecordsProcessor is it's private implementation detail. If it were a separate module (it doesn't have to be), it's best written as

// genesis_streaming.rs
pub fn stream_records_from_file(
    reader: impl Read,
    mut callback: impl FnMut(StateRecord),
) -> serde_json::Result<()> {
    let mut deserializer = serde_json::Deserializer::from_reader(reader);
    let records_processor = RecordsProcessor { sink: &mut callback };
    deserializer.deserialize_any(records_processor)
}

struct RecordsProcessor<F> { ... }

@@ -299,11 +375,16 @@ impl GenesisJsonHasher {

impl Genesis {
pub fn new(config: GenesisConfig, records: GenesisRecords) -> Self {
let mut genesis = Self { config, records, phantom: PhantomData };
let mut genesis =
Self { config, records, records_file: PathBuf::new(), phantom: PhantomData };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder it this should be records_file: None? That is, use Option to relfect in the type system that we might not have records_file at all.

Comment on lines 152 to 160
match &state_record {
StateRecord::Account { account_id, .. }
| StateRecord::AccessKey { account_id, .. }
| StateRecord::Contract { account_id, .. }
| StateRecord::ReceivedData { account_id, .. }
| StateRecord::Data { account_id, .. } => account_id,
StateRecord::PostponedReceipt(receipt) | StateRecord::DelayedReceipt(receipt) => {
&receipt.receiver_id
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idiom!

@@ -147,3 +147,16 @@ fn to_printable(blob: &[u8]) -> String {
}
}
}

pub fn state_record_to_account_id(state_record: &StateRecord) -> &AccountId {
match &state_record {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match &state_record {
match state_record {

state_record is already a ref

Comment on lines +17 to +28
pub fn new(genesis_config: &'a GenesisConfig) -> Self {
return Self {
genesis_config,
total_supply: 0,
staked_accounts: HashMap::new(),
account_ids: HashSet::new(),
access_key_account_ids: HashSet::new(),
contract_account_ids: HashSet::new(),
};
}

pub fn process_record(&mut self, record: &StateRecord) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn new(genesis_config: &'a GenesisConfig) -> Self {
return Self {
genesis_config,
total_supply: 0,
staked_accounts: HashMap::new(),
account_ids: HashSet::new(),
access_key_account_ids: HashSet::new(),
contract_account_ids: HashSet::new(),
};
}
pub fn process_record(&mut self, record: &StateRecord) {
fn new(genesis_config: &'a GenesisConfig) -> Self {
return Self {
genesis_config,
total_supply: 0,
staked_accounts: HashMap::new(),
account_ids: HashSet::new(),
access_key_account_ids: HashSet::new(),
contract_account_ids: HashSet::new(),
};
}
fn process_record(&mut self, record: &StateRecord) {

pub is not needed here. I also find it useful to not have unnecessary pubs. That way, when you see a pub, you know that its a cross-crate interface.

Comment on lines +56 to +58
.validators
.clone()
.into_iter()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.validators
.clone()
.into_iter()
.validators
.iter()
.cloned()

This will be a tiiiny bit better, as the vector itself won't be cloned

}

/// Validate genesis config and records. Panics if genesis is ill-formed.
pub fn validate_genesis(genesis: &Genesis) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have any existing conventions around this, but I personally prefer relatively strongly to keep module's public API at the top. That way, when you read a module, you can understand at a glance it's API: https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/style.md#order-of-items

@@ -2251,6 +2122,8 @@ mod tests {

#[test]
fn test_delete_key_add_key() {
use near_primitives::account::AccessKey;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is alerady inside the cfg(test) module, so you can put the import at the top of the module:

#[cfg(test)]
mod tests {
    use super::*;
    use near_crypto::{InMemorySigner, KeyType, Signer};
    use near_primitives::errors::ReceiptValidationError;
    use near_primitives::hash::hash;
    use near_primitives::profile::ProfileData;
    use near_primitives::test_utils::{account_new, MockEpochInfoProvider};
    use near_primitives::transaction::{
        AddKeyAction, DeleteKeyAction, FunctionCallAction, TransferAction,
    };
    use near_primitives::types::MerkleHash;
    use near_primitives::version::PROTOCOL_VERSION;
    use near_primitives::account::AccessKey; // new import
    use near_store::test_utils::create_tries;
    use near_store::StoreCompiledContractCache;
    use std::sync::Arc;
    use testlib::runtime_utils::{alice_account, bob_account};

@posvyatokum posvyatokum merged commit 884c6d1 into master May 7, 2021
@posvyatokum posvyatokum deleted the optimize-genesis-loading branch May 7, 2021 18:47
@frol
Copy link
Collaborator

frol commented Aug 2, 2021

@posvyatokum

Probably should adjust the chunk size. With this patch, the node is starting in 15 minutes, without it -- in 8. As for memory, the new code consumes 3 GB (old code -- 14 GB), so there is room to go.

I just started a fresh testnet node based on 1.20.0-rc.2 release (which includes this PR, I believe), and observed that it consumed 10GB of RAM on neard init. Is that expected?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize genesis loading speed and memory usage
5 participants