
fix(tool): fix runtime-params-estimator #3601 #3616

Merged: 17 commits into master from fix-3601 on Dec 8, 2020

Conversation

@ailisp (Member) commented Nov 14, 2020

Fixes #3601. The estimator now produces numbers that are quite different from the existing genesis.json, but they are reasonable compared to running runtime-params-estimator from the master branch.

Test Plan

runtime-params-estimator works without hitting #3598 or #3601, and every storage delete actually deletes instead of being a no-op.

@ailisp (Member, Author) commented Nov 16, 2020

@evgenykuzyakov @nearmax I realized that some tests like storage_remove_10b_key_10b_value_1k and delete_account, warm up txns is questionable because warm up txns could delete too many of them, and cause storage_remove_10b_key_10b_value_1k become many no-ops that underestimate delete cost. Should these metrics always skip warm up iter?

@evgenykuzyakov (Collaborator) commented

> I realized that for some tests, such as storage_remove_10b_key_10b_value_1k and delete_account, the warm-up transactions are questionable: warm-up can delete too many keys, turning many storage_remove_10b_key_10b_value_1k transactions into no-ops and underestimating the delete cost. Should these metrics always skip the warm-up iterations?

We need the warm-up iteration to compile the contract into the cache; otherwise we don't need warm-up at all. It should be fine to call noop as the warm-up transaction.

@ailisp (Member, Author) commented Nov 17, 2020

> > I realized that for some tests, such as storage_remove_10b_key_10b_value_1k and delete_account, the warm-up transactions are questionable: warm-up can delete too many keys, turning many storage_remove_10b_key_10b_value_1k transactions into no-ops and underestimating the delete cost. Should these metrics always skip the warm-up iterations?
>
> We need the warm-up iteration to compile the contract into the cache; otherwise we don't need warm-up at all. It should be fine to call noop as the warm-up transaction.

I see. But I mean a different warm-up: the one inside the measure_transactions function, which calls the same f() to generate the warm-up transactions. For storage_remove_10b_key_10b_value_1k, the warm-up transaction is therefore also storage_remove_10b_key_10b_value_1k:

    if config.warmup_iters_per_block > 0 {
        let bar = ProgressBar::new(warmup_total_transactions(config) as _);
        bar.set_style(ProgressStyle::default_bar().template(
            "[elapsed {elapsed_precise} remaining {eta_precise}] Warm up {bar} {pos:>7}/{len:7} {msg}",
        ));
        for block_size in config.block_sizes.clone() {
            for _ in 0..config.warmup_iters_per_block {
                let block: Vec<_> = (0..block_size).map(|_| (*f)()).collect();
                testbed.process_block(&block, allow_failures);
                bar.inc(block_size as _);
                bar.set_message(format!("Block size: {}", block_size).as_str());
            }
        }
        testbed.process_blocks_until_no_receipts(allow_failures);
        bar.finish();
    }

My concern is that deleting too many key-value pairs during the warm-up of storage_remove_10b_key_10b_value_1k leaves few accounts with keys left to remove, which underestimates the cost. Also, the key-value pairs removed during warm-up are random, so two runs of the param estimator can behave differently (assume iter_per_block=10, warmup_iters_per_block=10, block_size=100):

  • Case 1: warm-up removes 1000 keys from 1000 accounts; the actual measurement then happens to pick 1000 accounts that all still have keys.
  • Case 2: warm-up removes 1000 keys from 1000 accounts; the actual measurement picks 1000 accounts, of which 500 still have keys and 500 already had their keys deleted.

Both cases can happen, which introduces spurious volatility into the measurement, although if the number of accounts is much larger than iter_per_block * block_size the volatility is negligible (see the sketch below).
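To make the volatility concrete, here is a back-of-the-envelope sketch (not estimator code; the 2,000-account figure is an arbitrary illustration, while 20,000 matches the active-account count mentioned later in this thread):

    // Back-of-the-envelope sketch, not estimator code: if warm-up removes
    // warmup_iters_per_block * block_size keys out of n_accounts accounts,
    // a uniformly random pick in the measurement phase hits an already-emptied
    // account (a no-op remove) with roughly this probability.
    fn expected_noop_fraction(n_accounts: u64, warmup_iters_per_block: u64, block_size: u64) -> f64 {
        let removed_in_warmup = warmup_iters_per_block * block_size;
        (removed_in_warmup as f64 / n_accounts as f64).min(1.0)
    }

    fn main() {
        // With 10 warm-up iterations of block size 100 (1000 removed keys),
        // the outcome swings with the total number of accounts:
        println!("{:.2}", expected_noop_fraction(2_000, 10, 100)); // 0.50
        println!("{:.3}", expected_noop_fraction(20_000, 10, 100)); // 0.050
    }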

@evgenykuzyakov (Collaborator) commented

We don't use warmup for the icount because it doesn't need warmup due to precise instruction counting anyway.

@ailisp (Member, Author) commented Nov 17, 2020

> We don't use warmup for the icount because it doesn't need warmup due to precise instruction counting anyway.

Got it. So warmup_iters_per_block is 0 for icount, and it's not a problem.
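For reference, a minimal sketch of the knobs involved, using only the field names that appear in this thread (the real estimator Config has more fields than this):

    // Minimal sketch, not the estimator's actual Config definition; only the
    // fields named in this thread are shown. With warmup_iters_per_block set
    // to 0, the warm-up loop quoted above is skipped entirely, which is why
    // the icount-based run never executes warm-up transactions.
    struct Config {
        warmup_iters_per_block: usize,
        iter_per_block: usize,
        block_sizes: Vec<usize>,
    }

    fn main() {
        let config = Config { warmup_iters_per_block: 0, iter_per_block: 10, block_sizes: vec![100] };
        assert_eq!(config.warmup_iters_per_block, 0);
        assert_eq!(config.iter_per_block * config.block_sizes[0], 1000);
    }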

@ailisp (Member, Author) commented Nov 18, 2020

I fixed some issues and reran it yesterday evening, but it still hadn't finished after more than 12 hours; maybe writing key-value pairs of this size to all active accounts is too expensive 🤔

It still hasn't finished now, and the runtime-params-estimator CI job failed due to a timeout. The real runtime-params-estimator run on the instance is still progressing, not stuck; it just takes much longer.

Given that this is too slow, a workaround may be to record which accounts received write_x_key_y_value_1k and measure delete_key on those accounts instead of choosing them randomly (a rough sketch of the idea follows below).
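One possible shape of that workaround, as a rough sketch (the index-based accounts and the pool type are illustrative only, not the estimator's real data structures):

    // Rough sketch of the workaround idea: remember which accounts received a
    // write_10b_key_10b_value_1k call and draw storage_remove targets only from
    // that set, so a measured remove can never degenerate into a no-op.
    #[derive(Default)]
    struct DeleteTargetPool {
        // Accounts (by index) that are known to still hold the key-value pair.
        written: Vec<usize>,
    }

    impl DeleteTargetPool {
        fn record_write(&mut self, account_idx: usize) {
            self.written.push(account_idx);
        }

        // Hand out an account that still holds the key; once returned it is
        // treated as consumed, so the same key is never "removed" twice.
        fn next_delete_target(&mut self) -> Option<usize> {
            self.written.pop()
        }
    }

    fn main() {
        let mut pool = DeleteTargetPool::default();
        for account_idx in 0..3 {
            pool.record_write(account_idx);
        }
        while let Some(idx) = pool.next_delete_target() {
            println!("measure storage_remove against account {}", idx);
        }
    }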

@ailisp (Member, Author) commented Nov 20, 2020

I just examined the code again: it's not all active accounts (20K) that get the code deployed, but only 300, and only 2 accounts (not 100) are randomly selected from those 300 to write the key-value pairs. So the bug mentioned in this PR still exists, and the fix in this PR is still valid: write all key-value pairs to the 300 accounts, so that the two randomly selected accounts are guaranteed to have the data.

However, writing all key-value pairs to those 300 accounts should not take long, which means the runtime params estimator might already have been very slow before this PR for some other reason. I'm confirming this by running master through runtime-params-estimator.

Note:

  • The number 300 can be derived; it comes from:
    *good_account.borrow_mut() = true;
    *curr_code.borrow_mut() = code_10k.to_vec();

    testbed = measure_transactions(
        Metric::ActionDeploy10K,
        &mut m,
        &config,
        Some(testbed),
        &mut f,
        false,
    );

    // Deploying more small code accounts. It's important that they are the same size to correctly
    // deduct base
    for _ in 0..2 {
        testbed =
            measure_transactions(Metric::warmup, &mut m, &config, Some(testbed), &mut f, false);
    }

    *good_account.borrow_mut() = false;

Only while *good_account == true are accounts that have the test contract deployed added to good_code_accounts, which is then assigned to ad as the accounts used for measurement in for (metric, method_name) in v {. So the block above runs 3 times: once for Metric::ActionDeploy10K and twice in the for loop. Each run uses iter_per_block=1 and block_sizes=[100], so in total 3 × 1 × 100 = 300 good accounts.

  • Before measuring host functions, the block size is set to 2:
    config.block_sizes = vec![2];

    // When adding new functions do not forget to rebuild the test contract by running `test-contract/build.sh`.

This means only two randomly chosen accounts are used for all host-function measurements, which I assume is the intended behavior.

@ailisp (Member, Author) commented Nov 20, 2020

master still takes around 6 hours to run, so the slowdown is indeed caused by the 300 write_x_key_y_value_1k calls. I made a change to call write_x_key_y_value_1k only on the accounts whose reads or removes are about to be measured, instead of doing it on all 300 accounts; hopefully this brings the execution time back down to about 6 hours.

@evgenykuzyakov (Collaborator) commented

Can you also take a look at why storage_write_evicted_byte is lower than storage_read_value_byte and why it has decreased?

@ailisp (Member, Author) commented Nov 24, 2020

> Can you also take a look at why storage_write_evicted_byte is lower than storage_read_value_byte and why it has decreased?

OK. The numbers above are outdated now that I've changed the estimator to lazily set the key-value pair only on the account whose read or delete is about to be measured; I'll check whether the new numbers still show this behavior.

@ailisp changed the title from "WIP fix(tool): fix runtime-params-estimator #3601" to "fix(tool): fix runtime-params-estimator #3601" on Nov 24, 2020
@ailisp (Member, Author) commented Dec 5, 2020

@evgenykuzyakov I addressed all comments, PTAL.

@ailisp (Member, Author) commented Dec 8, 2020

@willemneal @olonho please review

@olonho (Contributor) left a comment

LGTM

@olonho (Contributor) commented Dec 8, 2020

Note that the test fails with:

    [2020-12-07T17:49:15Z] thread 'runtime::test::test_delete_account_after_unstake' panicked at 'Failed to open the database: DBError(Error { message: "IO error: While open directory: /tmp/test_validator_delete_accountrM135v/data: Too many open files" })', core/store/src/lib.rs:292:23

@ailisp (Member, Author) commented Dec 8, 2020

> [2020-12-07T17:49:15Z] thread 'runtime::test::test_delete_account_after_unstake' panicked at 'Failed to open the database: DBError(Error { message: "IO error: While open directory: /tmp/test_validator_delete_accountrM135v/data: Too many open files" })', core/store/src/lib.rs:292:23

This has happened quite often since we switched to running plain cargo test on the workspace in CI. I've asked the SRE team to take a look. (It's definitely unrelated to this PR, since all the changes are in runtime-params-estimator :)

@near-bulldozer bot merged commit 4a9ecae into master on Dec 8, 2020
@near-bulldozer bot deleted the fix-3601 branch on December 8, 2020 at 18:17
Successfully merging this pull request may close this issue:

bug: Param estimator reuses accounts that rely on the storage (#3601)