Easy and understandable fees estimation #4633
Thank you for taking a high-level look at it! I would add another motivation on top of it:
Ok, I think I have a proposed implementation strategy for the estimator. We have a `CostTable` which lists every possible cost. For each cost, we have a Rust function that computes it. Each function is stateless, and we don't try to use a single unified framework for all costs. If a particular cost needs to be computed in an odd way, so be it. That said, there are groups of costs that are computed in a similar way; for them, we'll just have helper functions to be called from the cost-computing function. Similarly, if several cost functions need similar setups, the setup is extracted into a helper, and is called from each cost function:

```rust
// Do
fn cost1() {
    common_setup();
    compute_cost1();
}
fn cost2() {
    common_setup();
    compute_cost2();
}

// Don't
fn costs() {
    common_setup();
    compute_cost1();
    compute_cost2();
}
```

This is a bit more verbose, but it makes dependencies explicit and allows reasoning about costs in isolation. If we find that re-running some common setup is too slow, we'll add an explicit cache for this bit of state. As a special case, if cost2 and cost3 both depend on cost1, then we'll add explicit caching code to the cost1 computation.

As to the way we actually compute costs, the current approach of manufacturing the block, computing the "average" time, and maybe subtracting some base cost seems broadly reasonable to me for most costs. I can also see that for some costs we might want to use a more fine-grained approach and measure something specific, rather than the whole block. The suggested setup should allow us to transparently support both types of costs.

I don't see a bulletproof solution to the "performance depends on the exact setup" problem. I think the proposal makes it easier by making setup an explicit per-cost concern, such that we at least see what we think is important for each particular cost. What would also help is reducing the configuration space of the param estimator to just two flags.

Here's roughly how I expect the interesting parts to look:

```rust
struct GasCost(u32);

static ALL_COSTS: &[(Cost, fn(&mut Ctx) -> GasCost)] = &[
    (Cost::ActionReceiptCreation, action_receipt_creation),
    (Cost::ActionCreateAccount, action_create_account),
];

pub fn run(config: Config) -> CostTable {
    let mut ctx = Ctx::new(config);
    let mut res = CostTable::default();
    for &(cost, f) in ALL_COSTS {
        let value = f(&mut ctx);
        res.add(cost, value);
    }
    res
}

fn action_receipt_creation(ctx: &mut Ctx) -> GasCost {
    // Explicit, first-order caching: other costs subtract this as a base.
    if let Some(gas_cost) = ctx.cached.action_receipt_creation {
        return gas_cost;
    }
    let res = ctx.measure_transactions(|ctx| from_actions(vec![]));
    ctx.cached.action_receipt_creation = Some(res);
    res
}

fn action_create_account(ctx: &mut Ctx) -> GasCost {
    let total_cost = ctx.measure_transactions(|ctx| {
        let account_id = ctx.random_existing_account();
        let other_account_id = ctx.random_non_existing_account();
        SignedTransaction::from_actions(
            ctx.nonce(&account_id),
            account_id.clone(),
            other_account_id,
            &InMemorySigner::from_seed(account_id.clone(), KeyType::ED25519, account_id.as_ref()),
            vec![
                Action::CreateAccount(CreateAccountAction {}),
                Action::Transfer(TransferAction { deposit: 10u128.pow(26) }),
            ],
            CryptoHash::default(),
        )
    });
    let base_cost = action_receipt_creation(ctx);
    total_cost - base_cost
}
```
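One detail the sketch above glosses over: `total_cost - base_cost` only compiles if `GasCost` implements `Sub`. A minimal sketch of what that impl could look like (the derive list and the choice of saturating subtraction are my assumptions, not part of the proposal — saturating guards against the noisy case where the measured base exceeds the total):

```rust
use std::ops::Sub;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GasCost(u32);

impl Sub for GasCost {
    type Output = GasCost;
    fn sub(self, rhs: GasCost) -> GasCost {
        // Saturate instead of panicking on underflow from measurement noise.
        GasCost(self.0.saturating_sub(rhs.0))
    }
}
```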
This approach sounds very much desirable, thanks Alexey!
You can now pass something like `--metrics-to-measure Receipt,data_receipt_10b_1000,data_receipt_base_10b_1000` to calculate only those metrics. Note that metrics != costs. The end goal is to do this for actual costs, as outlined in near#4633 (comment). The current approach is an intermediate step in that direction. In particular, it allows debugging estimator refactors much faster.

Test Plan: manually run the estimator to smoke test that this works; use a tool ([SSR](https://rust-analyzer.github.io/manual.html#structural-search-and-replace)) to change extraction logic without introducing typos.
#4733 is a first cut at it, very incomplete atm
We might want to avoid any kind of caching, because it creates complex logic flows and makes diagnostics harder. If someone runs a fee estimation for one operation, then caching will not save time; and if someone runs fee estimation for multiple operations, then they can run them in parallel, and the execution time will be defined by whichever operation is the one to create the cache — which again does not shorten the total time much (unless we argue that operations take very different times to estimate and the shortest one creates the cache). It only saves CPU cycles, which we can be wasteful about, since we run the param estimator infrequently.

> `for (cost, f) in all_costs {`

This part can be behind rayon.
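The per-cost parallelism suggested here is straightforward because each cost function is stateless. A minimal sketch using only the standard library (`std::thread::scope` as a stand-in for rayon's `par_iter`; the `GasCost` type and the `cost_a`/`cost_b` functions are hypothetical placeholders for real cost functions):

```rust
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GasCost(u32);

// Stand-ins for real cost functions; each is stateless, so the
// estimation loop is embarrassingly parallel.
fn cost_a() -> GasCost { GasCost(1) }
fn cost_b() -> GasCost { GasCost(2) }

fn run_parallel() -> Vec<GasCost> {
    let costs: &[fn() -> GasCost] = &[cost_a, cost_b];
    // One scoped thread per cost; joining in spawn order keeps the
    // results aligned with the cost table.
    thread::scope(|s| {
        let handles: Vec<_> = costs.iter().map(|&f| s.spawn(move || f())).collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

With rayon the loop body would be a one-line `par_iter().map(...)`, but the qemu-based instruction counting discussed below complicates per-thread scoping, so this is a sketch of the shape rather than a drop-in change.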
Oh wow, the idea of adding parallelism didn't enter my mind at all, thanks @nearmax for mentioning it! Yeah, I do think that, long term, we'll be able to leverage parallelism here — cost computation is embarrassingly parallel. I don't think it would be as easy as rayon, though: we do process-wide instruction counting in qemu, and scoping that to a single thread might be fiddly. So it seems we'd rather go for a multiprocess setup, where we spawn several qemu instances in docker and each process computes a subset of costs. Both per-process qemu and multiprocessing seem like some amount of work, so for the time being I'll concentrate on the single-threaded approach.
I think I see where you are coming from, but I do think the caching makes sense here. The problem is that caching can be hard if you implement it in a generic way, with higher-order functions and macros. In my wip PR I instead implemented the cache in the dumbest way possible, as an extra first-order `if`. I think this shouldn't create any problems with understanding and debugging. I think caching helps in two ways:
Either way, the caching is implemented in a self-contained way (there's no caching framework or higher-order function; each cacheable cost has 2 extra statements for caching), so it should be trivial to add it or back it out.
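The "2 extra statements" pattern can be shown in isolation. A sketch under my own names (`Cache`, `expensive_measurement`, and the atomic call counter are hypothetical, added here only to demonstrate that the second call hits the cache):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Counts how many times the real measurement runs.
static MEASUREMENTS: AtomicU32 = AtomicU32::new(0);

fn measurement_count() -> u32 {
    MEASUREMENTS.load(Ordering::SeqCst)
}

#[derive(Default)]
struct Cache {
    action_receipt_creation: Option<u32>,
}

// The whole "framework": two extra statements in the cost function.
fn action_receipt_creation(cache: &mut Cache) -> u32 {
    if let Some(c) = cache.action_receipt_creation {
        return c; // statement 1: early return on a cache hit
    }
    let res = expensive_measurement();
    cache.action_receipt_creation = Some(res); // statement 2: fill the cache
    res
}

fn expensive_measurement() -> u32 {
    MEASUREMENTS.fetch_add(1, Ordering::SeqCst);
    42
}
```

Backing the caching out means deleting the `if` and the assignment — nothing else in the cost function changes.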
Some in-progress findings:
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
Yeah, @jakmeier is working on this general area, though I'd say this particular issue is fixed, and we should open more specific sub-issues for what's left to be done ...
... with the terminal goal of ensuring that our fees are sound (non-exploitable) and allow for a reliable implementation of the runtime.

@jakmeier I think the two of us should just go and create a bunch of follow-up issues here. I might not get to this today myself, so feel free to go ahead (see the A-param-estimator label).
I've written down all that I could think of as follow-ups around the topic of estimator confidence:
This covers at least some of what you listed. @matklad I think the most pressing are probably the last two: understanding storage better, and getting a rough overview of where we undercharge right now. So I guess that's what I will work on for now.
Zulip topic: https://near.zulipchat.com/#narrow/stream/295306-dev-contract-runtime/topic/Design.20for.20parameter.20estimator
Problem: any task that requires adjustment, addition, or investigation of fees today requires a lot of engineering effort. One does not simply "run params estimator".
For the past couple of weeks, I've looked at how the current estimator works, and I identified the following set of issues:
That said, the overall approach to estimation feels broadly correct -- to estimate a fee, we construct a transaction which exercises the worst case for this fee, and observe the time it takes the whole runtime to process the op.
Proposed plan of action (somewhat incomplete at this point and to be adjusted as we go):
- `param-estimator`: estimate subset of `RuntimeConfig` fees #4501
- `cargo run -p params-estimator` does the docker and qemu stuff automatically.