Lower storage compute costs #12044
Conversation
```rust
Some(read) => {
    let read_len = read.len() as u64;
    // TODO BEFORE UNDRAFTING: REPLACE WITH THE RIGHT CONSTANT USAGE
    if read_len < 4096 {
```
@pugachAG Do you know where the 4k constant lives, the one I've been told is the limit for loading values into memtrie? I've tried digging around core/store/src/trie/mem but haven't been able to find the place in trie loading that would enforce the 4k limit, so I could reuse the constant from there.
as far as I know inlined values are taken from the flat storage when loading the memtrie, so the constant is INLINE_DISK_VALUE_THRESHOLD
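For reference, a minimal sketch of what the fixed-up check could look like, assuming the constant is reachable from the store crate (the exact import path below is an assumption, not the PR's actual code):

```rust
// Sketch only: reusing the flat storage inlining threshold instead of the
// hard-coded 4096. The import path is an assumption; the constant lives
// somewhere under core/store per the comment above.
use near_store::flat::INLINE_DISK_VALUE_THRESHOLD; // assumed path

fn is_small_read(read: &[u8]) -> bool {
    // Values below the inlining threshold are served from flat storage /
    // memtrie rather than disk, so they qualify for the cheaper
    // "small read" costs.
    read.len() < INLINE_DISK_VALUE_THRESHOLD
}
```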
Force-pushed from 3bb82d8 to 018ef57.
I'm setting the small read base to 159G in this PR. This is 9G of small read base, plus 6G of TTN multiplied by an overestimated trie depth of 25. I think we should be fine even if trie depth grows past 25, because the 6G TTN figure is probably an overestimate, it being a WriteTTN. We should be able to reduce these costs much further on average (because not all trie branches are that deep) once we manage to implement ReadTTN.
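A back-of-the-envelope restatement of that 159G figure; the constant names below are illustrative, the values are the ones quoted in the comment:

```rust
const SMALL_READ_BASE: u64 = 9_000_000_000; // 9G
const WRITE_TTN: u64 = 6_000_000_000; // 6G, standing in for the missing ReadTTN
const OVERESTIMATED_TRIE_DEPTH: u64 = 25;

// 9G + 6G * 25 = 159G
const NEW_READ_BASE_COMPUTE: u64 = SMALL_READ_BASE + WRITE_TTN * OVERESTIMATED_TRIE_DEPTH;
```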
Force-pushed from f1a1dc4 to c40e889.
This should be ready for review. All tests passed locally, hopefully they will pass on CI too :)
Force-pushed from c40e889 to 98eaea4.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #12044      +/-   ##
==========================================
+ Coverage   71.40%   71.56%   +0.15%
==========================================
  Files         814      815       +1
  Lines      163936   164304     +368
  Branches   163936   164304     +368
==========================================
+ Hits       117055   117578     +523
+ Misses      41754    41582     -172
- Partials     5127     5144      +17
```
```yaml
main_storage_proof_size_soft_limit: {old: 3_000_000, new: 4_000_000}
wasm_storage_has_key_base: { old: { gas: 54_039_896_625, compute: 200_000_000_000 }, new: { gas: 54_039_896_625, compute: 8_000_000_000 } }
```
this should be the same as wasm_storage_small_read_base
Good catch, thank you! This is now fixed :)
(For people who were not in our DMs: I forgot to include the overestimated ReadTTN costs in has_key_base; this is now fixed by making it 158Ggas, i.e. the 8G has_key base plus the 150G ReadTTN overestimate.)
Could you please elaborate on the motivation for introducing 3 additional costs for reading small values?
As far as I see it, we only need to add one compute-only cost to account for the overhead of reading a large value (> 4kb) from disk.
```yaml
main_storage_proof_size_soft_limit: {old: 3_000_000, new: 4_000_000}
wasm_storage_has_key_base: { old: { gas: 54_039_896_625, compute: 200_000_000_000 }, new: { gas: 54_039_896_625, compute: 8_000_000_000 } }
wasm_storage_has_key_byte: { old: 30_790_845, new: { gas: 30_790_845, compute: 9_000_000 } }
```
as far as I understand this should be the same as wasm_storage_small_read_key_byte
Makes sense. I just checked the raw data from the estimator, and it seems reasonable to set both at 10Mgas; there is a small performance difference, but it is not that noticeable. Adjusting the numbers :)
@Ekleog-NEAR do you have an example of the reduction in compute cost? For example, what is the new compute cost vs. the old compute cost in a typical workload such as fungible token transfers?
Firstly, I think we do need at least the large/small distinction, at least for the base and value_byte costs, because both the base latency and the throughput differ between on-disk and in-memory reads. Also, you mentioned a compute-only cost: while we could do it that way, it would make it harder to eventually, maybe, reduce the gas costs of memtrie reads later.

As for small_read_key_byte, I agree with you that we could probably do without it. However, I do think it makes sense to add it, for consistency: if we had two small_read costs but not the third one, it would likely surprise future readers of the code. Considering that one additional cost is not a lot of lines of code, I feel having it reduces tech debt by making the costs more straightforward. IOW, I think it's better to semantically have two sets of three costs, one for "small reads" and one for "large reads", than to have "read base, read key byte, read value byte, read overcost for large value base, read overcost for large value per value byte" costs. The drawback is, as you could see, that there's a bit of juggling with the costs in the implementation.

Does that make sense, or do you think the other alternative would be more semantically useful for end-users?
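To make the shape of that argument concrete, here is a minimal sketch of the "two sets of three costs" layout; the type and field names are illustrative, not the actual nearcore parameter names:

```rust
type Gas = u64;

// One family of read costs: charged per operation, per key byte, per value byte.
struct ReadCosts {
    base: Gas,
    key_byte: Gas,
    value_byte: Gas,
}

// Two symmetric families, instead of one family plus "overcost" surcharges
// bolted on for large values.
struct StorageReadCosts {
    small: ReadCosts, // value below the inlining threshold: served from memory
    large: ReadCosts, // value above the threshold: hits the disk
}
```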
I just pushed an update that removes the small_read_key_byte cost. Hopefully, with this change, this is now ready to land :)
Force-pushed from cb67708 to 677166a.
Force-pushed from 677166a to 4e774e5.
The tests all pass, and the PR seems ready for re-review to me :)
@bowenwang1996 To answer your question: storage compute costs drop by about 50% on real-world use cases. The storage gas costs are themselves around 25-50% of the total gas costs, but they are also the only ones with higher compute costs. Overall, taking everything into account, on a few randomly-chosen receipts (4 HOT receipts and 3 Sweat receipts), I'm seeing a pretty stable ~30-40% improvement. This, in turn, means that chain throughput should increase by roughly 50% after this change reaches mainnet. The detailed data is available here. (Note: the above analysis assumes that non-function-call receipts amount to a negligible volume of compute costs.)

Considering these pretty high numbers, it's particularly important that we run this on a mocknet with a transaction relayer before pushing to production, but AFAICT we always do that on every release anyway, so we should be fine.
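As a sanity check on the "roughly 50%" throughput claim, under the assumption that blocks are compute-bound: if a receipt now costs a fraction (1 - r) of its old compute, a full block fits 1 / (1 - r) times as many receipts. A quick sketch:

```rust
fn main() {
    // The observed compute reduction r of 30-40% maps to ~1.4x-1.7x
    // throughput, i.e. roughly +50% around the midpoint.
    for r in [0.30f64, 0.35, 0.40] {
        println!("-{:.0}% compute -> x{:.2} throughput", r * 100.0, 1.0 / (1.0 - r));
    }
}
```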
I tried running the before/after of this change on a full forknet setup, which exercises the state witness size limit and not just the compute cost limits, unlike the above analysis.

The first thing to note is that our current forknet infrastructure seems unable to saturate the chain. It apparently pushes transactions too slowly, and thus fails to saturate the chain even when asked for the maximum push speed. So these numbers should be understood as average-load numbers rather than improvements under serious load. I would expect improvements under serious load to be significantly higher, because as long as blocks are not full, compute cost changes have no impact.

All this forewarning being said, here are the numbers I found. The full runs are available:

I did three runs of forknet:
Gas usage per shard seems to increase by ~10% with the patch, which would mean ~10% more throughput. However, this looks like a random fluctuation to me, because delayed receipts mostly stay at zero and transaction processing is never limited in either case.

We can see here why I think the forknet system fails to throttle the chain: especially at 10x mainnet speed, I would expect many more blocks limited by either compute or state witness size than at 2x mainnet speed. But the numbers are roughly the same, which suggests to me that the transaction replayer can actually replay only at ~2x mainnet speed (or even less). There is also not much difference here between before and after the change, which is consistent with the chain not actually being under pressure.

Finally, the delayed receipts graph might be the most interesting, but it is also the hardest to interpret. Here we can see:
To be honest, I don't really know how to interpret that, knowing that the only thing that changed was compute costs. It might all be spurious changes due to random variations in the speed at which the traffic generator node pushed transactions. That said, all of these graphs are essentially indicative of a mostly healthy chain, as delayed receipts were very limited in number and transaction processing was never limited.

Summary

Altogether, using forknet as a benchmarking system did not actually answer the question of how much performance will change once this code reaches mainnet. The theoretical computation above probably remains the best number we can get for now, because forknet is unable to push the chain hard enough to hit the compute cost limits. Along with my experience using forknet for benchmarking, I think we need to:
As written in my last weekly sync message, I will start working on this from now on, so that we can get good numbers close to real-world scenarios.
See #11925 for all the gory details.
The major points are:
- `read_base` and `has_key_base` increase by 150G. This is to handle the fact that we don't actually have ReadTTN yet. Once we have ReadTTN, we can reduce these numbers to match estimator outputs again. But this will have to wait for a solution for RPC and archival nodes, which currently live on flat storage. The solution might be to enforce memtrie for them too, but it would increase operational expenses.
- WriteTTN stays included in `write_base`, and RocksDB batches writes, so WriteTTN can be reasonably close to estimator results nowadays, even though we still have to take some variance in disk write throughput into account.
- `ReadCachedTrieNode` was not reduced, due to concerns about how it is actually implemented with memtrie.
- Some gas costs are set to `1` rather than 0. This is because our current way of accounting for compute costs is to take the gas spent, divide by the gas cost, and multiply by the compute cost. So our current code does not support a gas cost of 0 with a non-zero compute cost, and changing that would require refactoring.

cc @Ekleog-NEAR
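To illustrate that last point, a minimal sketch (with hypothetical names, not the actual nearcore functions) of the accounting described above; the division is why a gas cost of 0 with a non-zero compute cost cannot be represented:

```rust
type Gas = u64;
type Compute = u64;

// compute_spent = gas_spent / gas_cost * compute_cost.
// A gas_cost of 0 would divide by zero, hence the minimum of 1.
fn compute_spent(gas_spent: Gas, gas_cost: Gas, compute_cost: Compute) -> Compute {
    gas_spent / gas_cost * compute_cost
}
```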