Gas costs post-memtrie #11925
Notes from today's meeting with @tayfunelmas and @akhi3030:
I ended up not being able to run on n2-standard-8 due to some illegal instructions, most likely the newer SHA-256 instructions not being supported by that CPU generation. I switched to the AMD-based n2d-standard-8 and got a few runs out. Command used, always with 1 warmup iteration but with a varying number of actual iterations:
1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 75, 100, and 150 iterations.
Data out of the runs (without memtrie enabled yet). Skill unlocked: using ChatGPT to turn unstructured data into structured data:
This gives the following graphs: In particular, it shows that while the results are mostly stable over a different number of iterations, the remaining instability does not seem to be particularly lower when running more iterations. So the sweet spot will likely be running something like ten 10-iteration runs and doing data magic on them to decide what's good for us. There is a slight ramp-up when increasing the iteration number, probably due to variance being better detected and thus pushing the numbers up. This is likely not very relevant, as we should take a safety margin of more than this number.

So currently, our costs are set quite a bit higher than estimator results. If I understand Jakob's analysis correctly, this was done in order to derisk the flat storage launch: we wanted to be sure that our costs were not underestimated in the worst case. AFAIU we're changing that policy nowadays to say that costs that just avoid missed chunks with our real-world load should be enough.

Still, considering we're looking into various ways of getting state writes out of the critical path altogether, I'd suggest waiting until that happens to pursue gas cost reductions in writes: once flushing to disk is no longer a risk, we can safely ignore all the computations Jakob did in #8006 and just set values to the estimator-provided values.

Next steps

This project started around reevaluating the read costs with memtrie, so we should focus on that. But then we should likely continue with reducing the write costs too. My planned next steps are:
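The "data magic" mentioned above could be as simple as taking, per cost, the median of several 10-iteration runs and flagging costs whose spread across runs is too wide. A minimal sketch with invented numbers and helper names (this is not nearcore code):

```rust
// Hypothetical sketch: combine per-cost results from several estimator runs.
// We take the median as the central estimate and the relative spread as an
// instability signal. Names and numbers are illustrative only.

fn median(values: &mut Vec<u64>) -> u64 {
    values.sort_unstable();
    let n = values.len();
    if n % 2 == 1 {
        values[n / 2]
    } else {
        (values[n / 2 - 1] + values[n / 2]) / 2
    }
}

/// Relative spread: (max - min) / median, in percent.
fn spread_pct(values: &[u64]) -> u64 {
    let min = *values.iter().min().unwrap();
    let max = *values.iter().max().unwrap();
    let mut sorted = values.to_vec();
    (max - min) * 100 / median(&mut sorted)
}

fn main() {
    // Ten hypothetical 10-iteration runs of one cost, in Ggas.
    let runs = vec![6, 7, 6, 6, 8, 6, 7, 6, 6, 7];
    println!("median = {} Ggas", median(&mut runs.clone()));
    println!("spread = {}%", spread_pct(&runs));
}
```

A large spread on a given cost would suggest re-running that cost rather than trusting any single run's number.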
Here is what the numbers of the parameter estimator out of 12adab4 (with the log line removed) look like: (The full data for the runs used for these graphs is available here)

So memtrie does bring in very significant benefits on estimator results! Though it also seems to make writing a bit slower at times. Thankfully we already have a very large safety margin on writes, so we should be able to just ignore the increase in costs here.

And the most significant benefit of memtrie is that it removes all disk access from the code path, meaning that the variance should be much lower, and thus that we can take a much smaller safety margin compared to the previous 15G -> 200G safety margin that we needed to account for disk limits under load. With all that in mind, I suggest targeting a roughly 20~30% safety margin above the estimator's 90th percentile, because memtrie should have a pretty low variance:
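For illustration, applying the suggested 20~30% safety margin to a hypothetical estimator 90th-percentile result could look like this (the 16 Ggas input and the 25% midpoint are made up for the example, not actual estimator output):

```rust
// Illustrative only: turn an estimator 90th-percentile measurement into a
// proposed cost by adding a percentage safety margin on top of it.

fn with_margin(p90_gas: u64, margin_pct: u64) -> u64 {
    p90_gas + p90_gas * margin_pct / 100
}

fn main() {
    let estimator_p90: u64 = 16_000_000_000; // hypothetical 16 Ggas p90
    // 25% is the midpoint of the 20~30% range suggested above.
    println!("proposed cost: {} gas", with_margin(estimator_p90, 25));
}
```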
My next step will be making the protocol version change that does this split. It'll be my first time adding a cost, so it might take a bit longer than initially expected. @Longarithm / @tayfunelmas can you check in particular the assumptions in the three places that have the

Also, what do you think about these numbers? They seem reasonable to me, but are they to you too?

Next steps

Considering the recent discussion, I'm removing 4 and 5 from the next steps listed above. The next steps thus are:
also CC: @pugachAG
This is very good work. I am happy to see that we are moving forward with these cost reductions. Excited for the next steps.
@Ekleog-NEAR we also need to consider how storage operations contribute to the chunk state witness size. My intuition says that with memtrie it is mostly that, not the latency, which would determine gas costs.
@pugachAG: not sure I understand your comment. Gas costs are related to chunk apply latency. If we reduce gas costs but the latency remains high, then we will have undercharging.
Just finished a call with @pugachAG to understand their concern above better. When we access storage, there are two types of costs that we need to consider:

1. the time it takes to execute the operation (what gas/compute costs are meant to cover); and
2. how much the operation contributes to the chunk state witness size.
The concern is that if we reduce gas costs based purely on 1. above, then we simply get limited by the state witness size limit. Hence, we should consider both of the limits above to decide what the new gas values should be. I agree with this. I don't think it makes sense to reduce gas costs a lot when we will not see significant benefit from that in practice if we keep getting limited by the state witness size.

@Ekleog-NEAR: let's discuss in more detail when you are back from your PTO. I imagine that we should run a few more experiments to figure out how low we can make the gas costs before further reductions stop making sense.

Another point that we discussed in the call is that the current TTN cost for reading is too low. It was set incorrectly when we shipped flat state. So we should actually take this opportunity to see what the correct value should be, and check whether it even needs to be increased.
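A toy model of the two limits discussed in this call, with entirely made-up budgets and costs: a chunk fills up as soon as either the compute budget or the witness-size budget runs out, which is why lowering compute costs alone eventually stops helping.

```rust
// Sketch of two independent per-chunk limits: a receipt fits only if BOTH
// the remaining compute budget and the remaining witness-size budget allow
// it. All constants below are invented, not mainnet parameters.

struct ChunkBudget {
    compute_left: u64, // remaining compute budget
    witness_left: u64, // remaining state-witness bytes allowed
}

impl ChunkBudget {
    /// Try to apply a receipt; returns false if either limit would be exceeded.
    fn try_apply(&mut self, compute_cost: u64, witness_bytes: u64) -> bool {
        if compute_cost > self.compute_left || witness_bytes > self.witness_left {
            return false;
        }
        self.compute_left -= compute_cost;
        self.witness_left -= witness_bytes;
        true
    }
}

fn main() {
    let mut budget = ChunkBudget { compute_left: 1_000, witness_left: 100 };
    // A storage-read-heavy receipt: cheap in compute, expensive in witness bytes.
    assert!(budget.try_apply(10, 90));
    // Plenty of compute is left, but the witness limit now blocks the next one.
    assert!(!budget.try_apply(10, 90));
}
```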
@Ekleog-NEAR: I had a conversation with Bowen about this work. A question came up: it seems like we should expect memtrie to help with writes and not with reads, since reads were already optimised due to flat storage. Finally, there are some other concerns with reducing the gas costs. We should look at what the current compute costs are and consider reducing those instead.
@Ekleog-NEAR when you ran the experiments, did you include 0c23749? According to @pugachAG this significantly improves memtrie performance.
I don't think I agree with this concern. Compute costs and state witness size are two orthogonal limitations, and it just happens that storage reads consume both. But we have a lot of other operations that consume only compute costs. So for example, imagine that we have an original workload that does 1 storage read for 1 hash computation. There, it's useless to reduce compute costs by a lot, because we'll very quickly be limited by state witness size. But then a new workload comes in, that does 1 storage read followed by 1M hash computations. If we do not reduce the compute costs, then this workload will be heavily limited by compute costs. As we don't know what future workloads will be coming on our blockchain, and there's no value in keeping compute costs higher than they should be, I do think that we should reduce them to the lowest value we feel comfortable handling (and increasing them again is not hard anyway).
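To make the argument concrete, here is a rough sketch (with invented per-operation costs and budgets) of which limit binds for each of the two workloads: the one-read-per-hash workload hits the witness-size limit first, while the hash-heavy workload is bound by compute, so inflated compute costs penalize it for no benefit.

```rust
// Which per-chunk limit does a repeated workload exhaust first?
// All per-operation costs and budgets below are invented for illustration.

fn binding_limit(reads: u64, hashes: u64) -> &'static str {
    let compute = reads * 20 + hashes; // compute units used per workload unit
    let witness = reads * 100;         // witness bytes used per workload unit
    let compute_budget: u64 = 1_000_000;
    let witness_budget: u64 = 1_000;
    // Compare the fractions compute/compute_budget vs witness/witness_budget
    // without floats, by cross-multiplying.
    if compute * witness_budget >= witness * compute_budget {
        "compute"
    } else {
        "witness size"
    }
}

fn main() {
    // 1 read per 1 hash: the witness-size limit binds first.
    assert_eq!(binding_limit(1, 1), "witness size");
    // 1 read per 1M hashes: the compute limit binds first.
    assert_eq!(binding_limit(1, 1_000_000), "compute");
}
```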
@pugachAG My understanding was that TTN would only be dealing with values of less than 4k, so would literally never touch the disk trie. Is this understanding wrong? (I also saw that TTN was set to too low a value when deploying flat storage, but if this understanding is correct then we no longer need to account for the variance there was in flat storage; and thus we can set it to almost the estimator-provided value. Said variance was the reason why TTN costs could at times have been underestimated)
I probably don't understand this statement, because it seems incompatible with the discussion we had had with @tayfunelmas, summarized above. Storage reads were already a bit better with flat storage, but memtrie allows us to bypass any disk read, which is a lot of improvement even compared to the only-one-disk-read of flat storage. And on the other hand storage writes still require us to write to all of memtrie, flat storage and on-disk trie, which means that writes are actually a bit slower with memtrie than without. This understanding is confirmed by the numbers I posted above, so I probably don't understand this sentence.
The project that was given to me was to reduce gas costs. But as per the first message, I only looked at reducing compute costs for now, and all the "from" numbers given are for compute costs. This being said, the numbers I suggested changing to are not only lower than our current compute costs, but also lower than our current gas costs. So after this lands and we confirm that it doesn't break anything (i.e. after we're sure we won't need to revert the changes in a hurry), we will be able to consider also lowering the gas costs to the new value of the compute costs.
I did not; IIRC it was still marked as draft when I ran the experiments. I'll run a new set of experiments and see how things evolve with it.
Here are the results of the new set of experiments, run on fc87c85, so including the changes of 0c23749: I'm actually not finding much difference between the results with 0c23749 and the results without it: the estimator results are almost always within one standard deviation of the previous results. My guess is that this is because the estimator does not actually exercise the prefetcher, so the prefetcher is a no-op for it, while it originally brought slowdown to our real-world use case.

TL;DR: No changes with 0c23749, I still recommend the compute cost changes from #11925 (comment). The speedup from it is probably in parts that had never been measured by the estimator anyway, because they were too contract-specific.
Summary of the meeting with @akhi3030, @Longarithm and @pugachAG:
Considering this zulip thread, my assumption on So to sum up the next steps:
With this, this issue has become a bit of a hydra, with lots of new heads popping out. I'd expect the remaining time needed to do it all to be around 3-4 weeks. If we skip steps 5 & 6, it'd reduce by ~1 week (in exchange for relying on release testing to confirm it works), and probably the same if we skip steps 3 & 4 (and instead charge arbitrarily more on read costs). This being said, this will be my first time adding a new cost, so I'll have to hope the documentation we have is still up-to-date, even though it's more than a year old. For now I'll proceed with doing all the steps, but feel free to let me know if you want me to skip some of them. I think for similar cost changes we usually would have skipped steps 5 & 6 in particular, because memtrie should have little variance.
Over the weekend I gave a bit more thought to the suggested cost changes, and I want to change the new TTN cost. One thing I had initially forgotten is that the (Write)TTN cost must also include the cost of writing the updated node to disk after updating the leaf value. So we need to plan for more variance in the numbers, because WriteTTN must include disk access time, and disk access has large variance. We don't need to do the same for ReadCachedTrieNode, because we need to write back the node only once anyway.

However, we don't need to handle the full latency that was handled before, because writes are batched at the end of block processing. So the latency needs to be accounted for only once, and it already is in the Write*/Remove* costs.

With this in mind, here are my new suggested cost changes. I basically made the 6G of TTN into 20G, in order to better account for writeback variance:
This looks like a reasonable proposal to me.
If we plan to make changes to the
Good point, we will continue discussing and consider this.
I just noticed an issue with reintroducing ReadTTN for memtrie reads. The details are in this zulip thread; it might result in a change in the suggested cost changes.
I also chatted with Akhi about

This being said, the gains are not that good. For a random tg receipt, the reduction would lead to a ~2% throughput gain. The gains would be similar for two kaiching receipts. 2% is neither a big nor a small number for the improvement of the throughput of the entire chain. I can easily leave the cost reduction as-is (so, with the 2% throughput gains) until if/when we make changes to

Regardless of the choice, as we're not touching any gas costs, it'll be just as easy to change

@bowenwang1996 What do you prefer?
Considering the zulip chat, I ended up removing |
Also, to copy the next steps from the zulip thread:
Currently, 1. is implemented by #12044, which is just lacking review. As for whether doing steps 2-3 is worth it right now, or whether I can focus on other important stuff (more performance work), I'll unpop administrative backlog for now to let other people interject opinions, and reconsider after the administrative backlog is handled.
As part of the evaluation of #12044, I computed an upper bound on the benefits ReadTTN could give us. On the real-world use cases with detailed data available here, the benefits could not exceed 10-20%, as this is the remaining cost of the overestimated read costs. I'd expect the benefits of just ReadTTN to be around 5-10% of the speed of the whole chain, because most likely, most nodes are at least at depth 12. This being said, the current 6G estimation for ReadTTN is probably also overestimated. If we also re-estimate ReadTTN, we might reach 10-15% of chain throughput improvements.
See #11925 for all the gory details. The major points are:

- We currently overestimate `read_base` and `has_key_base` by 150G. This is to handle the fact that we don't actually have ReadTTN yet. Once we have ReadTTN, we can reduce these numbers to match estimator outputs again. But this will have to wait for a solution for RPC and archival nodes, which currently live on flat storage. The solution might be to enforce memtrie for them too, but it'd increase operational expenses.
- Reduction of (Write)TTN is made possible by the fact that the critical path only has lots of (memtrie) reads, and a single write at the end of the block. The latency cost is accounted for in `write_base`, and RocksDB batches writes, so WriteTTN can be reasonably close to estimator results nowadays, even though we still have to take some variance in disk write throughput into account.
- Once this lands, we should check whether blocks become limited by chunk witness size, or whether gas is still the limiting factor in most cases.
- `ReadCachedTrieNode` was not reduced due to concerns about how it is actually implemented with memtrie.
- We had to set some gas costs to `1`. This is because our current way of accounting for compute costs is to take the gas burnt, divide it by the gas cost, and multiply by the compute cost. So our current code does not support a gas cost of 0 with a non-zero compute cost, and changing that would require refactoring.
- All raw estimator results and graphs from which the numbers were derived are available on #11925.

cc @Ekleog-NEAR
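The gas-to-compute rescaling mentioned in the second-to-last point can be sketched as follows; the names and the assertion are illustrative, not the actual nearcore implementation. It shows why a configured gas cost of 0 alongside a non-zero compute cost cannot work: the rescaling divides by the gas cost.

```rust
// Sketch of deriving compute usage from gas burnt: rescale by the
// compute/gas ratio of the parameter. A gas cost of 0 would divide by zero,
// which is why some gas costs had to be set to 1 instead of 0.

fn compute_usage(gas_burnt: u64, gas_cost: u64, compute_cost: u64) -> u64 {
    assert!(gas_cost >= 1, "a gas cost of 0 with a non-zero compute cost is unsupported");
    gas_burnt / gas_cost * compute_cost
}

fn main() {
    // 10 charges of a parameter costing 5 gas / 15 compute each:
    assert_eq!(compute_usage(50, 5, 15), 150);
}
```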
With memtrie, we made our storage reads much faster. It already enabled stateless validation.
Now we want to lower at least compute costs, in order to also get single-shard performance improvements from that work.
In order to do that, we must retake ownership of the parameter estimator, run it with the newest changes, and then consider which cost changes we should make.