Show that locust can saturate the gas limits and describe chain behavior #9118
Comments
I'm still unable to get blocks with > 1000 Tgas :( So far I resolved 2 problems that prevented me from hitting the gas limit:
But even with that, I am not quite able to saturate even a single shard the way I was hoping to. I ran a 4 shard, 4 node localnet, and locust with 6000 users spread across 32 workers, each with 4 separate FT contracts. The 32 workers send their requests to 2 different RPC nodes. This setup peaked around 900 TPS, with only about 75% of gas capacity on each shard (evenly distributed).

Note: a 900 TPS peak throughput corresponds to 900 * ~5 Tgas = 4500 Tgas per second. With a block time of 1.3s that means 4500 Tgas / 1.3s = 3461 Tgas / block, which is about 86% of the 4000 Tgas capacity.

The response time going up significantly starting at around 3800 users suggests that we are hitting a bottleneck there. But that point is only about 750 TPS, far below the 1050 TPS I want to see. So I need to figure out what the current bottleneck is. Trying more than 2 RPC nodes next.

cc @akhi3030 maybe you have some ideas regarding the bottleneck, or see flaws in my reasoning?
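The back-of-the-envelope estimate above can be reproduced as follows. All figures and the arithmetic itself are taken directly from the comment (the ~5 Tgas per FT transfer is the comment's own estimate, not a measured protocol constant):

```python
# Numbers from the comment above; the per-transaction cost is an estimate.
TGAS_PER_TX = 5        # approximate gas burnt per FT transfer (estimate)
TPS_PEAK = 900         # observed peak transactions per second
BLOCK_TIME_S = 1.3     # observed block time
CAPACITY_TGAS = 4000   # 4 shards x 1000 Tgas per chunk

tgas_per_second = TPS_PEAK * TGAS_PER_TX          # 4500 Tgas/s
tgas_per_block = tgas_per_second / BLOCK_TIME_S   # ~3461, as computed in the comment
utilization = tgas_per_block / CAPACITY_TGAS      # ~0.86

print(f"{tgas_per_block:.0f} Tgas/block, {utilization:.1%} of capacity")
```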
I've repeated the experiment with more RPC nodes - same results. Then I ran with just a single shard (thanks @akhi3030 for the idea!). There I was hitting a limit at around 900 users, again with chunks never filling up. They are stuck at around 750 Tgas again.

But after that, I figured out one big factor: compute costs! FT calls do a decent amount of storage requests, which means they are charged a higher compute cost than gas cost. Removing the compute cost parameters gives me almost full chunks, but sadly still not quite. With ~4200 users I'm getting close to ~900 TPS with a still mostly stable median response time of 2.5s.

Next week I'll integrate it with Prometheus and Grafana to get more data about what the nodes are doing.
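The compute cost effect described above can be sketched as follows. The idea: a chunk is closed once accumulated *compute* usage reaches the limit, so when a receipt's compute cost exceeds its gas cost, the recorded *gas* usage stays below the gas limit even though the chunk is full. All numbers here are illustrative placeholders, not actual protocol parameters:

```python
# Illustrative sketch: why compute costs keep gas usage below the limit.
CHUNK_LIMIT_TGAS = 1000  # per-chunk limit (applies to compute usage)

def gas_used_when_full(gas_per_receipt: float, compute_per_receipt: float) -> float:
    """Gas recorded in a chunk that is filled up to the compute limit."""
    receipts_per_chunk = CHUNK_LIMIT_TGAS / compute_per_receipt
    return receipts_per_chunk * gas_per_receipt

# A storage-heavy FT receipt charged, say, 1.3x its gas as compute
# (hypothetical ratio) caps recorded gas usage well below 1000 Tgas:
print(gas_used_when_full(gas_per_receipt=5, compute_per_receipt=6.5))
```

With these made-up numbers, chunks would saturate at ~769 Tgas of recorded gas, in the same ballpark as the ~750 Tgas plateau observed above.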
@jakmeier you mentioned that you used 2 rpc nodes for the test and I wonder whether that is enough. Would it help if there are more rpc nodes to distribute the rpc request load? |
Yes, that was one experiment.
This was with 4 RPC nodes. And I even tested with a single shard. I think that should be enough to rule out this bottleneck in this particular setup. But I think for the final benchmark, it would be good to have at least as many RPC nodes as shards.
While I'm working on running this on top of testnet state, @Akashin was already able to saturate chunks with gas last week: #8920 (comment). But that's with larger receipts. We still want to show it with many small receipts, too.
Filling chunks to the limit using locust load has been demonstrated multiple times by now. With the new metrics, it is also easy to observe. Hence I am going to mark this issue as completed. A few comments on how to check for "full" chunks:
As an example, below are the compute cost heatmaps for all 4 shards, where only shard 3 is at capacity.
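Besides dashboards, chunk fullness can also be checked programmatically from chunk headers. A minimal sketch, assuming headers carry `gas_used` and `gas_limit` fields as in the NEAR JSON-RPC `chunk` response; the 95% threshold and the example data are arbitrary choices for illustration:

```python
# Flag shards whose chunks are at (or near) the gas limit.
def full_chunks(chunk_headers: list[dict], threshold: float = 0.95) -> list[int]:
    """Return shard ids whose chunks are at >= `threshold` gas utilization."""
    return [
        h["shard_id"]
        for h in chunk_headers
        if h["gas_used"] >= threshold * h["gas_limit"]
    ]

headers = [  # illustrative data in the shape of chunk headers
    {"shard_id": 0, "gas_used": 750e12, "gas_limit": 1000e12},
    {"shard_id": 3, "gas_used": 990e12, "gas_limit": 1000e12},
]
print(full_chunks(headers))  # [3]
```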
Running locust based loadtests as described in #8999, we want to observe a test where at least one shard (ideally all shards) has full chunks for an extended period of time.
This will prove that gas is a bottleneck before any bottlenecks of the test setup prevent more traffic.
And it will show what we should expect in a congestion case today.
This is a prerequisite for #8920.