Improve Locust performance with high TPS #11352
This should be the fix: aabf18b
Small note: This fix uses …
Probably, such a difference in ratio shows that in our case the bottleneck for Locust is the computation to generate and sign a transaction, not sending it to the node. (But I haven't yet read what the actual difference is between the fast and the regular HTTP client.)
TL;DR: I'm not surprised that generating NEAR transactions is 10 times slower than the "best case scenario", and that sounds reasonable to me. They claim about 4k TPS in the "best case scenario", so they are probably just continuously sending thousands of identical requests like …. Their request generation probably takes ~0 CPU (while for us CPU is the bottleneck for Locust), so all the CPU is available for Locust internals. They probably also ran the server on the same Mac machine and made it as simple as possible, e.g. just sending back …. But maybe I'm wrong.
All fair observations - I want to confirm this by running Locust under a profiler. Here is a sample profile of …
99% of the wall-clock time is split between two lines:
Both are RPC requests to the node. Transaction signing takes 0.2% of wall-clock time, which is non-zero but not that significant. I think the next step is to understand the threading model that Locust is using - if these transaction submissions are executed on a single thread in a blocking fashion, that will surely limit the throughput. Instead, we should be using something like asyncio here.
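To make the suggestion concrete, here is a sketch of what "something like asyncio" could look like for the submission path - my own illustration with aiohttp, not code from the thread (Locust itself is gevent-based):

```python
import asyncio

import aiohttp


async def submit_many(rpc_url: str, signed_txs_base64: list[str]) -> None:
    """Overlap all RPC round-trips instead of blocking on each one."""
    async with aiohttp.ClientSession() as session:

        async def submit(tx_b64: str) -> None:
            payload = {
                "jsonrpc": "2.0",
                "id": "locust",
                "method": "broadcast_tx_async",  # the RPC method in use at the time
                "params": [tx_b64],
            }
            async with session.post(rpc_url, json=payload) as resp:
                resp.raise_for_status()

        await asyncio.gather(*(submit(tx) for tx in signed_txs_base64))
```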
To analyse this further: signing takes 2 ms of wall-clock time, so we can do around 500 signatures per second in a single Python process, which would be a bottleneck in a single-worker setup. For 16 workers that should not be a problem, though. It would be nice to improve the performance of signing, but it is probably not the root cause of the limited throughput we see with multiple shards.
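As a sanity check on that estimate, a minimal micro-benchmark of the kind one could run (assuming the python-ed25519 package the harness used at the time; the payload size is arbitrary):

```python
# Rough check of the "2 ms per signature => ~500 signatures/s" estimate.
import time

import ed25519  # the python-ed25519 package (pip install ed25519)

signing_key, _verifying_key = ed25519.create_keypair()
payload = b"x" * 200  # stand-in for a serialized transaction

n = 1000
start = time.perf_counter()
for _ in range(n):
    signing_key.sign(payload)
elapsed = time.perf_counter() - start
print(f"{elapsed / n * 1e3:.2f} ms/signature, {n / elapsed:.0f} signatures/s")
```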
Here is another profile from a longer run with more users:
Here signing takes 1 ms on average, which is similar to the previous run. The remaining 20% is in …. I'll go ahead and try a few local optimizations to improve the single-thread performance.
Andrei, thanks for your profiling! Can you please repeat it on this commit to check whether (and by how much) my changes reduce the time consumed by …?
Yes, I plan to do this as part of the refactoring started in #11364. Concretely, I plan to add an API that lets the Locust user submit a transaction directly, without waiting for its completion.
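One possible shape for such an API - the function name and response handling here are hypothetical; only the `send_tx` method and its `wait_until: NONE` mode come from the NEAR RPC docs:

```python
import base64

import requests


def send_tx_nowait(rpc_url: str, signed_tx: bytes) -> dict:
    """Hypothetical helper: submit a signed transaction and return immediately.

    Uses the `send_tx` RPC with wait_until=NONE so the call does not block on
    inclusion or execution; the caller can poll for the outcome later if needed.
    """
    response = requests.post(rpc_url, json={
        "jsonrpc": "2.0",
        "id": "locust",
        "method": "send_tx",
        "params": {
            "signed_tx_base64": base64.b64encode(signed_tx).decode("ascii"),
            "wait_until": "NONE",
        },
    })
    response.raise_for_status()
    return response.json()["result"]
```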
We migrate away from `broadcast_tx_commit` and `broadcast_tx_async`, which are now deprecated in favor of `send_tx`: https://docs.near.org/api/rpc/transactions

I believe this PR preserves the behavior we had before; I plan to make any behavior changes in future PRs.

Tested by running:

```sh
locust -H 127.0.0.1:3030 \
  -f locustfiles/ft.py \
  --funding-key=$KEY --headless -u 2000 -r 50 -t 120s
```

This is part of the work on improving the performance of the Locust load test runner: #11352
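For context, my reading of the linked RPC docs is that the deprecated methods map onto `send_tx` roughly as follows (the `wait_until` values are from the docs, not from this PR):

```python
# Deprecated method -> send_tx equivalent (per docs.near.org)
LEGACY_TO_SEND_TX = {
    # fire-and-forget: returns once the node has accepted the transaction
    "broadcast_tx_async": {"method": "send_tx", "wait_until": "NONE"},
    # blocks until the transaction has (optimistically) executed
    "broadcast_tx_commit": {"method": "send_tx", "wait_until": "EXECUTED_OPTIMISTIC"},
}
```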
We use it extensively in Locust benchmarks and it shows up as a hotspot; a single signature takes around 1-2 ms on my machine. PyNaCl is the recommended replacement for python-ed25519 and is supposed to be 10-20 times faster for signatures. This is a part of #11352. I've followed this guide: https://github.com/warner/python-ed25519?tab=readme-ov-file#migrating-to-pynacl
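A minimal sketch of what that migration looks like, following the linked guide - PyNaCl implements the same Ed25519 scheme, so a key built from the same 32-byte seed produces identical signatures:

```python
from nacl.signing import SigningKey  # pip install pynacl

seed = bytes(32)  # in practice: the existing 32-byte ed25519 seed
signing_key = SigningKey(seed)

signed = signing_key.sign(b"serialized transaction bytes")
signature = signed.signature  # 64-byte detached signature
# Round-trip check: raises nacl.exceptions.BadSignatureError on mismatch.
signing_key.verify_key.verify(b"serialized transaction bytes", signature)
```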
We can now generate 1000 TPS with a single Locust process using about 0.8 of a CPU core, which should be good enough for our benchmarks.
Our experiments show that a single Locust instance struggles to sustain more than 500 TPS. This is unexpected, as the Locust documentation claims a single thread can handle 4000 TPS: https://docs.locust.io/en/2.27.0/increase-performance.html
We need to investigate and fix this discrepancy.
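For comparison, the 4000 TPS figure in the linked docs refers to Locust's `FastHttpUser`; a minimal locustfile of that kind looks roughly like this (a sketch; the endpoint is arbitrary), whereas our FT workload additionally builds and signs a transaction per request:

```python
from locust import task
from locust.contrib.fasthttp import FastHttpUser


class TrivialUser(FastHttpUser):
    # One cheap GET per task iteration - request generation costs ~0 CPU,
    # unlike a workload that serializes and signs a NEAR transaction.
    @task
    def index(self) -> None:
        self.client.get("/")
```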