Add new RPC stress testing tool (lotus-bench rpc) with rich reporting #10761
This looks good to me, the code works, and is very helpful for debugging. @arajasek are there any additional steps or checks we should take before this is approved?
Ok, the upgrade of
Code looks good, just a few non-blocking nitpicks.
Not sure why we need to update urfave/cli here, it does seem to break groups in helptext - that should be either fixed or we should drop the update from this PR.
I have removed the urfave upgrade from this PR. It does mean that we don't support benchmarking all RPC methods, but we can at least land this and wait for this to be fixed upstream in urfave (issue here and fix here).
This benchmark is designed to stress test the rpc methods of a lotus node so that we can simulate real world usage and measure the performance of rpc methods on the node.
This benchmark has the following features:
* Can query each method both sequentially and concurrently
* Supports rate limiting
* Can query multiple different endpoints at once (supporting different concurrency level and rate limiting for each method)
* Gives a nice reporting summary of the stress testing of each method (including latency distribution, histogram and more)
* Easy to use
To use this benchmark you must specify the rpc methods you want to test using the --method options, the format of it is:
  --method=NAME[:CONCURRENCY][:QPS][:PARAMS] where only NAME is required.
Here are some real examples:
  lotus-bench rpc --method='eth_chainId' // run eth_chainId with default concurrency and qps
  lotus-bench rpc --method='eth_chainId:3' // override concurrency to 3
  lotus-bench rpc --method='eth_chainId::100' // override to 100 qps while using default concurrency
  lotus-bench rpc --method='eth_chainId:3:100' // run using 3 workers but limit to 100 qps
  lotus-bench rpc --method='eth_getTransactionCount:::["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]' // run using optional params while using default concurrency and qps
  lotus-bench rpc --method='eth_chainId' --method='eth_getTransactionCount:10:0:["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]' // run multiple methods at once
Fixes: #10752
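The --method format described in the commit message can be illustrated with a small parser. This is only a sketch under assumptions, not the code from this PR: the type MethodSpec, the function ParseMethodSpec, the caller-supplied defaults, and the empty "[]" params default are all hypothetical names and choices made for illustration. It splits on at most three colons so that a PARAMS value that itself contains colons stays intact.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// MethodSpec is a hypothetical holder for one parsed --method value.
type MethodSpec struct {
	Name        string
	Concurrency int
	QPS         int
	Params      string // raw JSON array, left unparsed in this sketch
}

// ParseMethodSpec parses NAME[:CONCURRENCY][:QPS][:PARAMS].
// Empty CONCURRENCY or QPS fields fall back to the supplied defaults
// (the defaults are an assumption; the real tool may differ).
func ParseMethodSpec(spec string, defConcurrency, defQPS int) (MethodSpec, error) {
	// Split into at most 4 parts so colons inside PARAMS are preserved.
	parts := strings.SplitN(spec, ":", 4)
	m := MethodSpec{Name: parts[0], Concurrency: defConcurrency, QPS: defQPS, Params: "[]"}
	if m.Name == "" {
		return m, fmt.Errorf("method name is required in %q", spec)
	}
	if len(parts) > 1 && parts[1] != "" {
		n, err := strconv.Atoi(parts[1])
		if err != nil {
			return m, fmt.Errorf("invalid concurrency %q: %w", parts[1], err)
		}
		m.Concurrency = n
	}
	if len(parts) > 2 && parts[2] != "" {
		n, err := strconv.Atoi(parts[2])
		if err != nil {
			return m, fmt.Errorf("invalid qps %q: %w", parts[2], err)
		}
		m.QPS = n
	}
	if len(parts) > 3 && parts[3] != "" {
		m.Params = parts[3]
	}
	return m, nil
}

func main() {
	specs := []string{
		"eth_chainId",
		"eth_chainId:3",
		"eth_chainId::100",
		`eth_getTransactionCount:::["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]`,
	}
	for _, s := range specs {
		m, err := ParseMethodSpec(s, 10, 0)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", m)
	}
}
```

Using strings.SplitN with a limit of 4 rather than strings.Split is the key design choice here: everything after the third colon is treated as PARAMS verbatim.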
Fixes: #10752
Fixes: https://github.com/filecoin-project/fvm-pm/issues/494.
Context
We need a more elaborate tool to stress test our RPC methods in order to address and fix the reported performance issues (for example #10670, #10539, #10540, #10541, #10663).
This PR implements such a tool (lotus-bench rpc) and has the following features:
* --watch option which prints out intermediate progress, which is useful for long running benchmarks
NOTE: Right now everything is within a single source file (rpc.go) but it can be easily refactored, split into multiple files, and moved into its own package.
NOTE: To support any type of PARAMS we need to be able to pass a comma (,) from the command line. This however requires an upgrade to urfave, which added support for that via the flag DisableSliceFlagSeparator. However, upgrading urfave brings in regressions in how it generates --help output, and it also does not support displaying categories in subcommands. I raised this issue in urfave and will update the urfave dependency once that is fixed, and then explicitly set DisableSliceFlagSeparator so we can support any type of PARAMS.
Test plan
Build:
Stress test eth_chainId using default options:
Now let's try stress testing the eth_getTransactionCount rpc method for 120 seconds using the specified rpc method params:
Now let's try stress testing both eth_chainId and eth_getTransactionCount at the same time. eth_chainId will be stress tested using 5 concurrent workers limited to 1000 queries per second, and eth_getTransactionCount will be stress tested using 10 concurrent workers limited to 2000 queries per second:
Test that errors are reported correctly for both http and json errors. In this example the params given to eth_estimateGas are invalid so a json response is returned with an error message. Also, after running this for 2 sec I killed lotus and it then correctly reported http errors for the remaining requests: