Add new RPC stress testing tool (lotus-bench rpc) with rich reporting #10761

Merged
merged 6 commits into from
May 30, 2023

fridrik01

@fridrik01 fridrik01 commented Apr 26, 2023

Fixes: #10752
Fixes: https://github.com/filecoin-project/fvm-pm/issues/494.

Context

We need a more elaborate tool to stress test our RPC methods in order to address and fix the reported performance issues (for example #10670, #10539, #10540, #10541, #10663).

This PR implements such a tool (lotus-bench rpc) with the following features:

  • Can query each method both sequentially and concurrently
  • Supports rate limiting
  • Can query multiple different endpoints at once (supporting different concurrency level and rate limiting for each method)
  • Gives a nice reporting summary of the stress testing of each method (including latency distribution, histogram, errors (http and json error codes) and more)
  • Supports a --watch option that prints intermediate progress, which is useful for long-running benchmarks
  • Easy to use

NOTE: Right now everything is within a single source file (rpc.go) but can be easily refactored and split into multiple files and moved into its own package.

NOTE: To support any type of PARAMS we need to be able to pass a comma (,) on the command line. This requires upgrading urfave, which added support for it via the DisableSliceFlagSeparator flag. However, upgrading urfave brings regressions in how --help output is generated, and it no longer displays categories in subcommands. I raised this issue in urfave and will update the urfave dependency once it is fixed, then explicitly set DisableSliceFlagSeparator so we can support any type of PARAMS.
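
To illustrate why the comma matters, here is a small sketch that models comma separation with `strings.Split`. It does not call urfave; `splitOnComma` is a hypothetical stand-in, and the assumption is that a repeated string flag is split on "," before the tool sees its values.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnComma models a flag library that treats "," as a separator
// between multiple values of one slice flag (an assumption for this sketch).
func splitOnComma(value string) []string {
	return strings.Split(value, ",")
}

func main() {
	// One --method value as typed on the command line.
	spec := `eth_getTransactionCount:::["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]`

	// The JSON array in PARAMS contains a comma, so the single value
	// is cut into two pieces, neither of which is a valid method spec.
	for i, part := range splitOnComma(spec) {
		fmt.Printf("value %d: %s\n", i, part)
	}
}
```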

Test plan

Build:

make lotus-bench

Stress test eth_chainId using default options:

lotus-bench rpc --method='eth_chainId'
[eth_chainId]:
- Options:
  - concurrency: 10
  - params: []
  - qps: 0
- Total Requests: 3920235
- Total Duration: 59992ms
- Requests/sec: 65345.869940
- Avg latency: 0ms
- Median latency: 0ms
- Latency distribution:
    10.00% in 0ms
    50.00% in 0ms
    90.00% in 0ms
    95.00% in 0ms
    99.00% in 1ms
    99.90% in 1ms
- Histogram:
     0-1ms|  3918201|################################################################################################### (99.95%)
     1-2ms|     1044| (0.03%)
     2-3ms|      421| (0.01%)
     3-4ms|      265| (0.01%)
     4-5ms|      132| (0.00%)
     5-6ms|       84| (0.00%)
     6-7ms|       33| (0.00%)
     7-8ms|       29| (0.00%)
     8-9ms|       12| (0.00%)
    9-14ms|       14| (0.00%)
- Status codes:
    [200]: 3920235
- Errors (top 10):
    [nil]: 3920235
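
The bucket edges in the reports in this PR look consistent with ten equal-width buckets whose last bucket is stretched to the maximum latency (for example 9-14ms above). The sketch below reproduces that layout under that assumption; `bucket` and `histogram` are hypothetical names and the sample latencies are made up.

```go
package main

import (
	"fmt"
	"strings"
)

type bucket struct {
	lo, hi int64 // range in milliseconds; the last bucket also includes hi
	count  int
}

// histogram groups latencies (in ms) into n equal-width buckets. The width
// is max/n (integer division, at least 1) and the last bucket is stretched
// to cover the maximum value. Returns nil if n <= 0.
func histogram(latenciesMs []int64, n int) []bucket {
	if n <= 0 {
		return nil
	}
	var maxMs int64
	for _, l := range latenciesMs {
		if l > maxMs {
			maxMs = l
		}
	}
	width := maxMs / int64(n)
	if width == 0 {
		width = 1
	}
	buckets := make([]bucket, n)
	for i := range buckets {
		buckets[i].lo = int64(i) * width
		buckets[i].hi = int64(i+1) * width
	}
	if maxMs > buckets[n-1].hi {
		buckets[n-1].hi = maxMs
	}
	for _, l := range latenciesMs {
		i := int(l / width)
		if i >= n {
			i = n - 1 // values past the last edge fall into the last bucket
		}
		buckets[i].count++
	}
	return buckets
}

func main() {
	sample := []int64{0, 0, 0, 1200, 1633} // made-up latencies in ms
	for _, b := range histogram(sample, 10) {
		pct := 100 * float64(b.count) / float64(len(sample))
		bar := strings.Repeat("#", int(pct))
		fmt.Printf("%5d-%dms| %d|%s (%.2f%%)\n", b.lo, b.hi, b.count, bar, pct)
	}
}
```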

Now let's try stress testing the eth_getTransactionCount RPC method for 120 seconds using the specified RPC method params:

lotus-bench rpc --duration=120s --method='eth_getTransactionCount:::["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]' 
[eth_getTransactionCount]:
- Options:
  - concurrency: 10
  - params: ["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]
  - qps: 0
- Total Requests: 3294912
- Total Duration: 119992ms
- Requests/sec: 27459.420012
- Avg latency: 0ms
- Median latency: 0ms
- Latency distribution:
    10.00% in 0ms
    50.00% in 0ms
    90.00% in 0ms
    95.00% in 0ms
    99.00% in 0ms
    99.90% in 9ms
- Histogram:
       0-16ms|  3294108|################################################################################################### (99.98%)
      16-32ms|      467| (0.01%)
      32-48ms|       90| (0.00%)
      48-64ms|       64| (0.00%)
      64-80ms|       43| (0.00%)
      80-96ms|       40| (0.00%)
     96-112ms|       49| (0.00%)
    112-128ms|       16| (0.00%)
    128-144ms|       10| (0.00%)
    144-165ms|       25| (0.00%)
- Status codes:
    [200]: 3294912
- Errors (top 10):
    [nil]: 3294912
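
For readers unfamiliar with the "Latency distribution" lines, one common way to compute such values is the nearest-rank percentile. The sketch below assumes that method; the tool may compute its percentiles differently, `percentile` is a hypothetical helper, and the sample data is made up.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of the
// given latencies in milliseconds, or 0 for an empty input.
func percentile(latenciesMs []int64, p float64) int64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]int64(nil), latenciesMs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	// Nearest rank: the smallest value with at least p% of samples at or below it.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	sample := []int64{0, 0, 0, 0, 0, 0, 0, 1, 2, 9} // made-up latencies in ms
	for _, p := range []float64{10, 50, 90, 95, 99, 99.9} {
		fmt.Printf("%.2f%% in %dms\n", p, percentile(sample, p))
	}
}
```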

Now let's try stress testing both eth_chainId and eth_getTransactionCount at the same time:

  • eth_chainId will be stress tested using 5 concurrent workers limited to 1000 queries per second, and
  • eth_getTransactionCount will be stress tested using 10 concurrent workers limited to 2000 queries per second:
lotus-bench rpc --duration=10s --method='eth_chainId:5:1000'  --method='eth_getTransactionCount:10:2000:["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]' 
[eth_chainId]:
- Options:
  - concurrency: 5
  - params: []
  - qps: 1000
- Total Requests: 9447
- Total Duration: 10000ms
- Requests/sec: 944.689930
- Avg latency: 0ms
- Median latency: 0ms
- Latency distribution:
    10.00% in 0ms
    50.00% in 0ms
    90.00% in 0ms
    95.00% in 0ms
    99.00% in 0ms
    99.90% in 2ms
- Histogram:
      0-2ms|  9438|################################################################################################### (99.90%)
      2-4ms|     3| (0.03%)
      4-6ms|     0| (0.00%)
      6-8ms|     1| (0.01%)
     8-10ms|     0| (0.00%)
    10-12ms|     0| (0.00%)
    12-14ms|     0| (0.00%)
    14-16ms|     2| (0.02%)
    16-18ms|     1| (0.01%)
    18-20ms|     2| (0.02%)
- Status codes:
    [200]: 9447
- Errors (top 10):
    [nil]: 9447

[eth_getTransactionCount]:
- Options:
  - concurrency: 10
  - params: ["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]
  - qps: 2000
- Total Requests: 11415
- Total Duration: 10000ms
- Requests/sec: 1141.477942
- Avg latency: 0ms
- Median latency: 0ms
- Latency distribution:
    10.00% in 0ms
    50.00% in 0ms
    90.00% in 2ms
    95.00% in 6ms
    99.00% in 14ms
    99.90% in 50ms
- Histogram:
      0-5ms|  10722|############################################################################################# (93.93%)
     5-10ms|    424|### (3.71%)
    10-15ms|    187|# (1.64%)
    15-20ms|     44| (0.39%)
    20-25ms|     14| (0.12%)
    25-30ms|      4| (0.04%)
    30-35ms|      0| (0.00%)
    35-40ms|      0| (0.00%)
    40-45ms|      0| (0.00%)
    45-55ms|     20| (0.18%)
- Status codes:
    [200]: 11415
- Errors (top 10):
    [nil]: 11415

Test that errors are reported correctly for both HTTP and JSON errors. In this example the params given to eth_estimateGas are invalid, so a JSON response is returned with an error message. Also, after running this for 2 seconds I killed lotus, and the tool then correctly reported HTTP errors for the remaining requests:

lotus-bench rpc --method='eth_estimateGas:1:1:[{"to": "0x7B90337f65fAA2B2B8ed583ba1Ba6EB0C9D7eA44"}]' --duration=5s
- Options:
  - concurrency: 1
  - params: [{"to": "0x7B90337f65fAA2B2B8ed583ba1Ba6EB0C9D7eA44"}]
  - qps: 1
- Total Requests: 5
- Total Duration: 5000ms
- Requests/sec: 0.999891
- Avg latency: 560ms
- Median latency: 0ms
- Latency distribution:
    10.00% in 0ms
    50.00% in 0ms
    90.00% in 1633ms
    95.00% in 1633ms
    99.00% in 1633ms
    99.90% in 1633ms
- Histogram:
        0-163ms|  3|############################################################ (60.00%)
      163-326ms|  0| (0.00%)
      326-489ms|  0| (0.00%)
      489-652ms|  0| (0.00%)
      652-815ms|  0| (0.00%)
      815-978ms|  0| (0.00%)
     978-1141ms|  0| (0.00%)
    1141-1304ms|  1|#################### (20.00%)
    1304-1467ms|  0| (0.00%)
    1467-1633ms|  1|#################### (20.00%)
- Status codes:
    [200]: 2
- Errors (top 10):
    [HTTP error: Post "http://127.0.0.1:1234/rpc/v1": dial tcp 127.0.0.1:1234: connect: connection refused]: 3
    [JSON error: code:1, message:failed to estimate gas: message execution failed: exit 33, revert reason: none, vm error: message failed with backtrace:00: f02064481 (method 3844450837) -- contract reverted (33) (RetCode=33)]: 2

@fridrik01 fridrik01 force-pushed the 10752-bench-rpc branch 3 times, most recently from fe614fc to 9cb1ef2 Compare April 26, 2023 12:07
@fridrik01 fridrik01 requested a review from snissn April 26, 2023 12:07
@fridrik01 fridrik01 marked this pull request as ready for review April 26, 2023 12:25
@fridrik01 fridrik01 requested a review from a team as a code owner April 26, 2023 12:25
@fridrik01 fridrik01 force-pushed the 10752-bench-rpc branch 7 times, most recently from 1161d5e to 77b04d1 Compare April 26, 2023 19:46
@snissn

snissn commented May 2, 2023

This looks good to me, the code works, and is very helpful for debugging. @arajasek are there any additional steps or checks we should take before this is approved?

@snissn

snissn commented May 2, 2023

@fridrik01

fridrik01 commented May 3, 2023

@fridrik01 there is a failing test -- https://app.circleci.com/pipelines/github/filecoin-project/lotus/28376/workflows/553452e4-b449-4ff4-b750-4eb4cca0b41a/jobs/950691

Ok, the upgrade of urfave/cli/v2 changed the --help output, so I needed to run make docsgen-cli to regenerate the CLI docs. The changes I can see in the output are the following:

  • Categories for subcommands for some reason are not shown anymore. I only see this used for lotus client --help though so it may not be a big issue
  • Options are now sorted by the order they are added in the code instead of by name (which should actually be better IMO).


@magik6k magik6k left a comment

Code looks good, just a few non-blocking nitpicks.

Not sure why we need to update urfave/cli here, it does seem to break groups in helptext - that should be either fixed or we should drop the update from this PR.

Resolved review threads on: documentation/en/cli-lotus.md, go.mod, cmd/lotus-bench/rpc.go (four threads)
@fridrik01

I have removed the urfave upgrade from this PR. This means we don't support benchmarking all RPC methods, but we can at least land this and wait for the fix upstream in urfave (issue here and fix here).

cc: @magik6k @snissn

@fridrik01 fridrik01 requested a review from a team May 27, 2023 10:52
fridrik01 added 6 commits May 27, 2023 10:55
This benchmark is designed to stress test the rpc methods of a lotus node so that we can simulate real world usage and measure the performance of rpc methods on the node.

This benchmark has the following features:
* Can query each method both sequentially and concurrently
* Supports rate limiting
* Can query multiple different endpoints at once (supporting different concurrency level and rate limiting for each method)
* Gives a nice reporting summary of the stress testing of each method (including latency distribution, histogram and more)
* Easy to use

To use this benchmark you must specify the rpc methods you want to test using the --method option, whose format is:

  --method=NAME[:CONCURRENCY][:QPS][:PARAMS] where only NAME is required.

Here are some real examples:
  lotus-bench rpc --method='eth_chainId' // run eth_chainId with default concurrency and qps
  lotus-bench rpc --method='eth_chainId:3'  // override concurrency to 3
  lotus-bench rpc --method='eth_chainId::100' // override to 100 qps while using default concurrency
  lotus-bench rpc --method='eth_chainId:3:100' // run using 3 workers but limit to 100 qps
  lotus-bench rpc --method='eth_getTransactionCount:::["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]' // run using optional params while using default concurrency and qps
  lotus-bench rpc --method='eth_chainId' --method='eth_getTransactionCount:10:0:["0xd4c70007F3F502f212c7e6794b94C06F36173B36", "latest"]' // run multiple methods at once
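
A minimal sketch of parsing the NAME[:CONCURRENCY][:QPS][:PARAMS] format, not the parser in rpc.go. `methodSpec` and `parseMethod` are hypothetical names; the defaults (concurrency 10, qps 0, params []) are taken from the report output in the PR description. Splitting into at most four fields keeps any ":" inside the JSON params intact.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

type methodSpec struct {
	Name        string
	Concurrency int
	QPS         int    // 0 means no rate limit
	Params      string // raw JSON array
}

// parseMethod parses one --method value. Empty optional fields keep their
// defaults, so "eth_chainId::100" only overrides the qps.
func parseMethod(spec string) (methodSpec, error) {
	m := methodSpec{Concurrency: 10, QPS: 0, Params: "[]"}

	// At most 4 fields: everything after the third ":" belongs to PARAMS.
	parts := strings.SplitN(spec, ":", 4)
	m.Name = parts[0]
	if m.Name == "" {
		return m, fmt.Errorf("method name is required in %q", spec)
	}
	if len(parts) > 1 && parts[1] != "" {
		n, err := strconv.Atoi(parts[1])
		if err != nil || n < 1 {
			return m, fmt.Errorf("invalid concurrency %q", parts[1])
		}
		m.Concurrency = n
	}
	if len(parts) > 2 && parts[2] != "" {
		n, err := strconv.Atoi(parts[2])
		if err != nil || n < 0 {
			return m, fmt.Errorf("invalid qps %q", parts[2])
		}
		m.QPS = n
	}
	if len(parts) > 3 && parts[3] != "" {
		if !json.Valid([]byte(parts[3])) {
			return m, fmt.Errorf("params are not valid JSON: %q", parts[3])
		}
		m.Params = parts[3]
	}
	return m, nil
}

func main() {
	for _, s := range []string{
		"eth_chainId",
		"eth_chainId::100",
		`eth_estimateGas:1:1:[{"to": "0x7B90337f65fAA2B2B8ed583ba1Ba6EB0C9D7eA44"}]`,
	} {
		m, err := parseMethod(s)
		fmt.Printf("%+v err=%v\n", m, err)
	}
}
```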

Fixes: #10752
Development

Successfully merging this pull request may close these issues.

Add a stress testing harness for Testing RPC optimisations
3 participants