Sudden drop in TPS around 14k transactions (Quorum IBFT) #479

Closed
drandreaskrueger opened this issue Aug 13, 2018 · 20 comments


drandreaskrueger commented Aug 13, 2018

IBFT seems to max out around 200 TPS when run in the 7 nodes example.

--> see these results

However, the original publication talks about 800 TPS with Istanbul BFT. How did they do it?

Any ideas on how to make this faster?

Thanks!


drandreaskrueger commented Aug 16, 2018

New benchmark. >400 TPS!

On a dockerized crux-quorum with 4 nodes.

Surprise: web3 turned out to be a huge bottleneck now!
When I use direct RPC calls instead of web3 transaction calls, I see considerable TPS improvements (up from today's previous record of ~273 TPS):

https://gitlab.com/electronDLT/chainhammer/blob/master/quorum-IBFT.md#direct-rpc-call-instead-of-web3-call

Initially over 450 TPS!!!

(but only during the first ~14,000 transactions, then it drops to ~270 TPS, mysteriously. Any ideas, anyone?)


Diagrams: https://gitlab.com/electronDLT/chainhammer/blob/master/chainreader/img/istanbul-crux-docker-1s-gas20mio-RPC_run8_tps-bt-bs-gas_blks28-93.png
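For context, the difference between the two call routes looks roughly like this; a minimal sketch with placeholder endpoint and transaction fields, not the actual chainhammer code:

import requests
from web3 import Web3, HTTPProvider

NODE = "http://localhost:22000"   # placeholder: RPC port of one quorum node

tx = {"from": "0x<sender>", "to": "0x<contract>",
      "gas": hex(100000), "data": "0x<encoded call>"}   # placeholder fields

# route 1: through web3.py, which adds formatting/validation layers per call
w3 = Web3(HTTPProvider(NODE))
# tx_hash = w3.eth.sendTransaction(tx)

# route 2: the same eth_sendTransaction, posted directly as JSON-RPC
payload = {"jsonrpc": "2.0", "method": "eth_sendTransaction", "params": [tx], "id": 1}
# tx_hash = requests.post(NODE, json=payload).json()["result"]

Both routes end up at the same node API; skipping the web3.py layer just removes per-call overhead on the client side.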


fixanoid commented Aug 17, 2018

Hey @drandreaskrueger, looks great. As for the drop-off at 14k txns, since you are already tinkering with the CLI options for geth, please look into these as well:

PERFORMANCE TUNING OPTIONS:
  --cache value            Megabytes of memory allocated to internal caching (default: 1024)
  --cache.database value   Percentage of cache memory allowance to use for database io (default: 75)
  --cache.gc value         Percentage of cache memory allowance to use for trie pruning (default: 25)
  --trie-cache-gens value  Number of trie node generations to keep in memory (default: 120)

These are from https://github.com/ethereum/go-ethereum/wiki/Command-Line-Options. Also, for the report, it might be good to keep track of queued txns as well.
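One way to keep track of the queued txns during a run is the txpool_status RPC method (assuming the node exposes the txpool API over RPC; the URL and port below are placeholders); a minimal sketch:

import requests

# txpool_status returns hex-encoded counts of pending and queued transactions
payload = {"jsonrpc": "2.0", "method": "txpool_status", "params": [], "id": 1}
result = requests.post("http://localhost:22000", json=payload).json()["result"]
print("pending:", int(result["pending"], 16), "| queued:", int(result["queued"], 16))

Polling that once per second alongside the TPS log would show whether the txpool drains faster than it is refilled when the drop occurs.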

@drandreaskrueger

Thanks a lot.

I have now tried

--cache 4096 --trie-cache-gens 1000

but there is no change in behavior: the sudden TPS drop still happens around 14k transactions. Look at TPS_current:

block 108 | new #TX 415 / 1000 ms = 415.0 TPS_current | total: #TX 9503 / 22.4 s = 424.9 TPS_average
block 109 | new #TX 437 / 1000 ms = 437.0 TPS_current | total: #TX 9940 / 23.3 s = 426.4 TPS_average
block 110 | new #TX 516 / 1000 ms = 516.0 TPS_current | total: #TX 10456 / 24.6 s = 425.7 TPS_average
block 111 | new #TX 509 / 1000 ms = 509.0 TPS_current | total: #TX 10965 / 25.2 s = 434.6 TPS_average
block 112 | new #TX 411 / 1000 ms = 411.0 TPS_current | total: #TX 11376 / 26.2 s = 434.3 TPS_average
block 113 | new #TX 480 / 1000 ms = 480.0 TPS_current | total: #TX 11856 / 27.4 s = 432.0 TPS_average
block 114 | new #TX 509 / 1000 ms = 509.0 TPS_current | total: #TX 12365 / 28.4 s = 435.4 TPS_average
block 115 | new #TX 381 / 1000 ms = 381.0 TPS_current | total: #TX 12746 / 29.1 s = 438.7 TPS_average
block 116 | new #TX 411 / 1000 ms = 411.0 TPS_current | total: #TX 13157 / 30.3 s = 434.3 TPS_average
block 117 | new #TX 482 / 1000 ms = 482.0 TPS_current | total: #TX 13639 / 31.3 s = 436.1 TPS_average
block 118 | new #TX 507 / 1000 ms = 507.0 TPS_current | total: #TX 14146 / 32.5 s = 434.7 TPS_average
block 119 | new #TX 250 / 1000 ms = 250.0 TPS_current | total: #TX 14396 / 33.2 s = 433.7 TPS_average
block 120 | new #TX 211 / 1000 ms = 211.0 TPS_current | total: #TX 14607 / 34.1 s = 427.9 TPS_average
block 121 | new #TX 282 / 1000 ms = 282.0 TPS_current | total: #TX 14889 / 35.4 s = 420.8 TPS_average
block 122 | new #TX 288 / 1000 ms = 288.0 TPS_current | total: #TX 15177 / 36.3 s = 417.7 TPS_average
block 123 | new #TX 294 / 1000 ms = 294.0 TPS_current | total: #TX 15471 / 37.0 s = 418.1 TPS_average
block 124 | new #TX 280 / 1000 ms = 280.0 TPS_current | total: #TX 15751 / 38.3 s = 411.6 TPS_average
block 125 | new #TX 256 / 1000 ms = 256.0 TPS_current | total: #TX 16007 / 39.2 s = 408.1 TPS_average
block 126 | new #TX 251 / 1000 ms = 251.0 TPS_current | total: #TX 16258 / 40.2 s = 404.4 TPS_average
block 127 | new #TX 282 / 1000 ms = 282.0 TPS_current | total: #TX 16540 / 41.2 s = 401.7 TPS_average
block 128 | new #TX 288 / 1000 ms = 288.0 TPS_current | total: #TX 16828 / 42.4 s = 396.6 TPS_average
block 129 | new #TX 220 / 1000 ms = 220.0 TPS_current | total: #TX 17048 / 43.4 s = 393.1 TPS_average
block 130 | new #TX 277 / 1000 ms = 277.0 TPS_current | total: #TX 17325 / 44.3 s = 391.0 TPS_average
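For reference, TPS_current above is the block's new transactions divided by the block time, and TPS_average is total transactions divided by elapsed seconds; a minimal sketch that recomputes both from one of the log lines (the parsing is assumed from the output format shown, not taken from the chainhammer source):

import re

line = ("block 130 | new #TX 277 / 1000 ms = 277.0 TPS_current "
        "| total: #TX 17325 / 44.3 s = 391.0 TPS_average")

# pull the raw counts and times back out of the log line
m = re.search(r"new #TX (\d+) / (\d+) ms .*? total: #TX (\d+) / ([\d.]+) s", line)
new_tx, block_ms, total_tx, elapsed_s = m.groups()

tps_current = int(new_tx) / (int(block_ms) / 1000.0)   # 277 / 1.0 s = 277.0
tps_average = int(total_tx) / float(elapsed_s)         # 17325 / 44.3 s = ~391
print(tps_current, round(tps_average, 1))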

drandreaskrueger changed the title from "IBFT - 200 TPS max ?" to "Sudden drop in TPS around 14k transactions (Quorum IBFT)" on Aug 20, 2018
@drandreaskrueger

Same observation also in geth v1.8.13 (not only in Quorum):

ethereum/go-ethereum#17447

@drandreaskrueger

any new ideas about that?

@drandreaskrueger

You can now super-easily reproduce my results, in less than 10 minutes, with my Amazon AMI image:

https://gitlab.com/electronDLT/chainhammer/blob/master/reproduce.md#readymade-amazon-ami

@drandreaskrueger

any new ideas about that?

@vasa-develop

@drandreaskrueger @fixanoid Any updates on why the TPS drop occurs around 14K?

Thanks :)

@vasa-develop

@drandreaskrueger Is this result for AWS consistent, or was it a one-time feat?
peak TPS_average is 536 TPS, final TPS_average is 524 TPS.

@drandreaskrueger

Last time I checked, the problem was still there.

But it seems to be caused upstream, because look at this:

ethereum/go-ethereum#17447 (comment)

It happens in geth too!

@drandreaskrueger

Perhaps you can help them to find the cause?


jpmsam commented Jan 15, 2019

That's a good idea. We'll look into it too after the upgrade to 1.8.18.

@drandreaskrueger

Cool, thanks.

There will soon be a whole new version of chainhammer, with much more automation.

Stay tuned ;-)

@vasa-develop

@drandreaskrueger Is the AWS result with the web3 lib? Did you try with direct RPC calls (as you mentioned that web3 hurts the TPS a lot)?
If not, I will give it a try.

@drandreaskrueger

I tried both, via web3 and via direct RPC calls. The latter was usually faster, so I did all later measurements with RPC calls.

The old code is still there though, and the switch is here, so you can simply try it yourself: https://github.com/drandreaskrueger/chainhammer/blob/223fda085aad53c1cbf4c46c336ad04c2348da82/hammer/config.py#L40-L41

You can also read this:
https://github.com/drandreaskrueger/chainhammer/blob/master/docs/FAQ.md

It links to the relevant code pieces.


drandreaskrueger commented Jan 18, 2019

@jpmsam

after the upgrade to 1.8.18.

Oh, oops - I have been missing a lot then. But why v1.8.18 - your release page talks about 2.2.1?

Still doing all my benchmarks with a Quorum version that calls itself Geth/v1.7.2-stable-d7e3ff5b/linux-amd64/go1.10.1 ...

... because I am benchmarking Quorum via the excellent dockerized 4-node setup created by blk-io (see here), which is less heavy than your Vagrant/VirtualBox 7-node setup.
I suggest you have a look at that dockerized version; perhaps you can publish something similar. Or do you have a dockerized Quorum setup by now?

For all my benchmarking I could find dockerized versions of Geth, Parity, and Quorum - and blk-io/crux is the one I am using for Quorum.

@drandreaskrueger

chainhammer v55

I have just published a brand new version, v55: https://github.com/drandreaskrueger/chainhammer/#quickstart

Instead of installing everything on your main work computer, better use (a VirtualBox Debian/Ubuntu installation or) my Amazon AMI to spin up a t2.medium machine; see docs/cloud.md#readymade-amazon-ami.

Then all you need to do is:

networks/quorum-configure.sh
CH_TXS=50000 CH_THREADING="threaded2 20" ./run.sh "YourNaming-Quorum" quorum

and afterwards check results/runs/ to find an autogenerated results page, with time series diagrams.

Hope that helps! Keep me posted please.


jio-gl commented May 26, 2020

Looks great. What is the performance with 100 nodes?


drandreaskrueger commented Jun 22, 2020

What is the performance with 100 nodes?

Just try it out.

I am importing the /blk-io_crux/docker/quorum-crux project here:
https://github.com/drandreaskrueger/chainhammer/blob/49a7d78543b9f26e9839286c7f8c73851a18ca52/networks/quorum-configure.sh#L3-L12

If you look into their details, extending this from 4 nodes to 100 nodes looks doable, just tedious:
https://github.com/blk-io/crux/blob/eeb63a91b7eda0180c8686f819c0dd29c0bc4d46/docker/quorum-crux/docker-compose-local.yaml

It would have to be a very large machine. And I would not expect huge changes. This type of distributed ledger technology doesn't get faster by plugging in more nodes, no?


varasev commented Jul 23, 2020
