chore: update the default per peer bandwidth limits #2390

Closed
wants to merge 1 commit

Conversation

evan-forbes
Member

Overview

part of #2387

fair warning, this is technically using the metric definition of bandwidth, so 1 gigabit = 1_000_000_000 bits (a power of 10 instead of a power of 2)
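
For reference, a minimal illustration of the two conventions; the constant names below are illustrative only and not taken from the PR, which uses the metric (power-of-10) value.

package main

import "fmt"

const (
	metricGigabitBits = 1_000_000_000 // 10^9 bits (the convention used in this PR)
	binaryGibibitBits = 1 << 30       // 2^30 = 1_073_741_824 bits
)

func main() {
	fmt.Println(metricGigabitBits, binaryGibibitBits)
}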

Checklist

  • New and updated code has appropriate documentation
  • New and updated code has new and/or updated testing
  • Required CI checks are passing
  • Visual proof for any user facing features like CLI or documentation updates
  • Linked issues closed with keywords

@evan-forbes evan-forbes self-assigned this Aug 30, 2023
@evan-forbes evan-forbes added the backport:v1.x label (PR will be backported automatically to the v1.x branch upon merging) Aug 30, 2023
@celestia-bot celestia-bot requested a review from a team August 30, 2023 17:24
@codecov-commenter

Codecov Report

Merging #2390 (ca70f30) into main (97fc207) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #2390      +/-   ##
==========================================
- Coverage   20.62%   20.61%   -0.01%     
==========================================
  Files         131      131              
  Lines       15266    15269       +3     
==========================================
  Hits         3148     3148              
- Misses      11816    11819       +3     
  Partials      302      302              
Files Changed Coverage Δ
app/default_overrides.go 14.54% <0.00%> (-0.41%) ⬇️

@@ -175,6 +183,9 @@ func DefaultConsensusConfig() *tmcfg.Config {
cfg.Consensus.TimeoutCommit = appconsts.TimeoutCommit
cfg.Consensus.SkipTimeoutCommit = false
cfg.TxIndex.Indexer = "null"

cfg.P2P.SendRate = int64(defaultBandwidth / cfg.P2P.MaxNumInboundPeers)
Contributor

Shouldn't it be:

Suggested change
cfg.P2P.SendRate = int64(defaultBandwidth / cfg.P2P.MaxNumInboundPeers)
cfg.P2P.SendRate = int64(defaultBandwidth / cfg.P2P.MaxNumOutboundPeers)

Member Author

In theory yeah, but I think having a default where peers are sending more bandwidth than that peer is configured to receive is likely a bad idea. That being said, I haven't tried it.

Member Author

this being due to nodes by default having more inbound than outbound peers

Contributor

having a default where peers are sending more bandwidth than that peer is configured to receive is likely a bad idea.

The total amount of consumed bandwidth for upload and download would remain the same for a node; only the amount dedicated to each connected peer would change.

this being due to nodes by default having more inbound than outbound peers

I see. Is the concern about the symmetry in the send and receive rate per connection? Unless two connected peers have the same number of inbound connections, disparities in their sending and receiving bandwidth could arise.

Member Author

I see. Is the concern about the symmetry in the send and receive rate per connection?

yeah exactly. Having a default send rate per peer that is higher than the receive rate naively seemed like a bad idea. Our default OutboundPeers max is 10, while our default inbound is 40, so we could end up sending 4 times as much data as our peer is willing to receive.
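
A rough sketch of the asymmetry described above, using the defaults cited in the thread (10 outbound, 40 inbound); the names and the budget-splitting assumption are illustrative, not read from the PR:

package main

import "fmt"

func main() {
	// Defaults cited in the thread: 10 outbound peers, 40 inbound peers.
	const (
		maxOutboundPeers = 10
		maxInboundPeers  = 40
	)
	// A sender splitting a fixed budget over 10 outbound slots allocates 4x the
	// per-connection rate of a receiver splitting the same budget over 40
	// inbound slots.
	fmt.Println("send/recv asymmetry per connection:", maxInboundPeers/maxOutboundPeers) // 4
}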

@musalbas
Member

musalbas commented Aug 30, 2023

After further thought, I think there may be disadvantages to increasing this, namely the fact that Tendermint sends mempool txes in a constant loop even though they already sent them before, so nodes will be constantly using 1 Gbit even if they don't need to. We should take a step back and consider the issue holistically.

@evan-forbes evan-forbes marked this pull request as draft August 30, 2023 19:39
@evan-forbes
Member Author

Tendermint sends mempool txes in a constant loop even though they already sent them before

good point, converted to a draft for now. we can definitely wait until we make CAT the default

@musalbas
Member

musalbas commented Sep 8, 2023

What if we only doubled RecvRate and SendRate to 1MB each for now? The fact that the mempool gossiping isn't efficient is unfortunate, but it shouldn't mean we keep under-utilizing the validators' bandwidth, so I think increasing it to 1 MB/s, which would use 400 Mbit/s of bandwidth, is a good first step/compromise to see how the validators behave.

Any thoughts @liamsi?
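
As a rough check of the 400 Mbit/s figure above, assuming 1 MB/s per peer across roughly 50 peer slots (the 10 outbound plus 40 inbound defaults mentioned earlier in the thread); this is back-of-the-envelope arithmetic, not taken from the PR:

package main

import "fmt"

func main() {
	const (
		perPeerBytesPerSec = 1_000_000 // proposed 1 MB/s per peer
		peerSlots          = 50        // ~10 outbound + 40 inbound defaults cited in the thread
	)
	// 50 peers * 1 MB/s * 8 bits/byte = 400_000_000 bits/s, i.e. roughly 400 Mbit/s.
	fmt.Println(perPeerBytesPerSec * peerSlots * 8)
}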

@cmwaters
Contributor

Tendermint sends mempool txes in a constant loop even though they already sent them before

good point, converted to a draft for now. we can definitely wait until we make CAT the default

celestiaorg/celestia-core#1089 would fix this

@@ -28,6 +28,14 @@ import (
coretypes "github.com/tendermint/tendermint/types"
)

const (
// oneGigabit 1Gbit == 1 billion bits
oneGigabit = 1_000_000_000
Contributor

What's the rationale for this number?

Member Author

IIRC, this was the bandwidth recommendation for a consensus node (side note: this should be gigabyte, I think, since we're converting to bits below).

I could be wrong though, in which case we should update the default bandwidth expectation below
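
For reference, a minimal sketch of the unit conversion under discussion, assuming the constant is meant as bits per second and that send_rate/recv_rate are in bytes per second (as the config.toml comments quoted later in the thread state):

package main

import "fmt"

func main() {
	const oneGigabitBitsPerSec = 1_000_000_000 // metric: 10^9 bits per second
	// Dividing by 8 bits per byte gives the bytes/second units used by the
	// p2p send_rate/recv_rate settings.
	fmt.Println(oneGigabitBitsPerSec / 8) // 125_000_000 bytes/s (~125 MB/s)
}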

@cmwaters
Contributor

What if we only doubled RecvRate and SendRate to 1MB each for now

Isn't it currently 5MB/s, so doubling would be 10MB/s?

@musalbas
Member

Afaik it's 512KB/s in the config.toml last time I checked

@evan-forbes
Member Author

Afaik it's 512KB/s in the config.toml last time I checked

The default is currently 5MB/s. This PR sets it to 12.5MB/s.

# Rate at which packets can be sent, in bytes/second
send_rate = 5120000

# Rate at which packets can be received, in bytes/second
recv_rate = 5120000

@evan-forbes
Member Author

Currently this PR tries to use the default outbound peers to determine bandwidth limits per peer, but I don't think that's accurate since we have 40 inbound.
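
A sketch of the two divisors being debated, assuming a 1 Gbit/s total budget and the default peer caps cited in the thread (10 outbound, 40 inbound); the values are illustrative, not read from the PR's final code:

package main

import "fmt"

func main() {
	const (
		totalBytesPerSec = 1_000_000_000 / 8 // a 1 Gbit/s budget expressed in bytes/s
		maxOutbound      = 10
		maxInbound       = 40
	)
	fmt.Println(totalBytesPerSec / maxOutbound) // 12_500_000 bytes/s (~12.5 MB/s per peer)
	fmt.Println(totalBytesPerSec / maxInbound)  // 3_125_000 bytes/s (~3.1 MB/s per peer)
}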

@musalbas
Member

Is this from a recent default config? I vividly remember it being 512KB/s, and that was also what peers were throttled to. It says 512KB/s here: https://github.com/celestiaorg/celestia-core/blob/86e67b726f017e6c05c4a627c137ba78667dbb66/p2p/conn/connection.go#L43

@cmwaters
Contributor

The default send and recv rates for MConnConfig are overwritten by the P2PConfig here: https://github.com/celestiaorg/celestia-core/blob/86e67b726f017e6c05c4a627c137ba78667dbb66/p2p/switch.go#L38
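
Paraphrased from memory of the linked switch.go (not a verbatim copy, and the import paths assume the upstream Tendermint layout): the per-connection MConnConfig inherits its rates from the node's P2PConfig, which is why the config.toml values are what actually apply per connection.

package p2p

import (
	"github.com/tendermint/tendermint/config"
	"github.com/tendermint/tendermint/p2p/conn"
)

// MConnConfig builds the per-connection config from the node's P2PConfig,
// copying the send/recv rates over (paraphrase; see the linked switch.go for
// the real implementation).
func MConnConfig(cfg *config.P2PConfig) conn.MConnConfig {
	mConfig := conn.DefaultMConnConfig()
	mConfig.SendRate = cfg.SendRate
	mConfig.RecvRate = cfg.RecvRate
	// ...flush throttle and max packet payload size are copied similarly.
	return mConfig
}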

@evan-forbes evan-forbes removed their assignment Nov 12, 2023
@evan-forbes
Member Author

Closing this draft until further experiments have been done. It also makes incorrect assumptions about inbound and outbound peers.
