Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile, benchmark, and add more load tests for portions of the p2p stack #1162

Closed
evan-forbes opened this issue Jan 2, 2024 · 1 comment · Fixed by #1199
Closed

Profile, benchmark, and add more load tests for portions of the p2p stack #1162

evan-forbes opened this issue Jan 2, 2024 · 1 comment · Fixed by #1199
Assignees
Labels
WS: Big Blonks 🔭 Improving consensus critical gossiping protocols

Comments

@evan-forbes
Copy link
Member

evan-forbes commented Jan 2, 2024

The MConnection is used as a final queue for all incoming and outgoing packets. For this reason, it seems at least within the realm of possibility that it plays a role of what I refer to as "The Syndrome", which is when we see validators unable to reach consensus despite bandwidth or compute limitations not being hit.

To better understand if the MConnection is playing a role in events that trigger "The Syndrome", such as the inscription events a few weeks ago, we should create simple load unit tests/benchmarks, while also profiling these benchmarks to see if there are any unexpected effective mutexes. The same goes for other low level transport logic, such as the MultiplexTransport.

@evan-forbes evan-forbes self-assigned this Jan 2, 2024
@evan-forbes evan-forbes changed the title Profile, benchmark, and add more load test for MConnection Profile, benchmark, and add more load tests for MConnection Jan 2, 2024
@evan-forbes evan-forbes changed the title Profile, benchmark, and add more load tests for MConnection Profile, benchmark, and add more load tests for portions of the p2p stack Jan 5, 2024
@evan-forbes evan-forbes removed their assignment Jan 8, 2024
@evan-forbes evan-forbes added the WS: Big Blonks 🔭 Improving consensus critical gossiping protocols label Jan 9, 2024
@staheri14
Copy link
Contributor

I have been scrutinizing the code to pinpoint potential bottlenecks (outlined below) that could impede message transmission (on the sender side), thereby hindering network efficiency and increasing block time. I will then conduct various test scenarios (in the upcoming PRs) based on these findings aimed at stretching these bottlenecks to their limits. By measuring the time taken for each, I will determine which elements are most responsible for slowing down message transmission. Subsequently, I will conduct a similar analysis on the receiving flow.

  • In MConnection, messages are transmitted using channel IDs. Messages for each channel ID are placed in a queue, with every channel ID having its own associated queue. If the method Send is called, there is a defaultSendTimeout. If the messages are not successfully queued within this timeout period, the queuing will fail. These queues have the potential to become bottlenecks in a highly loaded network.
  • Messages are prioritized for sending based on their channel priority. If this prioritization is not properly enforced, important messages may not be sent in a timely manner, potentially hindering the progress of consensus as intended.
  • All messages, irrespective of their channel, are sent through the MConnection's bufConnWriter. They are buffered prior to being sent to the remote party. If the buffer isn't full yet and contains an important message, like a validator vote, it may not be sent until the buffer is full. It's important to note that a ping-pong protocol operates between the two nodes, which ensures that the connection’s buffer is flushed at least after each ping interval. This interval is set to the defaultPingInterval, which is 60 second. Additionally, there is a flushTimer logic that clears data in the buffer every FlushThrottleTimeout. Therefore, data is flushed at the earliest occurrence of either the defaultPingInterval or the FlushThrottleTimeout.
  • The MConnection's send and receive rate also play an important role and may become a bottleneck when the network load is high.

staheri14 added a commit that referenced this issue Jan 19, 2024
A PR towards #1162 

This PR introduces a comprehensive benchmark suite for MConnection,
aimed at pinpointing performance bottlenecks. The tests evaluate the
following aspects:

- Evaluating send queue capacity impact on the message transmission
delay
- Assessing whether MConnection can fully utilize its maximum bandwidth 
- Testing performance beyond available bandwidth i.e., send and receive
rates. This is to determine how the system behaves when message rates
surpass the available bandwidth.

So far, no significant bottlenecks have been identified. In a follow-up
PR, I plan to adjust additional parameters and will share the results
and insights gained from these tests.
staheri14 added a commit that referenced this issue Jan 24, 2024
Another PR towards #1162
The aim of this benchmark is to assess the impact of the size of
individual messages, particularly in relation to the
MaxPacketMsgPayloadSize in the overall performance of MConnection as
well as the ability to utilize the maximum bandwidth / send rate.
staheri14 added a commit that referenced this issue Jan 29, 2024
…1189)

This PR implements a benchmark test to evaluate how channel IDs
influence maximum bandwidth utilization.
Part of #1162
staheri14 added a commit that referenced this issue Feb 13, 2024
rootulp pushed a commit to rootulp/celestia-core that referenced this issue Sep 20, 2024
…lestiaorg#1162)

Bumps [bufbuild/buf-setup-action](https://github.com/bufbuild/buf-setup-action) from 1.24.0 to 1.25.0.
- [Release notes](https://github.com/bufbuild/buf-setup-action/releases)
- [Commits](bufbuild/buf-setup-action@v1.24.0...v1.25.0)

---
updated-dependencies:
- dependency-name: bufbuild/buf-setup-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WS: Big Blonks 🔭 Improving consensus critical gossiping protocols
Projects
None yet
2 participants