EPIC: Identifying Expected Network Traffic Rate in celestia-core #1080

staheri14 · 2023-09-13T20:20:58Z

Problem and Context

We have run a series of tests to assess the network's performance and overall health on several infrastructures: live networks such as Mocha, and test environments such as Testground and Knuu. Throughout these evaluations, we monitored performance indicators, including transaction throughput, block size, and block time, as well as health indicators such as the traffic rate per peer.

A recurring theme in all these scenarios is the need to answer two critical questions: What are the expected outcomes? What constitutes healthy behavior? With answers to these questions, our test results become far more informative and let us pinpoint bottlenecks and disparities.

In this EPIC, we provide a plan to address these questions for the network traffic rate.

Solution Plan

Setting a baseline for the expected traffic rate in a healthy network requires an understanding of the inner workings of the P2P layer within celestia-core (Tendermint), ideally drawn from a low-level specification (which unfortunately is not available). However, our preliminary analysis shows that the p2p layer currently involves a limited number of reactors (channel IDs) and message types. We can therefore simplify the process by focusing only on extracting and documenting the flow of messages through their reactors, in relation to the number of connections and the state of consensus. This would allow us to set an expected traffic rate per peer depending on the test scenario and network topology (and other factors that may come up as part of this EPIC).

Our approach is to examine the code associated with each message type, focusing on its life cycle. For each message type, this entails determining:

- When and under what conditions it is generated (e.g., height-based, round-based, or other criteria).
- How it is propagated (e.g., in relation to connected peers, whether on request or through push mechanisms, or other contributing factors).
- What its size is and which factors contribute to its variability (e.g., block size).
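To illustrate how answers to the three questions above could combine into an expected rate, here is a minimal Go sketch. It is not taken from celestia-core: the `MessageProfile` type, the `ExpectedSendRate` function, the simple push-versus-request fanout model, and every number in `main` are hypothetical placeholders chosen only for illustration.

```go
package main

import "fmt"

// MessageProfile records the life-cycle answers for one message type.
// Hypothetical type for illustration; it does not exist in celestia-core.
type MessageProfile struct {
	Name             string  // message name, e.g. as shown on the metrics endpoint
	PerBlock         float64 // messages generated per block (generation condition)
	AvgSizeBytes     float64 // average encoded size of a single message
	PushedToAllPeers bool    // true: pushed to every peer; false: sent to one peer on request
}

// ExpectedSendRate returns the expected outbound bytes per second for one node,
// under the simplifying assumption that a pushed message goes to every
// connected peer and a requested message goes to exactly one peer.
func ExpectedSendRate(p MessageProfile, peers int, blockTimeSec float64) float64 {
	if peers <= 0 || blockTimeSec <= 0 {
		return 0
	}
	fanout := 1.0
	if p.PushedToAllPeers {
		fanout = float64(peers)
	}
	return p.PerBlock * p.AvgSizeBytes * fanout / blockTimeSec
}

func main() {
	// Placeholder values, not measurements from any network.
	profile := MessageProfile{
		Name:             "consensus_BlockPart",
		PerBlock:         32,
		AvgSizeBytes:     65536,
		PushedToAllPeers: true,
	}
	rate := ExpectedSendRate(profile, 10, 16)
	fmt.Printf("%s: %.0f B/s expected outbound\n", profile.Name, rate)
}
```

A real baseline would likely need a richer model (for example, peers that already hold a message should not receive it again), which is exactly the kind of detail the per-message documentation is meant to surface.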

Tasks

The following list categorizes message types by channel ID (note that the message names follow how they appear on the Prometheus metrics endpoint and may differ from their protobuf names). In this EPIC, we need to answer the questions outlined in the Solution Plan section for each of these message types and document the findings. Priority should go to the message types that contributed the most to overall traffic, as indicated by the results obtained in the previous issue. For reference, please find the screenshot of the results below:

0x30
- mempool_Txs -> doc: adds mempool v1 protocol spec #1108
0x21
- consensus_BlockPart -> doc: specifies DataChannel protocol in consensus reactor and relevant parts of StateChannel communication #1129
- consensus_Proposal -> doc: specifies DataChannel protocol in consensus reactor and relevant parts of StateChannel communication #1129
0x20
- consensus_NewRoundStep -> partially covered in doc: specifies DataChannel protocol in consensus reactor and relevant parts of StateChannel communication #1129

Out of scope

The following message types are excluded from this epic because they account for a smaller portion of the overall traffic than the message types listed above.

0x20
- consensus_HasVote
- consensus_NewValidBlock
- consensus_VoteSetMaj23
0x00
- p2p_PexAddrs
- p2p_PexRequest
0x22
- consensus_Vote
0x23
- consensus_VoteSetBits
0x40
- blockchain_BlockRequest
- blockchain_BlockResponse
- blockchain_StatusRequest
- blockchain_StatusResponse

staheri14 · 2023-09-14T18:40:27Z

Please consult the following table for the mapping between channel IDs and their names:

Channel ID	Channel Description
0x40	BlockchainChannel
0x30	MempoolChannel
0x23	VoteSetBitsChannel
0x22	VoteChannel
0x21	DataChannel
0x20	StateChannel
0x00	PexChannel
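When reading per-channel metrics, the table above can be encoded as a small lookup. The sketch below copies the IDs and names from the table into a local map; the `channelNames` map and `ChannelName` helper are hypothetical and are not imported from celestia-core.

```go
package main

import "fmt"

// channelNames is a local copy of the channel table above.
// Hypothetical helper data; not an identifier from celestia-core.
var channelNames = map[byte]string{
	0x40: "BlockchainChannel",
	0x30: "MempoolChannel",
	0x23: "VoteSetBitsChannel",
	0x22: "VoteChannel",
	0x21: "DataChannel",
	0x20: "StateChannel",
	0x00: "PexChannel",
}

// ChannelName returns the descriptive name for a channel ID, or a
// placeholder string when the ID is not in the table.
func ChannelName(id byte) string {
	if name, ok := channelNames[id]; ok {
		return name
	}
	return fmt.Sprintf("unknown(0x%02x)", id)
}

func main() {
	for _, id := range []byte{0x21, 0x30, 0x99} {
		fmt.Printf("0x%02x -> %s\n", id, ChannelName(id))
	}
}
```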

cmwaters · 2023-09-14T19:28:37Z

Nice work. This definitely helps to validate the impact on bandwidth that compact blocks would have (given the significant bandwidth currently consumed by block parts).

Part of #1080. This is the initial version of the mempool v1 protocol description; the content may undergo further updates (with new findings) in follow-up PRs.

… parts of StateChannel communication (#1129) Closes #1080 by capturing the second most bandwidth-intensive communication protocol, i.e., `DataChannel`. Initially, the `StateChannel` was not included in the scope of this PR. However, because the `DataChannel` protocol relies heavily on the exchange of peer state, which occurs over the `StateChannel`, the document has been expanded to cover the pertinent aspects of the `StateChannel`. Please also note that this document will be updated in future rounds of editing as more details are gathered about message flow, particularly the updates transmitted over the `StateChannel`, which is closely tied to the inner workings of the consensus state machine.


staheri14 transferred this issue from celestiaorg/celestia-app Sep 13, 2023

staheri14 self-assigned this Sep 13, 2023

staheri14 added the Epic label Sep 13, 2023

This was referenced Sep 28, 2023

[Not ready for review] doc: adds mempool reactor doc #1105

Closed

doc: adds mempool v1 protocol spec #1108

Merged

staheri14 added a commit that referenced this issue Oct 10, 2023

doc: adds mempool v1 protocol spec (#1108)

2144692


staheri14 mentioned this issue Oct 28, 2023

doc: specifies DataChannel protocol in consensus reactor and relevant parts of StateChannel communication #1129

Merged

staheri14 closed this as completed in #1129 Nov 8, 2023

cmwaters pushed a commit that referenced this issue Jul 30, 2024

doc: adds mempool v1 protocol spec (#1108)

ebf6750

