Block data synchronization and full node operation on test3 #863

Closed
dongwon8247 opened this issue Jun 1, 2023 · 5 comments

dongwon8247 commented Jun 1, 2023

Description

I want to share Onbloc's experience of syncing block data and running a full node on testnet3 for our infra tools such as Adena and Gnoscan. I hope our early experience contributes to this issue and improves the block sync and full node operation experience for other Gnoland infra teams in the future.

cc @moul @zivkovicmilos @albttx @r3v4s

How we sync the block data and run a full node on test3

  1. Run a script that requests "https://rpc.test3.gno.land/block" every 15 seconds to check whether a new block has been produced (a minimal polling sketch follows this list)
    1-1. If no new block has been produced, wait another 15 seconds and repeat (1.)
    1-2. If a new block has been produced, go to (2.)
  • We poll every 15 seconds because it is a reasonable interval for test3's irregular block times: currently a block is produced every minute when there are no transactions, but when a block contains transactions, it executes and the next block is produced almost immediately (usually about a second later). We could poll every 5 seconds or less for more real-time data, but this is not a priority: we are on a testnet, and more frequent polling would only add load to the RPC server.
  2. Use a slightly modified tm2txport binary to parse information about the latest block
    2-1. Parse block information
    2-2. Parse transaction information

  3. Bundle the block and transaction information and store it in Elasticsearch (ES)
    3-1. If the block contains a transaction, add a temporary hash value (block height_transaction index) and save it
    3-2. If there is no transaction, no hash value is added

  4. If the block in step (3.) contains a transaction, it is processed and stored in MySQL
    4-1. Each type and function requires slightly different data for Gnoscan, so we save them individually.
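
Below is a minimal sketch of the polling step in Go. The JSON shape of the /block response (result.block.header.height as a string) is an assumption based on the Tendermint-style RPC that test3 exposes; verify the actual field names against the node before using it.

```go
// pollblocks.go: a rough sketch of step (1.) above. Poll /block every 15 seconds
// and detect when a new block has been produced. The response struct below
// assumes a Tendermint-style JSON-RPC shape; adjust it to the real test3 response.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type blockResp struct {
	Result struct {
		Block struct {
			Header struct {
				Height string `json:"height"` // heights are serialized as strings
			} `json:"header"`
		} `json:"block"`
	} `json:"result"`
}

// latestHeight fetches the latest block height from the RPC endpoint.
func latestHeight(rpcURL string) (string, error) {
	resp, err := http.Get(rpcURL + "/block")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var br blockResp
	if err := json.NewDecoder(resp.Body).Decode(&br); err != nil {
		return "", err
	}
	return br.Result.Block.Header.Height, nil
}

func main() {
	const rpcURL = "https://rpc.test3.gno.land"
	last := ""
	for range time.Tick(15 * time.Second) { // step 1: poll every 15 seconds
		h, err := latestHeight(rpcURL)
		if err != nil || h == last {
			continue // step 1-1: no new block (or a transient error), wait and retry
		}
		last = h
		// step 1-2: hand off to the parsing step (2.), e.g. invoke tm2txport here
		fmt.Println("new block at height", h)
	}
}
```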

Why we had to do it this way

  • Fetching data by polling RPCs seems to be unavoidable at the moment (there is no way to use a push method such as a websocket subscription or a multi-node setup)
  • Using the binary is unavoidable => you can get transaction information via HTTP RPC, but the marshal/unmarshal process produces data outside of the ASCII range, making it nearly impossible to parse
  • We haven't considered WebSocket RPC yet because push vs. polling matters more than the speed of a protocol

Main problems

  1. Since the block generation time is unpredictable, you need to periodically (every 15 seconds) request the "https://rpc.test3.gno.land/block" RPC to check whether a block has been generated (polling)
    • if the timing is unlucky, a transaction created now only reaches the DB up to 14 seconds later, which causes a major UX issue in Adena/Gnoscan
  2. HTTP is not that fast, so the sync can fall behind when many transactions cause blocks to be generated every second

Suggestions for solving problems

  1. Add a push type of data communication method (e.g. websocket); a rough consumer-side sketch follows this list
    • Push block/transaction events via websocket and store them in your own sync program
    • This will solve most of the problems, and be sufficient for now
  2. Multi-nodes
    • Modify the gnoland binary to add logic to intervene in the block generation process and save it to the DB
    • This will give infra teams more flexibility to process/modify block data on their own
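
To make suggestion (1) concrete, here is a rough consumer-side sketch of what a push-based feed could look like. The /websocket path, the subscribe method, and the tm.event query string are modeled on TM1's RPC and do not exist on test3 today; they are assumptions for illustration only.

```go
// wslisten.go: illustrative sketch of the websocket push model suggested above.
// The endpoint and the "subscribe" request are modeled on TM1's RPC and are
// hypothetical for tm2/test3.
package main

import (
	"fmt"
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// Hypothetical websocket endpoint; tm2 does not expose this yet.
	conn, _, err := websocket.DefaultDialer.Dial("wss://rpc.test3.gno.land/websocket", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// TM1-style subscription request for new-block events.
	sub := map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "subscribe",
		"params":  map[string]string{"query": "tm.event='NewBlock'"},
	}
	if err := conn.WriteJSON(sub); err != nil {
		log.Fatal(err)
	}

	// Each pushed event could be handed straight to the indexer (ES/MySQL),
	// removing the 15-second polling delay entirely.
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("event: %s\n", msg)
	}
}
```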

Related to gnolang/hackerspace#9

@jaekwon jaekwon self-assigned this Jun 14, 2023

moul commented Jun 14, 2023

https://github.com/gnolang/gno/blob/408fc68d4b3c189dbc6a608c590a86c661ae232a/gno.land/cmd/gnoland/main.go#LL138C1-L138C1 -> CreateEmptyBlocks = true could help make block creation predictable.

jaekwon commented Jun 14, 2023

We can set empty blocks to true; then blocks will come at regular intervals.
If empty blocks is set to false, then blocks will come at intervals between the block time and the empty-block timeout, so between 6 seconds and 60 seconds, depending on when the next tx comes through.

The solution to poll vs. push is to use websockets and to implement what is already in TM1 but not in TM2: this would be TM2/rpc/core/events.go, where Subscribe is implemented. Subscribe would not be available as an HTTP REST API, only as a websocket request. See also TM1/rpc/core/routes.go, which tells the TM1 RPC system that Subscribe isn't available as an HTTP REST API (rpc.NewWSRPCFunc vs rpc.NewRPCFunc).

Basically we should port events.go to TM2/rpc/core/events.go, but without using the query system. A good first step would be to not have the query argument at all and to subscribe to ALL TM events.

Then we can discuss what types of TM events you need to listen to, and we can just filter on those message types.
Hopefully we don't have to implement our own query-like system, but if we must, it should be as simple and fast as possible. Please add me as a reviewer for any related work here. If you only want to know when the next block comes through, that's an easy filter to implement: filter only for EventNewBlockHeader events (see pkg/bft/types/events.go). EventNewBlockHeader should be lighter weight than EventNewBlock, which includes the whole block info.
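
As a rough illustration of the filtering described above, the sketch below forwards only new-block-header events with a plain type switch, no query language needed. The event type names mirror pkg/bft/types/events.go, but the stand-in types and the channel wiring are hypothetical, not the actual TM2 event switch API.

```go
// eventfilter.go: sketch of filtering TM events by concrete type instead of a
// query system. EventNewBlockHeader/EventNewBlock are local stand-ins named
// after the types in pkg/bft/types/events.go; the wiring is hypothetical.
package main

import "fmt"

type EventNewBlockHeader struct{ Height int64 } // lighter weight: header only
type EventNewBlock struct{ Height int64 }       // would carry the whole block
type EventTx struct{ Hash string }

// filterNewBlockHeader forwards only EventNewBlockHeader events and drops the
// rest: the "easy filter" for knowing when the next block comes through.
func filterNewBlockHeader(in <-chan interface{}, out chan<- EventNewBlockHeader) {
	for ev := range in {
		switch e := ev.(type) {
		case EventNewBlockHeader:
			out <- e
		default:
			// ignore EventNewBlock, EventTx, votes, etc.
		}
	}
	close(out)
}

func main() {
	in := make(chan interface{}, 8)
	out := make(chan EventNewBlockHeader, 8)
	go filterNewBlockHeader(in, out)

	// Simulate a mixed stream of events coming from the node.
	in <- EventTx{Hash: "abc"}
	in <- EventNewBlockHeader{Height: 42}
	close(in)

	for h := range out {
		fmt.Println("new block header at height", h.Height)
	}
}
```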

r3v4s commented Jul 7, 2023

Hello @jaekwon @moul

Tested the current sync process with CreateEmptyBlocks = true, but it didn't help make block creation predictable.

Decreasing CreateEmptyBlocksInterval to 5s does create a block every 5 seconds, but I found 1 small issue.

If a new transaction occurs within the 5 seconds, a new block is created immediately (i.e. the next block is brought forward rather than waiting out the interval). Is this intended?

If it is intended, what is the purpose of doing this? Wouldn't it be better to create blocks regularly regardless of new transactions?

What do you think?

Testing

Testing Option 1.

CreateEmptyBlocks = false
CreateEmptyBlocksInterval = 5 * time.Second

> when there is a new tx, a new block (that contains the requested tx) gets created
> and right after that (maybe 1~2s) another new empty block gets created
> when there isn't any new tx, a new empty block gets created regularly every 5 seconds

Testing Option 2.

CreateEmptyBlocks = true
CreateEmptyBlocksInterval = 5 * time.Second

> when there is a new tx, a new block (that contains the requested tx) gets created
> and right after that (maybe 1~2s) another new empty block gets created
> when there isn't any new tx, a new empty block gets created regularly every 5 seconds
>> it seems `CreateEmptyBlocks` has no effect when the block interval is a positive value

Testing Option 3.

CreateEmptyBlocks = false
CreateEmptyBlocksInterval = 0 * time.Second

> when there is a new tx, a new block (that contains the requested tx) gets created
> and right after that (maybe 1~2s) another new empty block gets created
> when there isn't any new tx, it waits for the next tx (=> doesn't create any empty block)

Testing Option 4.

CreateEmptyBlocks = true
CreateEmptyBlocksInterval = 0 * time.Second

> regardless of new txs, a block is created at a 1-second interval
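
Summary of the four combinations above (observations only, taken from the tests):

| Option | CreateEmptyBlocks | CreateEmptyBlocksInterval | Without new txs | With a new tx |
| --- | --- | --- | --- | --- |
| 1 | false | 5s | empty block every 5 seconds | block with the tx immediately, empty block ~1-2s later |
| 2 | true | 5s | empty block every 5 seconds | block with the tx immediately, empty block ~1-2s later |
| 3 | false | 0s | no empty blocks (waits for the next tx) | block with the tx immediately, one empty block ~1-2s later |
| 4 | true | 0s | block every 1 second | block every 1 second, regardless of txs |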

jaekwon commented Jul 12, 2023

Try increasing TimeoutCommit.

from tm2/pkg/bft/consensus/config/config.go:

// Commit returns the amount of time to wait for straggler votes after receiving +2/3 precommits for a single block (ie. a commit).

These comments could be duplicated above ConsensusConfig for better documentation.

BTW, if TimeoutCommit is too low, validators may appear to be offline if they are on the edge of the gossip network or are otherwise slower to catch up or broadcast votes. The Cosmos Hub (gaia) uses the presence of validators in the Commit (which is +2/3 of precommit votes) to determine validator liveness. This is not a problem for you if you want a large TimeoutCommit, though.
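
A hedged sketch of the tuning described above. ConsensusConfig here is a local stand-in mirroring the field names in tm2/pkg/bft/consensus/config/config.go; the chosen durations are placeholder values, not recommendations from this thread.

```go
// timeoutcommit.go: sketch of raising TimeoutCommit to space blocks out.
// The struct is a local stand-in for the real consensus config in
// tm2/pkg/bft/consensus/config/config.go; values are placeholders.
package main

import (
	"fmt"
	"time"
)

type ConsensusConfig struct {
	// TimeoutCommit: how long to wait for straggler votes after receiving
	// +2/3 precommits for a block (i.e. a commit) before moving to the next height.
	TimeoutCommit             time.Duration
	CreateEmptyBlocks         bool
	CreateEmptyBlocksInterval time.Duration
}

func main() {
	cfg := ConsensusConfig{
		// Raising TimeoutCommit spreads blocks out; if it is too low, slow or
		// poorly connected validators may be missing from the Commit and look offline.
		TimeoutCommit:             5 * time.Second,
		CreateEmptyBlocks:         true,
		CreateEmptyBlocksInterval: 0, // the combination from Testing Option 4 above
	}
	fmt.Printf("%+v\n", cfg)
}
```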

Please reassign if this doesn't work or there are other questions.

@jaekwon jaekwon removed their assignment Jul 12, 2023

r3v4s commented Jul 12, 2023

> Try increasing TimeoutCommit.
>
> from tm2/pkg/bft/consensus/config/config.go:
>
> // Commit returns the amount of time to wait for straggler votes after receiving +2/3 precommits for a single block (ie. a commit).
>
> These comments could be duplicated above ConsensusConfig for better documentation.
>
> BTW, if TimeoutCommit is too low, validators may appear to be offline if they are on the edge of the gossip network or are otherwise slower to catch up or broadcast votes. The Cosmos Hub (gaia) uses the presence of validators in the Commit (which is +2/3 of precommit votes) to determine validator liveness. This is not a problem for you if you want a large TimeoutCommit, though.
>
> Please reassign if this doesn't work or there are other questions.

Big thanks for your comment! I think I have resolved the issue in #969. Please take a look.
