Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Multinode P2P sync and POA ( Proof of Authority ) #1333

Closed
wants to merge 2 commits into from

Conversation

piux2
Copy link
Contributor

@piux2 piux2 commented Nov 4, 2023

Contributors' checklist...
  • Added new tests, or not needed, or not feasible
  • Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
  • Updated the official documentation or not needed
  • No breaking changes were made, or a BREAKING CHANGE: xxx message was included in the description
  • Added references to related issues and PRs
  • Provided any useful hints for running manual tests
  • Added new benchmarks to generated graphs, if any. More info here.

This is the solution for #1332

@piux2 piux2 requested a review from a team as a code owner November 4, 2023 03:57
@github-actions github-actions bot added the 📦 ⛰️ gno.land Issues or PRs gno.land package related label Nov 4, 2023
@piux2 piux2 changed the title Feat: Multinode P2P sync and POA ( Proof of Authority ) feat: Multinode P2P sync and POA ( Proof of Authority ) Nov 4, 2023
@albttx
Copy link
Member

albttx commented Nov 6, 2023

Hello, i wanted to do the same work on gnoland cli to easier interactions for setuping new chains / nodes on #1102

But please see discussion here #1105 and #1201 about the choices that have been made :)

Copy link
Member

@moul moul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need additional time to review.


Early concerns (to be confirmed during the real review):

  • The issue related to overriding os.Std*.
  • Exploring the possibility of moving most of the coordination logic to cmd/genesis to keep gnoland lightweight.
  • Considering an alternative coordination strategy that relies on full contract-based coordination. In this approach, the genesis can configure the initial set instead of maintaining a dual system where it is used for only a few (possibly only one) blocks.

@moul moul requested a review from albttx November 6, 2023 16:55
@moul
Copy link
Member

moul commented Nov 6, 2023

@zivkovicmilos can you check how much it overlaps with your work on genesis? if it can be made more compatible or if you think the location is not ideal?

@albttx can you give it a try and give us some feedback?

@moul moul mentioned this pull request Nov 7, 2023
7 tasks
if err != nil {
return fmt.Errorf("error in creating node: %w", err)
}

fmt.Fprintln(io.Err, "Node created.")
fmt.Println("Node created.")

if c.skipStart {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we introduce a new "init" subcommand, we need to remove the "--skip-start" option from the "start" subcommand. This option was used to execute an initialization phase in a command before running on an initialized node.

For additional information, please refer to this PR: #1050

@moul
Copy link
Member

moul commented Dec 7, 2023

I'm starting a full review now. To begin, I believe this PR is necessary for the launch as it helps manage the multinode p2p system. However, it is not directly related to PoA as it does not manage the validator set directly. Perhaps the title should be updated to reflect this.

io.SetOut(commands.WriteNopCloser(mockOut))
io.SetErr(commands.WriteNopCloser(mockErr))
cmd := newRootCmd(io)
in, err := NewMockStdin(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that it is used more as an Stdout mock than an Stdin one. The name is likely incorrect. Nevertheless, I want to remove the mockio.go file from this PR. I need to find a link where I explained the reason, or I'll add a new comment here to provide the reasoning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I remember now.

The goal is to transition to a logger-only approach for tm2.

Could you please simplify this pull request by removing all the added mock features and keeping it as it was before? I will create an issue to ensure that we replace commands.IO with a logger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the issue: #1419


$> gnoland start

Afterward, you can interact with [`gnokey`](../gnokey) or launch a [`gnoweb`](../gnoweb) interface.


## Option 2: Run a node and sync with a Proof of Authority (POA) network
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Option 2: Run a node and sync with a Proof of Authority (POA) network
## Option 2: Run a full node and sync with a network

This is a valid instruction for running a full node on a network, whether it is PoA or not.


- Retrive node id and give it trusted peers.

$> gnoland node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$> gnoland node
$> gnoland init
$> gnoland node


- Download genesis.json from a trusted source and save it to the root-dir/config directory.

- Add your validator to the genesis file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to retain the POA title in this PR, which I find confusing, I suggest revamping the example. Instead of just adding yourself as a validator, the example should emphasize coordinating with others and running the genesis validator add multiple times.

The current explanation is not clear. A possible improvement is to instruct validators to add their node ID to the shared genesis file (e.g., on a GitHub repo) and ensure they start with persistent node IDs for connectivity with the updated genesis file.

However, it's important to include a prominent warning that this approach is not actually PoA or Po something. It's simply a genesis with peers and no proof of anything except "being in the genesis."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the gentx system for validators to be able to add themself inside the genesis.json
cf: #1102


## Option 2: Run a node and sync with a Proof of Authority (POA) network

- gnoland init
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this step is unnecessary in option 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's needed to generate the $DAEMON_HOME (GNO_HOME) directory, with the config and data/

Comment on lines +77 to +93
## Reset `gnoland` node back to genesis state. It's suitable for running test node

$> gnoland unsafe-reset-all

It removes the database and reset validator state back to genesis state but leaves the genesis.json and config.toml files unchanged.

The `unsafe-reset-all` command is labeled "unsafe" because:

1. It irreversibly deletes all node data, risking data loss.
2. It may lead to double signing or chain forks in production.
3. It resets the `priv_validator_state.json`, and can cause network disruption if uncoordinated.

## Reset `gnoland` node history back to genesis state.

It removes the datastore and keeps the validator state unchanged. It reduces the risk of double signing and chain fork when we sync history state from the genesis. The validator will not sign a block until the node has synced, passing the state where the validator stopped signing.

$> gnoland reset-state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albttx, what are your thoughts on these commands? Have you run them in production or on the testnet in the cosmos ecosystem? I want to understand which case these two commands handle, and I think maybe one command would suffice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moul I never used it before because it's new and i wasn't aware of it, but i might now use it often!

In tendermint, the more data it stored for a validator, the slower it is...
So we often restart from fresh install (prunned snapshot or state-sync) to have only latest block and be more efficient.
gnoland reset-state will be use in that case

commands.Metadata{
Name: "init",
ShortUsage: "init",
ShortHelp: "initialize gnoland node",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I believe it doesn't initialize a Gnoland node (the start command does that), but it does initialize a validator keypair. Perhaps the help message should provide more accurate information, or we could transfer all the initialization logic from start to this location to make the help message more relevant.

privValStateFile := config.PrivValidatorStateFile()
var pv *privval.FilePV
if tmos.FileExists(privValKeyFile) {
logger.Info("Found private validator", "keyFile", privValKeyFile,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about panicking with the message "initialization already done" or something similar?

@albttx, we need your feedback on the initialization "UX" please.

"os"
)

func main() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be clearer to review the "main" and "newRootCmd" functions in the same file. I suggest switching back to "root.go" or renaming it to "main.go." However, it should remain as a single file, as it was previously.

@@ -0,0 +1,116 @@
package main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #1419

func newValidatorCmd(bc baseCfg) *commands.Command {
cmd := commands.NewCommand(
commands.Metadata{
Name: "validator",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we merge the validator and node into a single command? This command will always display the node ID and optionally display validator information. We can also include additional information, such as the pruning strategy or any other relevant details that you would like to quickly check or share.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreed, especially considering how a single node will almost always be tied to a single key pair (1 node = 1 validator)

func newNodeIDCmd(bc baseCfg) *commands.Command {
cmd := commands.NewCommand(
commands.Metadata{
Name: "node",
Copy link
Member

@moul moul Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the "validator" subcommand, I propose combining "node" and "validator" into one command, possibly called "info" or "node".


Edit: Upon further consideration, I believe it would be beneficial to implement a unique "info" command that provides the node ID, pubkey, and address.

Additionally, I propose using external helpers, potentially located in the contribs/ folder, to obtain more advanced information such as block height, connected peers, process status, and uptime. These helpers can primarily use the RPC info, read the config files or the raw datastore.

)

return cmd
}

// we relies on the flag option to pass in the root directory before we can identify where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is probably better placed above line 40.

Comment on lines +42 to +47
newStartCmd(bc),
newInitCmd(bc),
newResetAllCmd(bc),
newResetStateCmd(bc),
newNodeIDCmd(bc),
newValidatorCmd(bc),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to understand the usages of the reset(s) commands, but I believe we should target these commands, and nothing more:

  • init
  • start
  • info

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @moul, this reset functionality is definitely something that's nice to have (and would've saved me headaches with the local portal loop in the beginning), but I don't think they belong as part of the official chain binary. Resetting chain state shouldn't be straightforward, because it's not an action that should ever happen; even if there is a need for it, initializing a new chain (working directory) instance is always a better way to go

)

// Display a node's persistent peer ID to the standard output.
func newNodeIDCmd(bc baseCfg) *commands.Command {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, the filename and function name should be consistent with "NodeID" and the subcommand should also be "node".

cmd := commands.NewCommand(
commands.Metadata{
ShortUsage: "<subcommand> [flags] [<arg>...]",
ShortHelp: "Starts the gnoland blockchain node",
Options: []ff.Option{
ff.WithConfigFileFlag("config"),
Copy link
Member

@moul moul Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what I prefer, but I think we should choose one of two extremes. Currently, we're in an intermediary step.

One extreme option is to clearly differentiate between config files and flags. For example, we can remove items like "persistentPeers" that are available both in the config file and as a start flag.

An alternative option is to establish a standard practice of using a dual approach (config file + flag). This could be implemented globally or exclusively in the init function.

To be honest, I'm not certain about a specific plan. I can only say that the current status is a hybrid mix of implicit and explicit approaches. It is inconsistent, with sometimes one way and sometimes two ways, depending on the commands.

Related with #731.


- Add your validator to the genesis file.

$> genesis validator add \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could be more generic, with a gentx system, it could be use to add genesis validator, to setup account money, setup realms and packages ...


- Start the node

$> gnoland start --prune "nothing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flag should be name --pruning as in the config file


## Reset `gnoland` node back to genesis state. It's suitable for running test node

$> gnoland unsafe-reset-all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the flag --keep-addr-book

Question: In cosmos-sdk, this command is under gaiad tendermint unsafe-reset-all
Should we do the same ?

Usage:
  gaiad tendermint [command]

Available Commands:
  reset-state      Remove all the data and WAL
  show-address     Shows this node's tendermint validator consensus address
  show-node-id     Show this node's ID
  show-validator   Show this node's tendermint validator info
  unsafe-reset-all (unsafe) Remove all the data and WAL, reset this node's validator to genesis state
  version          Print tendermint libraries' version

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe gnoland tm2 ?

Copy link
Member

@zivkovicmilos zivkovicmilos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making the first step towards multinode networks 🙏

There are things in this PR that I don't necessarily agree with, and I'd like for us to open discussions on each, because they are critical for UX and general chain orchestration:

  • Pruning strategies shouldn't be set in this PR; they need more discussion
  • Currently, you cannot start the chain without both theconfig.toml, and genesis.json; The relationship between these two needs to be completely revised. This PR focuses on adding helper methods for the config, but doesn't touch upon the entanglement with genesis.json. The thing that I've found lacking was the ability to even define values for the config.toml, which directly affect new nodes (ex. mempool settings, peer settings...but also the genesis.json location, which is hardcoded)
  • This PR doesn't have anything to do with Proof of Authority per-se, it adds the ability to define the initial validator set (but not to change it once the chain is running, in a permissioned way, which PoA is all about). From discussions we've been having with @moul and the team, validator set states / changes should be done on-chain, while initial validator set info can / needs to be set in the genesis. This PR doesn't work in any mechanism to change the set, only to alter the initial node config for the p2p network

I think the entire chain initialization flow we currently have is very clunky, and we can do better. New chain orchestration needs to be straightforward and simple, so it can be packaged into more complex workflows by devs like @albttx.

The genesis.json is the single most important piece of information for any chain; people should be able to start new nodes and connect to a chain using only the genesis.json, their keys, and the node binary -- that's it. The protocol-level configuration for the node can live in a separate file (and path passed in through a flag), but it shouldn't be mandatory like it is for gno chains currently.

There were some other minor things about the PR, mostly nitpicks and concerning docs. There were flows that don't work (ex. genesis validator add takes in a bech32 representation of the public key, and the commands introduced in this PR print a different format).
I agree with everything @moul has put down as part of his detailed review.

What I propose we do:

  • change the scope of this PR to handle the config.toml, change title and description accordingly
  • open discussions for the points I've mentioned above

Comment on lines +152 to +158
// prune nothing is the archive node setting
// prune syncable is the default node setting. It keeps the lastest 100 transactions and everything 1000th tx
prune := store.NewPruningOptionsFromString(c.pruneStrategy)
pruningOpt := sdk.SetPruningOptions(prune)
gnoBaseApp := gnoApp.(*sdk.BaseApp)
pruningOpt(gnoBaseApp)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think pruning strategies should be a part of this PR, even if they utilize something that already exists in the SDK.

The reason being these are very large design decisions that have amplifications on end-users, to be included as an add-on. Pruning state is not the only change required to multi-node types, this split is more intricate:

  • what node types do we support?
  • how far back do they need to keep state?
  • what kind of state do they need to keep?

Not to mention that this directly affects how users now fetch data from RPC clusters, as now not all nodes can serve request (but need to delegate). Additionally, it also affects the way nodes execute live syncing (not every node has the entire state required to sync a peer). In my mind, these are things that require more planning and coordination, so I strongly suggest leaving out this pruning strategy change from the PR, and open an issue we can converge on

func newValidatorCmd(bc baseCfg) *commands.Command {
cmd := commands.NewCommand(
commands.Metadata{
Name: "validator",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreed, especially considering how a single node will almost always be tied to a single key pair (1 node = 1 validator)

"github.com/gnolang/gno/tm2/pkg/p2p"
)

func TestResetAll(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can run tests in this file in parallel (t.Parallel)

Comment on lines +42 to +47
newStartCmd(bc),
newInitCmd(bc),
newResetAllCmd(bc),
newResetStateCmd(bc),
newNodeIDCmd(bc),
newValidatorCmd(bc),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @moul, this reset functionality is definitely something that's nice to have (and would've saved me headaches with the local portal loop in the beginning), but I don't think they belong as part of the official chain binary. Resetting chain state shouldn't be straightforward, because it's not an action that should ever happen; even if there is a need for it, initializing a new chain (working directory) instance is always a better way to go

Comment on lines +38 to +40
Address: "g14t47gv3v2z3pc23g3zr39mnc99w2cplp0jhqvv"
Pubkey: "E5IFULgXFdS49ILgvPmO3/8chuSWfbqw3zYXaNEP+60="

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output is not the correct format I get when running gnoland node, ex:

NodeID: g1et0nr2z32lhp6qyfqxdlenxmlw9h3auqy7fw7c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cosmos-sdk, this is

gaiad tendermint show-node-id
gaiad tendermint show-address

I believe it should be 2 separate commands because most of the time, people use them with my action to $(gnoland tm2 show-address)

require.Equal(t, int64(10), pv.LastSignState.Height)
}

func initFilesWithConfig(config *cfg.Config) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add *testing.T as a param, and call t.Helper inside the method, so it's clear it's a test helper method (not sure how the linter didn't catch this)

func (bc *baseCfg) RegisterFlags(fs *flag.FlagSet) {
fs.StringVar(
&bc.rootDir,
"root-dir",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why, but the flag value for root-dir is not saved, ex:

$ gnoland init --root-dir node1

.level 1 .msg Generated private validator keyFile testdir/config/priv_validator_key.json stateFile testdir/data/priv_validator_state.json
.level 1 .msg Generated node key path testdir/config/node_key.json

bz := amino.MustMarshalJSON(pubKey)

fmt.Printf("Address: \"%v\"\n", pubKey.Address())
fmt.Printf(" Pubkey: %v\n", string(bz))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: there is an extra space between " and Pubkey


func execNodeID(bc baseCfg) error {
config := bc.tmConfig
nodeKey, err := p2p.LoadNodeKey(config.NodeKeyFile())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a more general question for everyone:
Why is the validator key != node key?

What is the point of keeping 2 separate keys? Why not just use the validator's private key as the node key, since they're both based on the 25519 curve?


$> genesis validator add \
--address g14t47gv3v2z3pc23g3zr39mnc99w2cplp0jhqvv \
--pub-key E5IFULgXFdS49ILgvPmO3/8chuSWfbqw3zYXaNEP+60= \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value for this flag is not good here, getting an error:
invalid validator public key, string not all lowercase or all uppercase

It's because genesis validator add expects a bech32 representation of the public key

Comment on lines +38 to +40
Address: "g14t47gv3v2z3pc23g3zr39mnc99w2cplp0jhqvv"
Pubkey: "E5IFULgXFdS49ILgvPmO3/8chuSWfbqw3zYXaNEP+60="

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cosmos-sdk, this is

gaiad tendermint show-node-id
gaiad tendermint show-address

I believe it should be 2 separate commands because most of the time, people use them with my action to $(gnoland tm2 show-address)

@piux2
Copy link
Contributor Author

piux2 commented Oct 4, 2024

Outdated.

@piux2 piux2 closed this Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📦 ⛰️ gno.land Issues or PRs gno.land package related
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants