-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Multinode P2P sync and POA ( Proof of Authority ) #1333
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need additional time to review.
Early concerns (to be confirmed during the real review):
- The issue related to overriding
os.Std*
. - Exploring the possibility of moving most of the coordination logic to
cmd/genesis
to keepgnoland
lightweight. - Considering an alternative coordination strategy that relies on full contract-based coordination. In this approach, the genesis can configure the initial set instead of maintaining a dual system where it is used for only a few (possibly only one) blocks.
@zivkovicmilos can you check how much it overlaps with your work on genesis? if it can be made more compatible or if you think the location is not ideal? @albttx can you give it a try and give us some feedback? |
if err != nil { | ||
return fmt.Errorf("error in creating node: %w", err) | ||
} | ||
|
||
fmt.Fprintln(io.Err, "Node created.") | ||
fmt.Println("Node created.") | ||
|
||
if c.skipStart { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we introduce a new "init" subcommand, we need to remove the "--skip-start" option from the "start" subcommand. This option was used to execute an initialization phase in a command before running on an initialized node.
For additional information, please refer to this PR: #1050
I'm starting a full review now. To begin, I believe this PR is necessary for the launch as it helps manage the multinode p2p system. However, it is not directly related to PoA as it does not manage the validator set directly. Perhaps the title should be updated to reflect this. |
io.SetOut(commands.WriteNopCloser(mockOut)) | ||
io.SetErr(commands.WriteNopCloser(mockErr)) | ||
cmd := newRootCmd(io) | ||
in, err := NewMockStdin(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that it is used more as an Stdout mock than an Stdin one. The name is likely incorrect. Nevertheless, I want to remove the mockio.go file from this PR. I need to find a link where I explained the reason, or I'll add a new comment here to provide the reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I remember now.
The goal is to transition to a logger-only approach for tm2.
Could you please simplify this pull request by removing all the added mock features and keeping it as it was before? I will create an issue to ensure that we replace commands.IO with a logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the issue: #1419
|
||
$> gnoland start | ||
|
||
Afterward, you can interact with [`gnokey`](../gnokey) or launch a [`gnoweb`](../gnoweb) interface. | ||
|
||
|
||
## Option 2: Run a node and sync with a Proof of Authority (POA) network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Option 2: Run a node and sync with a Proof of Authority (POA) network | |
## Option 2: Run a full node and sync with a network |
This is a valid instruction for running a full node on a network, whether it is PoA or not.
|
||
- Retrive node id and give it trusted peers. | ||
|
||
$> gnoland node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$> gnoland node | |
$> gnoland init | |
$> gnoland node |
|
||
- Download genesis.json from a trusted source and save it to the root-dir/config directory. | ||
|
||
- Add your validator to the genesis file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to retain the POA title in this PR, which I find confusing, I suggest revamping the example. Instead of just adding yourself as a validator, the example should emphasize coordinating with others and running the genesis validator add multiple times.
The current explanation is not clear. A possible improvement is to instruct validators to add their node ID to the shared genesis file (e.g., on a GitHub repo) and ensure they start with persistent node IDs for connectivity with the updated genesis file.
However, it's important to include a prominent warning that this approach is not actually PoA or Po something. It's simply a genesis with peers and no proof of anything except "being in the genesis."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the gentx
system for validators to be able to add themself inside the genesis.json
cf: #1102
|
||
## Option 2: Run a node and sync with a Proof of Authority (POA) network | ||
|
||
- gnoland init |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this step is unnecessary in option 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's needed to generate the $DAEMON_HOME (GNO_HOME) directory, with the config and data/
## Reset `gnoland` node back to genesis state. It's suitable for running test node | ||
|
||
$> gnoland unsafe-reset-all | ||
|
||
It removes the database and reset validator state back to genesis state but leaves the genesis.json and config.toml files unchanged. | ||
|
||
The `unsafe-reset-all` command is labeled "unsafe" because: | ||
|
||
1. It irreversibly deletes all node data, risking data loss. | ||
2. It may lead to double signing or chain forks in production. | ||
3. It resets the `priv_validator_state.json`, and can cause network disruption if uncoordinated. | ||
|
||
## Reset `gnoland` node history back to genesis state. | ||
|
||
It removes the datastore and keeps the validator state unchanged. It reduces the risk of double signing and chain fork when we sync history state from the genesis. The validator will not sign a block until the node has synced, passing the state where the validator stopped signing. | ||
|
||
$> gnoland reset-state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albttx, what are your thoughts on these commands? Have you run them in production or on the testnet in the cosmos ecosystem? I want to understand which case these two commands handle, and I think maybe one command would suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@moul I never used it before because it's new and i wasn't aware of it, but i might now use it often!
In tendermint, the more data it stored for a validator, the slower it is...
So we often restart from fresh install (prunned snapshot or state-sync) to have only latest block and be more efficient.
gnoland reset-state
will be use in that case
commands.Metadata{ | ||
Name: "init", | ||
ShortUsage: "init", | ||
ShortHelp: "initialize gnoland node", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I believe it doesn't initialize a Gnoland node (the start command does that), but it does initialize a validator keypair. Perhaps the help message should provide more accurate information, or we could transfer all the initialization logic from start to this location to make the help message more relevant.
privValStateFile := config.PrivValidatorStateFile() | ||
var pv *privval.FilePV | ||
if tmos.FileExists(privValKeyFile) { | ||
logger.Info("Found private validator", "keyFile", privValKeyFile, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about panicking with the message "initialization already done" or something similar?
@albttx, we need your feedback on the initialization "UX" please.
"os" | ||
) | ||
|
||
func main() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be clearer to review the "main" and "newRootCmd" functions in the same file. I suggest switching back to "root.go" or renaming it to "main.go." However, it should remain as a single file, as it was previously.
@@ -0,0 +1,116 @@ | |||
package main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1419
func newValidatorCmd(bc baseCfg) *commands.Command { | ||
cmd := commands.NewCommand( | ||
commands.Metadata{ | ||
Name: "validator", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we merge the validator and node into a single command? This command will always display the node ID and optionally display validator information. We can also include additional information, such as the pruning strategy or any other relevant details that you would like to quickly check or share.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agreed, especially considering how a single node will almost always be tied to a single key pair (1 node = 1 validator)
func newNodeIDCmd(bc baseCfg) *commands.Command { | ||
cmd := commands.NewCommand( | ||
commands.Metadata{ | ||
Name: "node", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the "validator" subcommand, I propose combining "node" and "validator" into one command, possibly called "info" or "node".
Edit: Upon further consideration, I believe it would be beneficial to implement a unique "info" command that provides the node ID, pubkey, and address.
Additionally, I propose using external helpers, potentially located in the contribs/ folder, to obtain more advanced information such as block height, connected peers, process status, and uptime. These helpers can primarily use the RPC info, read the config files or the raw datastore.
) | ||
|
||
return cmd | ||
} | ||
|
||
// we relies on the flag option to pass in the root directory before we can identify where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is probably better placed above line 40.
newStartCmd(bc), | ||
newInitCmd(bc), | ||
newResetAllCmd(bc), | ||
newResetStateCmd(bc), | ||
newNodeIDCmd(bc), | ||
newValidatorCmd(bc), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to understand the usages of the reset(s) commands, but I believe we should target these commands, and nothing more:
- init
- start
- info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @moul, this reset functionality is definitely something that's nice to have (and would've saved me headaches with the local portal loop in the beginning), but I don't think they belong as part of the official chain binary. Resetting chain state shouldn't be straightforward, because it's not an action that should ever happen; even if there is a need for it, initializing a new chain (working directory) instance is always a better way to go
) | ||
|
||
// Display a node's persistent peer ID to the standard output. | ||
func newNodeIDCmd(bc baseCfg) *commands.Command { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, the filename and function name should be consistent with "NodeID" and the subcommand should also be "node".
cmd := commands.NewCommand( | ||
commands.Metadata{ | ||
ShortUsage: "<subcommand> [flags] [<arg>...]", | ||
ShortHelp: "Starts the gnoland blockchain node", | ||
Options: []ff.Option{ | ||
ff.WithConfigFileFlag("config"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what I prefer, but I think we should choose one of two extremes. Currently, we're in an intermediary step.
One extreme option is to clearly differentiate between config files and flags. For example, we can remove items like "persistentPeers" that are available both in the config file and as a start flag.
An alternative option is to establish a standard practice of using a dual approach (config file + flag). This could be implemented globally or exclusively in the init function.
To be honest, I'm not certain about a specific plan. I can only say that the current status is a hybrid mix of implicit and explicit approaches. It is inconsistent, with sometimes one way and sometimes two ways, depending on the commands.
Related with #731.
|
||
- Add your validator to the genesis file. | ||
|
||
$> genesis validator add \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could be more generic, with a gentx system, it could be use to add genesis validator, to setup account money, setup realms and packages ...
|
||
- Start the node | ||
|
||
$> gnoland start --prune "nothing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flag should be name --pruning
as in the config file
|
||
## Reset `gnoland` node back to genesis state. It's suitable for running test node | ||
|
||
$> gnoland unsafe-reset-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the flag --keep-addr-book
Question: In cosmos-sdk, this command is under gaiad tendermint unsafe-reset-all
Should we do the same ?
Usage:
gaiad tendermint [command]
Available Commands:
reset-state Remove all the data and WAL
show-address Shows this node's tendermint validator consensus address
show-node-id Show this node's ID
show-validator Show this node's tendermint validator info
unsafe-reset-all (unsafe) Remove all the data and WAL, reset this node's validator to genesis state
version Print tendermint libraries' version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe gnoland tm2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making the first step towards multinode networks 🙏
There are things in this PR that I don't necessarily agree with, and I'd like for us to open discussions on each, because they are critical for UX and general chain orchestration:
- Pruning strategies shouldn't be set in this PR; they need more discussion
- Currently, you cannot start the chain without both the
config.toml
, andgenesis.json
; The relationship between these two needs to be completely revised. This PR focuses on adding helper methods for the config, but doesn't touch upon the entanglement withgenesis.json
. The thing that I've found lacking was the ability to even define values for theconfig.toml
, which directly affect new nodes (ex. mempool settings, peer settings...but also thegenesis.json
location, which is hardcoded) - This PR doesn't have anything to do with Proof of Authority per-se, it adds the ability to define the initial validator set (but not to change it once the chain is running, in a permissioned way, which PoA is all about). From discussions we've been having with @moul and the team, validator set states / changes should be done on-chain, while initial validator set info can / needs to be set in the genesis. This PR doesn't work in any mechanism to change the set, only to alter the initial node config for the p2p network
I think the entire chain initialization flow we currently have is very clunky, and we can do better. New chain orchestration needs to be straightforward and simple, so it can be packaged into more complex workflows by devs like @albttx.
The genesis.json
is the single most important piece of information for any chain; people should be able to start new nodes and connect to a chain using only the genesis.json
, their keys, and the node binary -- that's it. The protocol-level configuration for the node can live in a separate file (and path passed in through a flag), but it shouldn't be mandatory like it is for gno chains currently.
There were some other minor things about the PR, mostly nitpicks and concerning docs. There were flows that don't work (ex. genesis validator add
takes in a bech32 representation of the public key, and the commands introduced in this PR print a different format).
I agree with everything @moul has put down as part of his detailed review.
What I propose we do:
- change the scope of this PR to handle the
config.toml
, change title and description accordingly - open discussions for the points I've mentioned above
// prune nothing is the archive node setting | ||
// prune syncable is the default node setting. It keeps the lastest 100 transactions and everything 1000th tx | ||
prune := store.NewPruningOptionsFromString(c.pruneStrategy) | ||
pruningOpt := sdk.SetPruningOptions(prune) | ||
gnoBaseApp := gnoApp.(*sdk.BaseApp) | ||
pruningOpt(gnoBaseApp) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think pruning strategies should be a part of this PR, even if they utilize something that already exists in the SDK.
The reason being these are very large design decisions that have amplifications on end-users, to be included as an add-on. Pruning state is not the only change required to multi-node types, this split is more intricate:
- what node types do we support?
- how far back do they need to keep state?
- what kind of state do they need to keep?
Not to mention that this directly affects how users now fetch data from RPC clusters, as now not all nodes can serve request (but need to delegate). Additionally, it also affects the way nodes execute live syncing (not every node has the entire state required to sync a peer). In my mind, these are things that require more planning and coordination, so I strongly suggest leaving out this pruning strategy change from the PR, and open an issue we can converge on
func newValidatorCmd(bc baseCfg) *commands.Command { | ||
cmd := commands.NewCommand( | ||
commands.Metadata{ | ||
Name: "validator", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agreed, especially considering how a single node will almost always be tied to a single key pair (1 node = 1 validator)
"github.com/gnolang/gno/tm2/pkg/p2p" | ||
) | ||
|
||
func TestResetAll(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can run tests in this file in parallel (t.Parallel
)
newStartCmd(bc), | ||
newInitCmd(bc), | ||
newResetAllCmd(bc), | ||
newResetStateCmd(bc), | ||
newNodeIDCmd(bc), | ||
newValidatorCmd(bc), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @moul, this reset functionality is definitely something that's nice to have (and would've saved me headaches with the local portal loop in the beginning), but I don't think they belong as part of the official chain binary. Resetting chain state shouldn't be straightforward, because it's not an action that should ever happen; even if there is a need for it, initializing a new chain (working directory) instance is always a better way to go
Address: "g14t47gv3v2z3pc23g3zr39mnc99w2cplp0jhqvv" | ||
Pubkey: "E5IFULgXFdS49ILgvPmO3/8chuSWfbqw3zYXaNEP+60=" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This output is not the correct format I get when running gnoland node
, ex:
NodeID: g1et0nr2z32lhp6qyfqxdlenxmlw9h3auqy7fw7c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cosmos-sdk, this is
gaiad tendermint show-node-id
gaiad tendermint show-address
I believe it should be 2 separate commands because most of the time, people use them with my action to $(gnoland tm2 show-address)
require.Equal(t, int64(10), pv.LastSignState.Height) | ||
} | ||
|
||
func initFilesWithConfig(config *cfg.Config) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add *testing.T
as a param, and call t.Helper
inside the method, so it's clear it's a test helper method (not sure how the linter didn't catch this)
func (bc *baseCfg) RegisterFlags(fs *flag.FlagSet) { | ||
fs.StringVar( | ||
&bc.rootDir, | ||
"root-dir", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why, but the flag value for root-dir
is not saved, ex:
$ gnoland init --root-dir node1
.level 1 .msg Generated private validator keyFile testdir/config/priv_validator_key.json stateFile testdir/data/priv_validator_state.json
.level 1 .msg Generated node key path testdir/config/node_key.json
bz := amino.MustMarshalJSON(pubKey) | ||
|
||
fmt.Printf("Address: \"%v\"\n", pubKey.Address()) | ||
fmt.Printf(" Pubkey: %v\n", string(bz)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: there is an extra space between "
and Pubkey
|
||
func execNodeID(bc baseCfg) error { | ||
config := bc.tmConfig | ||
nodeKey, err := p2p.LoadNodeKey(config.NodeKeyFile()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a more general question for everyone:
Why is the validator key != node key?
What is the point of keeping 2 separate keys? Why not just use the validator's private key as the node key, since they're both based on the 25519 curve?
|
||
$> genesis validator add \ | ||
--address g14t47gv3v2z3pc23g3zr39mnc99w2cplp0jhqvv \ | ||
--pub-key E5IFULgXFdS49ILgvPmO3/8chuSWfbqw3zYXaNEP+60= \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value for this flag is not good here, getting an error:
invalid validator public key, string not all lowercase or all uppercase
It's because genesis validator add
expects a bech32 representation of the public key
Address: "g14t47gv3v2z3pc23g3zr39mnc99w2cplp0jhqvv" | ||
Pubkey: "E5IFULgXFdS49ILgvPmO3/8chuSWfbqw3zYXaNEP+60=" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cosmos-sdk, this is
gaiad tendermint show-node-id
gaiad tendermint show-address
I believe it should be 2 separate commands because most of the time, people use them with my action to $(gnoland tm2 show-address)
Outdated. |
Contributors' checklist...
BREAKING CHANGE: xxx
message was included in the descriptionThis is the solution for #1332