[BUG] unable to start Nerpa #5782

Closed
liviudm opened this issue Mar 11, 2021 · 8 comments
Labels
kind/bug Kind: Bug

Comments

@liviudm
Contributor

liviudm commented Mar 11, 2021

Note: For security-related bugs/issues, please follow the security policy.

Describe the bug

When resetting Nerpa, only the first miner (preminer-0) finishes the init process successfully. The other preminers get stuck.

Logs from preminer-1 (stuck during init):

{"level":"info","ts":"2021-03-11T12:01:49.064Z","logger":"main","caller":"lotus-storage-miner/init.go:510","msg":"Importing pre-sealed sector metadata for t01001"}
{"level":"info","ts":"2021-03-11T12:03:07.582Z","logger":"main","caller":"lotus-storage-miner/init.go:590","msg":"Waiting for message: bafy2bzaced6qob2ezp44rrvsbg4kov2npgbttvdumslp2ymvv3ra6pio4thmu"}

Previously, starting lotus-miner on preminer-0 fixed the issue. Right now, however, preminer-0 mines new blocks successfully while the other preminers remain stuck in the init process.

Logs from preminer-0 successfully mining new blocks:

{"level":"info","ts":"2021-03-11T13:14:13.467Z","logger":"storageminer","caller":"storage/miner.go:263","msg":"GenerateWinningPoSt took 7.026684644s"}
{"level":"info","ts":"2021-03-11T13:14:13.487Z","logger":"miner","caller":"miner/miner.go:447","msg":"mined new block","cid":"bafy2bzacedrgj3rgfz3enbfieb5zfbyx7uelw4pqpmdgbkptyiwlahvfvok7m","height":"129","miner":"t01000","parents":["t01000"],"took":7.055728143}
zet37z55rwtjtaqphk","height":"128","miner":"t01000","parents":["t01000"],"took":7.094991983}
{"level":"info","ts":"2021-03-11T13:14:06.438Z","logger":"miner","caller":"miner/miner.go:383","msg":"Time delta between now and our mining base: 6s (nulls: 0)"}
{"level":"info","ts":"2021-03-11T13:14:06.440Z","logger":"storageminer","caller":"storage/miner.go:256","msg":"Computing WinningPoSt ;[{SealProof:2 SectorNumber:99 SealedCID:bagboea4b5abcbmmlzdqicdryqohknbguu6fv4zvbed4ftnn6mzvboyehryvy3ham}]; [5 57 44 191 124 205 206 148 190 2 106 1 22 253 215 38 148 179 10 154 84 66 92 57 65 23 24 110 164 127 137 221]"}
{"level":"info","ts":"2021-03-11T13:14:13.467Z","logger":"storageminer","caller":"storage/miner.go:263","msg":"GenerateWinningPoSt took 7.026684644s"}
{"level":"info","ts":"2021-03-11T13:14:13.487Z","logger":"miner","caller":"miner/miner.go:447","msg":"mined new block","cid":"bafy2bzacedrgj3rgfz3enbfieb5zfbyx7uelw4pqpmdgbkptyiwlahvfvok7m","height":"129","miner":"t01000","parents":["t01000"],"took":7.055728143}
{"level":"info","ts":"2021-03-11T13:14:36.108Z","logger":"miner","caller":"miner/miner.go:383","msg":"Time delta between now and our mining base: 6s (nulls: 0)"}
{"level":"info","ts":"2021-03-11T13:14:36.110Z","logger":"storageminer","caller":"storage/miner.go:256","msg":"Computing WinningPoSt ;[{SealProof:2 SectorNumber:595 SealedCID:bagboea4b5abcblanc4u63pvwtgu2yl27htzv6murwrtg7xsu44nl3w5x2gm6dqbc}]; [171 238 135 70 32 239 58 70 251 145 221 108 188 159 164 7 88 149 52 100 11 229 183 62 77 74 74 39 187 235 216 82]"}

Miner init systemd unit is the following:

[Unit]
Description=Lotus Miner Init
After=network.target
ConditionPathExists=!/var/lib/lotus-miner/datastore

[Service]
User=fc
Group=fc
ExecStart=/usr/local/bin/lotus-miner init \
  --genesis-miner="false" \
  --actor="t01002" \
  --pre-sealed-metadata="/tmp/presealed-metadata.json" \
  --nosync="true" \
  --sector-size="512MiB" \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/000 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/032 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/064 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/096 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/128 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/160 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/192 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/224 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/256 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/288 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/320 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/352 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/384 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/416 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/448 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/480 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/512 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/544 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/576 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/608 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/640 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/672 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/704 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/736 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/768 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/800 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/832 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/864 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/896 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/928 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/960 \
  --pre-sealed-sectors=/interplanetary-network/pre-seal/fake/v27.1/512MiB/t01002/992
Environment=LOTUS_PATH="/var/lib/lotus"
Environment=LOTUS_MINER_PATH="/var/lib/lotus-miner"
Environment=GOLOG_FILE="/var/log/lotus-miner.log"
Environment=GOLOG_LOG_FMT="json"

[Install]
WantedBy=multi-user.target
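
The 32 --pre-sealed-sectors flags in the unit above follow a regular pattern (directories 000 through 992 in steps of 32), so the ExecStart arguments could be generated rather than typed by hand. A minimal Python sketch, assuming that layout holds for every preminer. The helper name presealed_flags is hypothetical; the base path and actor ID are copied from the unit:

```python
from typing import List

# Base directory and actor ID as they appear in the ExecStart line above.
BASE = "/interplanetary-network/pre-seal/fake/v27.1/512MiB"


def presealed_flags(actor: str, base: str = BASE,
                    start: int = 0, stop: int = 992, step: int = 32) -> List[str]:
    """Build one --pre-sealed-sectors flag per sector directory.

    Hypothetical helper: assumes directories are zero-padded to three
    digits and spaced `step` apart, as in the unit shown in this issue.
    """
    return [
        f"--pre-sealed-sectors={base}/{actor}/{n:03d}"
        for n in range(start, stop + 1, step)
    ]


flags = presealed_flags("t01002")
print(len(flags))   # number of flags generated
print(flags[0])     # first sector directory
print(flags[-1])    # last sector directory
```

Joining the list with spaces (or with backslash-newline pairs) gives the argument tail for the ExecStart line.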

Lotus binaries are compiled with GOFLAGS+=-tags=nerpanet from 3be4984d9.

Version (run lotus version): lotus version 1.5.2+git.3be4984d9

To Reproduce
Steps to reproduce the behavior:

  1. compile binaries with GOFLAGS+=-tags=nerpanet
  2. initialize a miner with fake sectors

Expected behavior
All pre-miners should finish the init process successfully in order to reset/bootstrap the Nerpa network.

Logs
Please see in description.

@ribasushi
Collaborator

@travisperson any thoughts you could share would be much appreciated!

@travisperson
Contributor

Are many of the other daemons in the network syncing the chain started by preminer-0?

This looks to be #5474

@liviudm
Contributor Author

liviudm commented Mar 12, 2021

The output of lotus sync status is empty: there are no workers.

I tried adding LOTUS_SYNC_BOOTSTRAP_PEERS=4 to all daemon and miner startup scripts, but with no luck.

I did notice something else on the bootstrap nodes, though:

{"level":"warn","ts":"2021-03-12T13:14:33.047Z","logger":"peermgr","caller":"peermgr/peermgr.go:217","msg":"failed to connect to bootstrap peer: failed to dial 12D3KooWBFxEigSKLvxJVdw3JziC9ePHHnyAn5LifWSqg2kttcth: all dials failed\n  * [/ip4/REDACTED/tcp/1347] failed to negotiate security protocol: peer id mismatch: expected 12D3KooWBFxEigSKLvxJVdw3JziC9ePHHnyAn5LifWSqg2kttcth, but remote key matches 12D3KooWBFBhh8LjquGnRzdpR9SvTQakjon31HbF6yHcsaMGpqvv"}
{"level":"warn","ts":"2021-03-12T13:14:33.047Z","logger":"peermgr","caller":"peermgr/peermgr.go:217","msg":"failed to connect to bootstrap peer: failed to dial 12D3KooWDfsxYk7dC6NNsHqZqqyMJCzkjZuXhjsmqBk3TUCBZLga: all dials failed\n  * [/ip4/REDACTED/tcp/1347] failed to negotiate security protocol: peer id mismatch: expected 12D3KooWDfsxYk7dC6NNsHqZqqyMJCzkjZuXhjsmqBk3TUCBZLga, but remote key matches 12D3KooWCh5mDxBFRxWwTF2VaE8SkDFHWzygdUHZ6AZ6pwDu4HWE"}
{"level":"info","ts":"2021-03-12T13:14:38.044Z","logger":"peermgr","caller":"peermgr/peermgr.go:210","msg":"connecting to bootstrap peers"}
{"level":"warn","ts":"2021-03-12T13:14:38.045Z","logger":"peermgr","caller":"peermgr/peermgr.go:217","msg":"failed to connect to bootstrap peer: dial backoff"}
{"level":"warn","ts":"2021-03-12T13:14:38.045Z","logger":"peermgr","caller":"peermgr/peermgr.go:217","msg":"failed to connect to bootstrap peer: dial backoff"}
{"level":"warn","ts":"2021-03-12T13:14:38.045Z","logger":"peermgr","caller":"peermgr/peermgr.go:217","msg":"failed to connect to bootstrap peer: dial backoff"}
{"level":"warn","ts":"2021-03-12T13:14:38.045Z","logger":"peermgr","caller":"peermgr/peermgr.go:217","msg":"failed to connect to bootstrap peer: dial backoff"}

I'm not sure where 12D3KooWDfsxYk7dC6NNsHqZqqyMJCzkjZuXhjsmqBk3TUCBZLga is coming from. Here's the bootstrappers.pi file:

/dns4/bootstrap-3.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWBFBhh8LjquGnRzdpR9SvTQakjon31HbF6yHcsaMGpqvv
/dns4/bootstrap-0.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWAUMsazJoNf169aMYqcMJa4uQgjiMXHCvEXV8aFPmwyVs
/dns4/bootstrap-1.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWCh5mDxBFRxWwTF2VaE8SkDFHWzygdUHZ6AZ6pwDu4HWE
/dns4/bootstrap-2.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWNe5bg9nuEpipDcUULSn6s2HBZy2Br7iBJ7LhLzkAzbkm
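
To make the mismatch easier to see, the peer IDs from the two "peer id mismatch" warnings can be checked against the file above. This is only a sketch: the helper name peer_ids is hypothetical, and the strings are copied from the logs and the file in this comment.

```python
# Contents of the bootstrappers.pi file quoted above.
BOOTSTRAPPERS = """\
/dns4/bootstrap-3.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWBFBhh8LjquGnRzdpR9SvTQakjon31HbF6yHcsaMGpqvv
/dns4/bootstrap-0.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWAUMsazJoNf169aMYqcMJa4uQgjiMXHCvEXV8aFPmwyVs
/dns4/bootstrap-1.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWCh5mDxBFRxWwTF2VaE8SkDFHWzygdUHZ6AZ6pwDu4HWE
/dns4/bootstrap-2.nerpa.interplanetary.dev/tcp/1347/p2p/12D3KooWNe5bg9nuEpipDcUULSn6s2HBZy2Br7iBJ7LhLzkAzbkm
"""

# (expected by the dialer, presented by the remote) pairs from the warnings.
MISMATCHES = [
    ("12D3KooWBFxEigSKLvxJVdw3JziC9ePHHnyAn5LifWSqg2kttcth",
     "12D3KooWBFBhh8LjquGnRzdpR9SvTQakjon31HbF6yHcsaMGpqvv"),
    ("12D3KooWDfsxYk7dC6NNsHqZqqyMJCzkjZuXhjsmqBk3TUCBZLga",
     "12D3KooWCh5mDxBFRxWwTF2VaE8SkDFHWzygdUHZ6AZ6pwDu4HWE"),
]


def peer_ids(multiaddrs: str) -> set:
    """Extract the trailing /p2p/<peer id> component of each multiaddr line."""
    return {
        line.rsplit("/p2p/", 1)[1].strip()
        for line in multiaddrs.splitlines()
        if "/p2p/" in line
    }


known = peer_ids(BOOTSTRAPPERS)
for expected, presented in MISMATCHES:
    # For each warning: is the expected ID in the file? Is the presented one?
    print(expected in known, presented in known)
```

Both presented IDs appear in the file and neither expected ID does. One possible reading is that the dialing node is working from a different bootstrap list than this file, but the logs alone do not confirm that.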

@travisperson
Contributor

Set LOTUS_SYNC_BOOTSTRAP_PEERS=1 on at least a number of peers equal to the value of BootstrapPeerThreshold.

@ribasushi
Collaborator

@travisperson ohhhhh I did not understand the previous issue at all.
So the requirement is to take 4 peers (4 being the value of the BootstrapPeerThreshold constant) and set LOTUS_SYNC_BOOTSTRAP_PEERS=1 on each of them.

@travisperson
Contributor

Yep, the issue is that the sync manager is waiting on enough peers to inform it of the same tipset, and because only a single miner is producing blocks, it can never reach this threshold. To get around this we need at least BootstrapPeerThreshold number of peers to accept the chain from a single peer.
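
A toy model of the behaviour described above (not the Lotus sync manager itself): each node starts syncing only once at least its threshold of peers announce the same chain. The node names, the fully connected topology, and the choice of the four bootstrap nodes as the ones with the lowered threshold are assumptions made for illustration.

```python
BOOTSTRAP_PEER_THRESHOLD = 4  # value quoted earlier in this thread


def synced_nodes(thresholds, producer):
    """Return the set of nodes that end up on the producer's chain.

    thresholds maps node name -> number of peers that must announce the
    chain before that node accepts it. Assumes every node can hear from
    every other node (a simplification).
    """
    synced = {producer}
    progressed = True
    while progressed:
        progressed = False
        for node, needed in thresholds.items():
            # Peers announcing the chain are the nodes already synced.
            if node not in synced and len(synced) >= needed:
                synced.add(node)
                progressed = True
    return synced


bootstrap = ["bootstrap-0", "bootstrap-1", "bootstrap-2", "bootstrap-3"]
nodes = ["preminer-1", "preminer-2"] + bootstrap

# Every node keeps the default threshold: nobody follows the lone producer.
default = {n: BOOTSTRAP_PEER_THRESHOLD for n in nodes}
stuck = synced_nodes(default, "preminer-0")

# Four nodes run with LOTUS_SYNC_BOOTSTRAP_PEERS=1: they accept the chain
# from a single peer, and the remaining nodes then see enough announcers.
relaxed = dict(default)
for n in bootstrap:
    relaxed[n] = 1
unblocked = synced_nodes(relaxed, "preminer-0")

print(sorted(stuck))
print(sorted(unblocked))
```

In the first run only preminer-0 is on the chain; in the second every node ends up synced, which matches the workaround suggested here.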

@travisperson
Contributor

These are some other issues you will probably run into:

#5446
#5701 (info #5692; resolved in #5730)

@rjan90
Contributor

rjan90 commented Nov 24, 2021

The Nerpa network has been deprecated, so this issue can probably be closed? #rengjøring
