Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avalanchego version 1.11.5 stopped working on the new kernel 5.15.0-111-generic #3022

Closed
MaxFoton opened this issue May 15, 2024 · 13 comments
Closed
Labels
bug Something isn't working
Milestone

Comments

@MaxFoton
Copy link

Describe the bug
error«:»subnets not bootstrapped

To Reproduce
Steps to reproduce the behavior.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.
Снимок экрана от 2024-05-15 22-36-01

Logs
If applicable, please include the relevant logs that indicate a problem and/or the log directory of your node. By default, this can be found at ~/.avalanchego/logs/.

Metrics
If applicable, please include any metrics gathered from your node to assist us in diagnosing the problem.

Operating System
Ubuntu 20.04

Additional context
Add any other context about the problem here.

To best protect the Avalanche community security bugs should be reported in accordance to our Security Policy

@MaxFoton MaxFoton added the bug Something isn't working label May 15, 2024
@StephenButtolph
Copy link
Contributor

This log seems to indicate that the node is currently executing C-chain blocks during bootstrapping... This is expected... Did the node finish executing these blocks and eventually become healthy?

@haight6716
Copy link

haight6716 commented May 18, 2024

I am having the same issue. I let mine run overnight to see if it would finish, but no luck. My logs:


May 17 23:20:31 smeagol avalanchego[122456]: [05-17|23:20:31.123] INFO <C Chain> bootstrap/bootstrapper.go:600 fetching blocks {"numFetchedBlocks": 55010, "numTotalBlocks": 1558, "eta": "-3m17s"}
May 17 23:20:35 smeagol avalanchego[122456]: [05-17|23:20:35.481] INFO <C Chain> bootstrap/storage.go:163 executing blocks {"numToExecute": 55668}
May 17 23:20:35 smeagol avalanchego[122456]: [05-17|23:20:35.810] INFO  <C Chain> coreth/core/state/snapshot/generate.go:116 Wiper running, state snapshotting paused accounts=0 slots=0 storage=0.00B elapsed=3m35.652s
May 17 23:20:38 smeagol avalanchego[122456]: [05-17|23:20:38.258] INFO  <C Chain> coreth/core/state/snapshot/generate.go:116 Wiper running, state snapshotting paused accounts=0 slots=0 storage=0.00B elapsed=3m38.100s
May 17 23:20:40 smeagol avalanchego[122456]: [05-17|23:20:40.333] INFO  <C Chain> coreth/core/state/snapshot/generate.go:116 Wiper running, state snapshotting paused accounts=0 slots=0 storage=0.00B elapsed=3m40.175s
May 17 23:20:40 smeagol avalanchego[122456]: [05-17|23:20:40.489] INFO <C Chain> bootstrap/storage.go:210 executing blocks {"numExecuted": 50, "numToExecute": 55668, "eta": "1h32m51s"}
May 17 23:20:42 smeagol avalanchego[122456]: [05-17|23:20:42.815] INFO  <C Chain> coreth/core/state/snapshot/wipe.go:133 Deleted state snapshot leftovers         kind=storage  wiped=0 elapsed=3m22.885s
May 17 23:20:43 smeagol avalanchego[122456]: [05-17|23:20:43.600] INFO  <C Chain> coreth/core/state/snapshot/generate.go:116 Wiper running, state snapshotting paused accounts=0 slots=0 storage=0.00B elapsed=3m43.442s
May 17 23:51:36 smeagol avalanchego[122456]: [05-17|23:51:36.914] WARN health/health.go:99 failing check {"namespace": "readiness", "reason": {"bootstrapped":{"message":["11111111111111111111111111111111LpoYY"],"error":"subnets not bootstrapped","timestamp":"2024-05-17T23:51:20.98129403-07:00","duration":12820,"contiguousFailures":72,"timeOfFirstFailure":"2024-05-17T23:15:52.374871098-07:00"}}}
May 18 01:05:55 smeagol avalanchego[122456]: [05-18|01:05:55.596] WARN health/health.go:99 failing check {"namespace": "readiness", "reason": {"bootstrapped":{"message":["11111111111111111111111111111111LpoYY"],"error":"subnets not bootstrapped","timestamp":"2024-05-18T01:05:50.98148213-07:00","duration":20434,"contiguousFailures":221,"timeOfFirstFailure":"2024-05-17T23:15:52.374871098-07:00"}}}
May 18 11:22:23 smeagol avalanchego[122456]: [05-18|11:22:23.565] INFO node/node.go:1555 shutting down node {"exitCode": 0}

So that's 12 hours of trying before I shut it down.

./avalanchego-installer.sh --version v1.11.4

My node is back online (responsive) after downgrading, though I'm still regenerating the state snapshot. Happy to share more details if it would help.

@MaxFoton
Copy link
Author

This log seems to indicate that the node is currently executing C-chain blocks during bootstrapping... This is expected... Did the node finish executing these blocks and eventually become healthy?

no, the node was unable to connect to other nodes and begin to catch up with the altitude. I had to roll back to a previous version of the binary.

@gathulb
Copy link

gathulb commented May 19, 2024

I experienced the same issue running avalanchego v1.11.5 on ubuntu:

Health check reports:

"bootstrapped":{"message":["11111111111111111111111111111111LpoYY"],"error":"subnets not bootstrapped","timestamp":"2024-05-18T14:00:45.377574035-06:00","duration":2285,"contiguousFailures":4,"timeOfFirstFailure":"2024-05-18T13:59:15.377366043-06:00"}

Logs show:

May 18 14:01:06 fractal avalanchego[2157733]: [05-18|14:01:06.817] WARN health/health.go:110 failing check {"namespace": "health", "reason": {"C":{"message":{"engine":{"consensus":{},"vm":null},"networking":{"percentConnected":0.990311175288181}},"timestamp":"2024-05-18T14:00:45.377588241-06:00","duration":10259},"P":{"message":{"engine":{"consensus":{},"vm":null},"networking":{"percentConnected":0.990311175288181}},"timestamp":"2024-05-18T14:00:45.377578363-06:00","duration":11442},"X":{"message":{"engine":{"consensus":{},"vm":null},"networking":{"percentConnected":0.990311175288181}},"timestamp":"2024-05-18T14:00:45.377571269-06:00","duration":14627},"bootstrapped":{"message":["11111111111111111111111111111111LpoYY"],"error":"subnets not bootstrapped","timestamp":"2024-05-18T14:00:45.377574035-06:00","duration":2285,"contiguousFailures":4,"timeOfFirstFailure":"2024-05-18T13:59:15.377366043-06:00"},"database":{"timestamp":"2024-05-18T14:00:45.377565569-06:00","duration":5220},"diskspace":{"message":{"availableDiskBytes":300845068288},"timestamp":"2024-05-18T14:00:45.377571069-06:00","duration":3717},"network":{"message":{"connectedPeers":3631,"sendFailRate":0,"timeSinceLastMsgReceived":"377.552915ms","timeSinceLastMsgSent":"377.552915ms"},"timestamp":"2024-05-18T14:00:45.377556331-06:00","duration":8055},"router":{"message":{"longestRunningRequest":"1.901649146s","outstandingRequests":3},"timestamp":"2024-05-18T14:00:45.377565809-06:00","duration":7745}}}
May 18 14:01:22 fractal avalanchego[2157733]: [05-18|14:01:22.828] INFO <C Chain> bootstrap/bootstrapper.go:600 fetching blocks {"numFetchedBlocks": 15306, "numTotalBlocks": 8926, "eta": "-18s"}
May 18 14:01:37 fractal avalanchego[2157733]: [05-18|14:01:37.071] INFO <C Chain> bootstrap/bootstrapper.go:600 fetching blocks {"numFetchedBlocks": 20276, "numTotalBlocks": 8926, "eta": "-33s"}
May 18 14:01:49 fractal avalanchego[2157733]: [05-18|14:01:49.726] INFO <C Chain> bootstrap/bootstrapper.go:600 fetching blocks {"numFetchedBlocks": 25127, "numTotalBlocks": 8926, "eta": "-46s"}
May 18 14:02:04 fractal avalanchego[2157733]: [05-18|14:02:04.533] INFO <C Chain> bootstrap/bootstrapper.go:600 fetching blocks {"numFetchedBlocks": 30146, "numTotalBlocks": 8926, "eta": "-1m1s"}
May 18 14:02:20 fractal avalanchego[2157733]: [05-18|14:02:20.245] INFO <C Chain> bootstrap/bootstrapper.go:600 fetching blocks {"numFetchedBlocks": 35088, "numTotalBlocks": 8926, "eta": "-1m16s"}
May 18 14:02:20 fractal avalanchego[2157733]: [05-18|14:02:20.963] INFO  <C Chain> coreth/core/state/snapshot/wipe.go:133 Deleted state snapshot leftovers         kind=storage  wiped=2327 elapsed=1m48.023s
May 18 14:02:23 fractal avalanchego[2157733]: [05-18|14:02:23.776] INFO <C Chain> bootstrap/storage.go:163 executing blocks {"numToExecute": 37078}

Validator would be 'stuck' in the state shown above for 12 or more hours without a change. Restarts returned the system to a comparable state, and FWIW stopping the validator also took much longer (a few minutes) than usual.

Downgrading to v1.11.4 restored validator to a healthy working state.

@ceyonur
Copy link
Contributor

ceyonur commented May 20, 2024

Are you syncing to the network from scratch with v1.11.5, or did you upgrade your node from v1.11.4?
Could you provide your full logs? cc @gathulb @haight6716

@gathulb
Copy link

gathulb commented May 20, 2024

Are you syncing to the network from scratch with v1.11.5, or did you upgrade your node from v1.11.4? Could you provide your full logs? cc @gathulb @haight6716

I upgraded from v.1.11.4.
main.log

@ceyonur
Copy link
Contributor

ceyonur commented May 20, 2024

@gathulb thanks for providing those, may I ask you to also provide the C-chain logs (C.log)?

@gathulb
Copy link

gathulb commented May 20, 2024

@gathulb thanks for providing those, may I ask you to also provide the C-chain logs (C.log)?
C.log

@haight6716
Copy link

I also upgraded from 1.11.4. I can share logs too if it's helpful, but won't unless you say so. They're big because I ran low on disk space. Guessing that isn't related, but maybe a clue.

@ceyonur
Copy link
Contributor

ceyonur commented May 20, 2024

@haight6716 It would be really helpful if you can send yours. You can just send your C + main logs.

@StephenButtolph
Copy link
Contributor

PR to fix the negative ETA: #3036

@haight6716
Copy link

haight6716 commented May 22, 2024

[removed attachment]

I'm afraid the critical time may have rotated out of the C logs, but this should at least capture the main log.

@StephenButtolph StephenButtolph added this to the v1.11.6 milestone May 23, 2024
@StephenButtolph
Copy link
Contributor

StephenButtolph commented May 23, 2024

We believe this to have been fixed by ava-labs/coreth#559 which was included into v1.11.6. If anyone runs into this issue on v1.11.6 please open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

No branches or pull requests

5 participants