Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blobpool data - blocks start of geth after unclean shutdown. #28990

Closed
ChiefSgt opened this issue Feb 14, 2024 · 1 comment · Fixed by #29001
Closed

Blobpool data - blocks start of geth after unclean shutdown. #28990

ChiefSgt opened this issue Feb 14, 2024 · 1 comment · Fixed by #29001
Labels

Comments

@ChiefSgt
Copy link

System information

Geth version: 1.13.11 & 1.13.12
DB: Pebble, Scheme: Path
CL client & version: prysm@v4.2.1
OS & Version: 22.04 LTS Ubuntu server
Chain: Holesky
HW: i7-6700, 32GB RAM, SSDs

Desired behaviour

Geth service is able to start even if blobpool data has an issue.

Actual behaviour & workaround

After accidental power loss, Geth 1.13.11 or 1.13.12 is not able to start, exits with Fatal error.

  $ sudo journalctl -u geth.service

Feb 12 08:51:44 ethgoerli geth[765]: INFO [02-12|08:51:44.169] Loaded most recent local block           number=927,779 hash=a40c7b..6f3f27 td=1 age=1m20s
Feb 12 08:51:44 ethgoerli geth[765]: INFO [02-12|08:51:44.169] Loaded most recent local finalized block number=927,715 hash=f8c08f..095f98 td=1 age=14m8s
Feb 12 08:51:44 ethgoerli geth[765]: INFO [02-12|08:51:44.170] Loaded last snap-sync pivot marker       number=441,353
Feb 12 08:51:44 ethgoerli geth[765]: WARN [02-12|08:51:44.171] Head state missing, repairing            number=927,779 hash=a40c7b..6f3f27 snaproot=092afa..5d8676
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.311] Loaded most recent local header          number=927,779 hash=a40c7b..6f3f27 td=1 age=1m21s
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.311] Loaded most recent local block           number=925,620 hash=0836c0..a65e1a td=1 age=7h32m57s
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.311] Loaded most recent local snap block      number=927,779 hash=a40c7b..6f3f27 td=1 age=1m21s
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.311] Loaded most recent local finalized block number=927,715 hash=f8c08f..095f98 td=1 age=14m9s
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.311] Loaded last snap-sync pivot marker       number=441,353
Feb 12 08:51:45 ethgoerli geth[765]: WARN [02-12|08:51:45.316] Enabling snapshot recovery               chainhead=925,620 diskbase=926,703
Feb 12 08:51:45 ethgoerli geth[765]: WARN [02-12|08:51:45.318] Loaded snapshot journal                  diskroot=092afa..5d8676 diffs=unmatched
Feb 12 08:51:45 ethgoerli geth[765]: WARN [02-12|08:51:45.318] Snapshot is not continuous with chain    snaproot=092afa..5d8676 chainroot=46884f..af0630
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.318] Initialized transaction indexer          range="last 2350000 blocks"
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.320] Loaded local transaction journal         transactions=0 dropped=0
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.320] Regenerated local transaction journal    transactions=0 accounts=0
Feb 12 08:51:45 ethgoerli geth[765]: INFO [02-12|08:51:45.325] Transaction pool stopped
Feb 12 08:51:45 ethgoerli geth[765]: Fatal: Failed to register the Ethereum service: EOF
Feb 12 08:51:45 ethgoerli systemd[1]: geth.service: Main process exited, code=exited, status=1/FAILURE
Feb 12 08:51:45 ethgoerli systemd[1]: geth.service: Failed with result 'exit-code'.
Feb 12 08:51:45 ethgoerli systemd[1]: geth.service: Consumed 3.740s CPU time.
Feb 12 08:51:50 ethgoerli systemd[1]: geth.service: Scheduled restart job, restart counter is at 1.
Feb 12 08:51:50 ethgoerli systemd[1]: Stopped Go Ethereum Client.
Feb 12 08:51:50 ethgoerli systemd[1]: geth.service: Consumed 3.740s CPU time.

Updated Geth to 1.13.12, did not change behavior.
Tried to resync Geth 1.13.12 from start, still same Fatal error.

  $ sudo systemctl stop geth
  $ sudo -u goeth geth --datadir /mnt/ssd-eth/goethereum removedb
	Remove 'state data'? [y/n] y
	Remove 'ancient chain'? [y/n] y
  $ sudo systemctl start geth
  $ sudo journalctl -u geth.service

Feb 12 14:55:27 ethgoerli geth[1571]: INFO [02-12|14:55:27.260] Loaded most recent local block           number=0 hash=b5f7f9..61bde4 td=1 age=4mo2w3d
Feb 12 14:55:27 ethgoerli geth[1571]: WARN [02-12|14:55:27.261] Loaded snapshot journal                  diffs=missing
Feb 12 14:55:27 ethgoerli geth[1571]: INFO [02-12|14:55:27.261] Initialized transaction indexer          range="last 2350000 blocks"
Feb 12 14:55:27 ethgoerli geth[1571]: INFO [02-12|14:55:27.261] Resuming state snapshot generation       root=69d8c9..788783 accounts=0 slots=0 storage=0.00B dangling=0 elapsed="59.003µs"
Feb 12 14:55:27 ethgoerli geth[1571]: INFO [02-12|14:55:27.262] Loaded local transaction journal         transactions=0 dropped=0
Feb 12 14:55:27 ethgoerli geth[1571]: INFO [02-12|14:55:27.262] Regenerated local transaction journal    transactions=0 accounts=0
Feb 12 14:55:27 ethgoerli geth[1571]: INFO [02-12|14:55:27.263] Transaction pool stopped
Feb 12 14:55:27 ethgoerli geth[1571]: Fatal: Failed to register the Ethereum service: EOF
Feb 12 14:55:27 ethgoerli systemd[1]: geth.service: Main process exited, code=exited, status=1/FAILURE
Feb 12 14:55:27 ethgoerli systemd[1]: geth.service: Failed with result 'exit-code'.
Feb 12 14:55:30 ethgoerli systemd[1]: Stopped Go Ethereum Client.

Removed "LOCK" file and "blobpool" directory and everything looked OK again.

  $ sudo -u goeth geth --datadir /mnt/ssd-eth/goethereum removedb
	Remove 'state data'? [y/n] y
	Remove 'ancient chain'? [y/n] y
  $ sudo -u goeth rm -v /mnt/ssd-eth/goethereum/geth/LOCK
  $ sudo -u goeth rm -rv /mnt/ssd-eth/goethereum/geth/blobpool/
  $ sudo systemctl start geth
  $ sudo journalctl -u geth.service

Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.692] Loaded most recent local block           number=0 hash=b5f7f9..61bde4 td=1 age=4mo2w3d
Feb 12 14:56:22 ethgoerli geth[1596]: WARN [02-12|14:56:22.692] Loaded snapshot journal                  diffs=missing
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.692] Initialized transaction indexer          range="last 2350000 blocks"
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.692] Resuming state snapshot generation       root=69d8c9..788783 accounts=0 slots=0 storage=0.00B dangling=0 elapsed="50.172µs"
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.692] Loaded local transaction journal         transactions=0 dropped=0
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.692] Regenerated local transaction journal    transactions=0 accounts=0
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.705] Generated state snapshot                 accounts=317 slots=31 storage=15.40KiB dangling=0 elapsed=12.675ms
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.707] Enabled snap sync                        head=0 hash=b5f7f9..61bde4
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.709] Chain post-merge, sync via beacon client
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.709] Gasprice oracle is ignoring threshold set threshold=2
Feb 12 14:56:22 ethgoerli geth[1596]: WARN [02-12|14:56:22.721] Engine API enabled                       protocol=eth
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.721] Starting peer-to-peer node               instance=Geth/v1.13.12-stable-02eb36af/linux-amd64/go1.21.6
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.830] New local node record                    seq=****** id=****** ip=127.0.0.1 udp=30304 tcp=30304
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.830] Started P2P networking                   self=enode://******
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.831] IPC endpoint opened                      url=/mnt/ssd-eth/goethereum/geth.ipc
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.831] Loaded JWT secret file                   path=/mnt/ssd-eth/******
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.832] HTTP server started                      endpoint=127.0.0.1:8545 auth=false prefix= cors= vhosts=localhost
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.832] WebSocket enabled                        url=ws://127.0.0.1:8551
Feb 12 14:56:22 ethgoerli geth[1596]: INFO [02-12|14:56:22.832] HTTP server started                      endpoint=127.0.0.1:8551 auth=true  prefix= cors=localhost vhosts=localhost
Feb 12 14:56:33 ethgoerli geth[1596]: INFO [02-12|14:56:33.589] Looking for peers                        peercount=1 tried=33 static=0

Suggestions for improvements.

  • Improve Fatal log info to easier pinpoint the issue causing the exit.
  • Automatically resync blobpool data at startup if it's detected as corrupt or has issues
  • Add removedb function for "blobpool data" removal
@ChiefSgt ChiefSgt changed the title Blobpool data blocking start of geth after unclean shutdown. Blobpool data - block start of geth after unclean shutdown. Feb 15, 2024
@ChiefSgt ChiefSgt changed the title Blobpool data - block start of geth after unclean shutdown. Blobpool data - blocks start of geth after unclean shutdown. Feb 15, 2024
@karalabe
Copy link
Member

Oh, this is very interesting. It should definitely not happen. I'd guess there's some issue in the database itself that the blob pool uses. Will look into it, thx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants