Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV from eth/downloader/downloader.go #22714

Closed
pepa65 opened this issue Apr 21, 2021 · 34 comments · Fixed by #22728
Closed

SIGSEGV from eth/downloader/downloader.go #22714

pepa65 opened this issue Apr 21, 2021 · 34 comments · Fixed by #22728
Labels

Comments

@pepa65
Copy link

pepa65 commented Apr 21, 2021

Geth version: 1.10.2-stable, go1.16
OS & Version: Linux Mint 20 Ulyana
Commit hash : 97d11b0
Date: 20210408

It had been running for a number of days, but now it has crashed 3 times in the same day.
Run with: --datadir=/data/Geth --cache=512 --lightkdf --ws
Output:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5455c4]

goroutine 54258 [running]:
math/big.(*Int).Cmp(0xc00bacb360, 0x0, 0x67f5d5406ab810c0)
        math/big/int.go:328 +0x44
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).processHeaders(0xc00000c3c0, 0x98fcd9, 0xc00bacb360, 0x0, 0x0)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1586 +0x126f
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer.func6(0x140aba0, 0xc00842b260)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:563 +0x3f
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync.func1(0xc00000c3c0, 0xc00d17b740, 0xc00cd6bc60)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:584 +0x5b
created by github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:584 +0xaf
@ligi
Copy link
Member

ligi commented Apr 22, 2021

Can you provide a bit more log?

@pepa65
Copy link
Author

pepa65 commented Apr 22, 2021

Do you mean from before the crash? Because this is all there is after. OK, this it from the last run:

INFO [04-22|01:20:46.834] Imported new block receipts              count=63   elapsed=1.692s      number=10049429 hash="3d47db…9f2002" age=11mo2w13
h size=2.70MiB                                                                                                                                     
INFO [04-22|01:20:49.886] Imported new block receipts              count=10   elapsed=2.957s      number=10049439 hash="ff0c1d…2d8f40" age=11mo2w13h size=317.10KiB                                                         
INFO [04-22|01:20:50.552] Imported new state entries               count=536  elapsed=1.959ms     processed=16174976 pending=3409 trieretry=0   cod
eretry=0 duplicate=0 unexpected=0                                                                                                                  
WARN [04-22|01:20:50.552] Rewinding blockchain                     target=10071461                                                                 
ERROR[04-22|01:20:50.554] Ethereum peer removal failed             peer=165de667 err="peer not registered"                                         
INFO [04-22|01:20:51.168] Initializing state bloom                 items=6262162 errorrate=0.000 elapsed=22m22.209s                                
ERROR[04-22|01:20:51.575] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"       
INFO [04-22|01:20:51.575] Loaded most recent local header          number=10071461 hash="ac386e…bab304" td=15436625923999431977999 age=11mo1w4d    
INFO [04-22|01:20:51.576] Loaded most recent local full block      number=0        hash="d4e567…cb8fa3" td=<nil>                   age=52y2w4d     
INFO [04-22|01:20:51.576] Loaded most recent local fast block      number=10049439 hash="ff0c1d…2d8f40" td=15386432373328820481508 age=11mo2w13h   INFO [04-22|01:20:51.576] Loaded last fast-sync pivot marker       number=12285044                                                                 
WARN [04-22|01:20:51.576] Rolled back chain segment                header=10078048->10071461 fast=10049439->10049439 block=0->0 reason=nil         WARN [04-22|01:20:51.576] Synchronisation failed, dropping peer    peer=165de667f354e430533b7cc3222c67e8a42306e5a33bb9a83ac07531aa7e23bc err="actio
n from bad peer ignored: header request timed out"                                                                                                 
ERROR[04-22|01:20:51.576] Ethereum peer removal failed             peer=165de667 err="peer not registered"                                         
WARN [04-22|01:20:57.724] Synchronisation failed, dropping peer    peer=2c02475b870b77ca394addba3e85981877d66122c40485ced607beb327d2310e err=timeou
t             
ERROR[04-22|01:20:57.724] Ethereum peer removal failed             peer=2c02475b err="peer not registered"                                         
INFO [04-22|01:20:59.173] Initializing state bloom                 items=6473403 errorrate=0.000 elapsed=22m30.214s                                
INFO [04-22|01:21:07.191] Initializing state bloom                 items=6502797 errorrate=0.000 elapsed=22m38.232s                                
INFO [04-22|01:21:15.191] Initializing state bloom                 items=6504447 errorrate=0.000 elapsed=22m46.232s                                
WARN [04-22|01:21:18.404] Stalling state sync, dropping peer       peer=2cd761db1e588e7c07797338562a35ef1a86b666721b7c8c191363d9d0c43c26           
WARN [04-22|01:21:18.404] Stalling state sync, dropping peer       peer=36e6eb1d979752ca01f41b733ffbf668f65a8d481c04d52b7bd85cf7c0e08529           
ERROR[04-22|01:21:18.404] Ethereum peer removal failed             peer=2cd761db err="peer not registered"                                         
INFO [04-22|01:21:18.404] Imported new state entries               count=1    elapsed="7.699µs"   processed=16174977 pending=17   trieretry=0   cod
eretry=0 duplicate=0 unexpected=0                                                                                                                  
ERROR[04-22|01:21:18.404] Ethereum peer removal failed             peer=36e6eb1d err="peer not registered"      
WARN [04-22|01:21:18.404] Synchronisation failed, dropping peer    peer=36e6eb1d979752ca01f41b733ffbf668f65a8d481c04d52b7bd85cf7c0e08529 err="retrieved hash chain is invalid: syncing canceled (requested)"
ERROR[04-22|01:21:18.405] Ethereum peer removal failed             peer=36e6eb1d err="peer not registered"
INFO [04-22|01:21:23.406] Initializing state bloom                 items=6506479 errorrate=0.000 elapsed=22m54.447s
WARN [04-22|01:21:23.748] Synchronisation failed, dropping peer    peer=2689d8adb6a1e8733d62aa88089bf30b843db565e08ec0ce31f78e4431b18929 err="action from bad peer ignored: returned headers 192 != requested 2"
ERROR[04-22|01:21:23.749] Ethereum peer removal failed             peer=2689d8ad err="peer not registered"
INFO [04-22|01:21:32.140] Initializing state bloom                 items=6508524 errorrate=0.000 elapsed=23m3.180s
INFO [04-22|01:21:40.173] Initializing state bloom                 items=6510768 errorrate=0.000 elapsed=23m11.214s
INFO [04-22|01:21:48.398] Initializing state bloom                 items=6512485 errorrate=0.000 elapsed=23m19.439s
INFO [04-22|01:21:56.271] Downloader queue stats                   receiptTasks=0     blockTasks=0     itemSize=87.72KiB  throttle=748
ERROR[04-22|01:21:56.271] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"
ERROR[04-22|01:21:56.271] Ethereum peer removal failed             peer=9ac8df96 err="peer not registered"
INFO [04-22|01:21:56.437] Initializing state bloom                 items=6513984 errorrate=0.000 elapsed=23m27.478s
INFO [04-22|01:21:56.460] Imported new state entries               count=35   elapsed="6.076µs"   processed=16175012 pending=523  trieretry=0   coderetry=0 duplicate=0 unexpected=0
WARN [04-22|01:21:57.324] Synchronisation failed, dropping peer    peer=9ac8df96e49bce26338dadc6b0d7d03473441f2a349563763bd73a241c640a34 err="action from bad peer ignored: header request timed out"
ERROR[04-22|01:21:57.324] Ethereum peer removal failed             peer=9ac8df96 err="peer not registered"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5455c4]

goroutine 30843 [running]:
math/big.(*Int).Cmp(0xc0323a1760, 0x0, 0x67f5d5406ab810c0)
        math/big/int.go:328 +0x44
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).processHeaders(0xc00000c3c0, 0x9957a0, 0xc0323a1760, 0x0, 0x0)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1586 +0x126f
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer.func6(0x1, 0xc0214309c0)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:563 +0x3f
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync.func1(0xc00000c3c0, 0xc039310ae0, 0xc03902b020)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:584 +0x5b
created by github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:584 +0xaf

@ethereum ethereum deleted a comment from samlavery Apr 22, 2021
@holiman
Copy link
Contributor

holiman commented Apr 22, 2021

Interesting:

INFO [04-22|01:20:51.576] Loaded most recent local full block      number=0        hash="d4e567…cb8fa3" td=<nil>                   age=52y2w4d     

So we somehow have the genesis in there, but the genesis has nil for td?

@holiman
Copy link
Contributor

holiman commented Apr 22, 2021

This is what one gets starting a clean geth from scratch:

INFO [04-22|13:15:07.907] Loaded most recent local full block      number=0 hash=d4e567..cb8fa3 td=17,179,869,184 age=52y2w5d

So something, somwhere, deleted the td entry for genesis.
@pepa65 thanks for providing logs, that helps a lot!

@pepa65
Copy link
Author

pepa65 commented Apr 22, 2021

So what should I do? Ditch all 130GB and start again, or is there something else I can do?

@fjl
Copy link
Contributor

fjl commented Apr 22, 2021

@pepa65 it seems your node hit a bug in go-ethereum core, and we find it valuable for debugging. if you just want to get on with life now, you should resync the node. unfortunately the problem cannot be fixed immediately.

@pepa65
Copy link
Author

pepa65 commented Apr 22, 2021

I've restarted it one more time without --cache=512 --lightkdf and it's not immediately crashing...

@pepa65
Copy link
Author

pepa65 commented Apr 22, 2021

we find it valuable for debugging

Do you have enough information to look into things??

@pepa65
Copy link
Author

pepa65 commented Apr 22, 2021

Now running with: geth --datadir=/data/Geth --cache=512 --ws and still going strong. I suspect the problem is in the code relating to --lightkdf.

@holiman
Copy link
Contributor

holiman commented Apr 22, 2021

I've restarted it one more time without --cache=512 --lightkdf and it's not immediately crashing...

It would be interesting to see the startup portion of the logs, and see if it somehow got the td back for genesis or not. This line

INFO [04-22|01:20:51.576] Loaded most recent local full block 

@pepa65
Copy link
Author

pepa65 commented Apr 23, 2021

After hours of running, a crash still occurred with geth --datadir=/data/Geth --cache=512 --ws. (Will try again now without the --cache=512.)

These are all the lines from an earlier run:

INFO [04-21|23:06:07.209] Starting Geth on Ethereum mainnet...                                                                                     
INFO [04-21|23:06:07.214] Maximum peer count                       ETH=50 LES=0 total=50                           
INFO [04-21|23:06:07.214] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"                    
INFO [04-21|23:06:07.441] Set global gas cap                       cap=25000000                                                                   
INFO [04-21|23:06:07.441] Allocated trie memory caches             clean=76.00MiB dirty=128.00MiB                                                  INFO [04-21|23:06:07.441] Allocated cache and file handles         database=/data/Geth/geth/chaindata cache=256.00MiB handles=524288
INFO [04-21|23:06:15.048] Opened ancient database                  database=/data/Geth/geth/chaindata/ancient readonly=false                       INFO [04-21|23:06:15.481] Initialised chain configuration          config="{ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 246
3000 EIP155: 2675000 EIP158: 2675000 Byzantium: 4370000 Constantinople: 7280000 Petersburg: 7280000 Istanbul: 9069000, Muir Glacier: 9200000, Berli
n: 12244000, YOLO v3: <nil>, Engine: ethash}"                                                                                                      INFO [04-21|23:06:16.415] Disk storage enabled for ethash caches   dir=/data/Geth/geth/ethash count=3
INFO [04-21|23:06:16.415] Disk storage enabled for ethash DAGs     dir=/home/pp/.ethash       count=2     
INFO [04-21|23:06:16.628] Initialising Ethereum protocol           network=1 dbversion=8
ERROR[04-21|23:06:16.642] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"
INFO [04-21|23:06:16.643] Loaded most recent local header          number=10012776 hash="2df8cb…c48ade" td=15303219596947624774247 age=11mo2w6d    INFO [04-21|23:06:16.643] Loaded most recent local full block      number=0        hash="d4e567…cb8fa3" td=<nil>                   age=52y2w4d
INFO [04-21|23:06:16.643] Loaded most recent local fast block      number=9978039  hash="08ebf7…cab6a1" td=15222485840757087849431 age=11mo3w4d    INFO [04-21|23:06:16.643] Loaded last fast-sync pivot marker       number=12284412
INFO [04-21|23:06:17.240] Upgrading chain index                    type=bloombits percentage=99                                                    INFO [04-21|23:06:17.240] Loaded local transaction journal         transactions=0 dropped=0
INFO [04-21|23:06:17.240] Regenerated local transaction journal    transactions=0 accounts=0                                                
INFO [04-21|23:06:17.250] Allocated fast sync bloom                size=255.00MiB                                                             
WARN [04-21|23:06:17.251] Unclean shutdown detected                booted=2021-04-21T00:35:36+0700 age=22h30m41s                              
WARN [04-21|23:06:17.251] Unclean shutdown detected                booted=2021-04-21T01:42:45+0700 age=21h23m32s                              
INFO [04-21|23:06:17.251] Starting peer-to-peer node               instance=Geth/v1.10.2-stable-97d11b01/linux-amd64/go1.16
INFO [04-21|23:06:20.492] New local node record                    seq=9 id=381d29c9612850a6 ip=127.0.0.1 udp=30303 tcp=30303             
INFO [04-21|23:06:20.495] Started P2P networking                   self=enode://bc2307f60057a0eac90007145db3f70025070cfe2b461e806bdde986faf6d60e627d5023ec05ff565bd5c6ba25c4382bdb4471d21e3e35fd0b8db581c858110b@127.0.0.1:30303
INFO [04-21|23:06:20.738] IPC endpoint opened                      url=/data/Geth/geth.ipc                                                         INFO [04-21|23:06:20.739] WebSocket enabled                        url=ws://127.0.0.1:8546
INFO [04-21|23:06:24.872] New local node record                    seq=10 id=381d29c9612850a6 ip=49.48.39.177 udp=30303 tcp=30303
INFO [04-21|23:06:25.273] Initializing state bloom                 items=157121 errorrate=0.000 elapsed=8.020s
INFO [04-21|23:06:25.496] Upgrading chain index                    type=bloombits percentage=99
INFO [04-21|23:06:29.721] Finished upgrading chain index           type=bloombits
WARN [04-21|23:06:29.978] Dropping unsynced node during sync       id=c3ce42a8b5ccb91f conn=dyndial addr=141.164.49.72:30303 type=Geth/v1.10.3-unst
abl...
INFO [04-21|23:06:31.000] Looking for peers                        peercount=0 tried=45 static=0
INFO [04-21|23:06:33.381] Initializing state bloom                 items=321559 errorrate=0.000 elapsed=16.129s
INFO [04-21|23:06:39.099] Block synchronisation started 
INFO [04-21|23:06:41.204] Looking for peers                        peercount=0 tried=39 static=0
INFO [04-21|23:06:41.879] Initializing state bloom                 items=491305 errorrate=0.000 elapsed=24.627s
INFO [04-21|23:06:49.889] Initializing state bloom                 items=666109 errorrate=0.000 elapsed=32.637s
INFO [04-21|23:06:51.207] Looking for peers                        peercount=0 tried=36 static=0
INFO [04-21|23:06:57.991] Initializing state bloom                 items=870744 errorrate=0.000 elapsed=40.739s
INFO [04-21|23:07:01.446] Looking for peers                        peercount=0 tried=50 static=0
ERROR[04-21|23:07:02.648] Snapshot extension registration failed   peer=a05e766c err="peer connected on snap without compatible eth support"
INFO [04-21|23:07:05.994] Initializing state bloom                 items=1116371 errorrate=0.000 elapsed=48.742s
INFO [04-21|23:07:11.537] Looking for peers                        peercount=0 tried=49 static=0
INFO [04-21|23:07:14.448] Initializing state bloom                 items=1254054 errorrate=0.000 elapsed=57.196s
INFO [04-21|23:07:21.552] Looking for peers                        peercount=1 tried=37 static=0
INFO [04-21|23:07:22.467] Initializing state bloom                 items=1488430 errorrate=0.000 elapsed=1m5.215s
INFO [04-21|23:07:30.493] Initializing state bloom                 items=1715105 errorrate=0.000 elapsed=1m13.241s
INFO [04-21|23:07:31.870] Looking for peers                        peercount=0 tried=50 static=0
INFO [04-21|23:07:38.532] Initializing state bloom                 items=1906528 errorrate=0.000 elapsed=1m21.280s
WARN [04-21|23:07:39.100] Synchronisation failed, dropping peer    peer=f04b1566212d2dd301f9f25f23b5231f7eafc8644389daf58b432da7a91e3fc1 err=timeou
t
ERROR[04-21|23:07:39.100] Ethereum peer removal failed             peer=f04b1566 err="peer not registered"
INFO [04-21|23:07:41.871] Looking for peers                        peercount=2 tried=36 static=0
INFO [04-21|23:07:46.608] Initializing state bloom                 items=2080034 errorrate=0.000 elapsed=1m29.356s
INFO [04-21|23:07:54.611] Initializing state bloom                 items=2328727 errorrate=0.000 elapsed=1m37.359s
INFO [04-21|23:08:02.740] Initializing state bloom                 items=2523747 errorrate=0.000 elapsed=1m45.488s
INFO [04-21|23:08:10.809] Initializing state bloom                 items=2732750 errorrate=0.000 elapsed=1m53.557s
INFO [04-21|23:08:15.597] Imported new block headers               count=0 elapsed=31.446ms  number=9978615  hash="7483cc…343df8" age=11mo3w4d  ign
ored=576
INFO [04-21|23:08:15.615] Imported new block headers               count=0 elapsed=14.091ms  number=9978807  hash="c4608d…1cc8d1" age=11mo3w4d  ign
ored=192                                                                                                                                           
INFO [04-21|23:08:17.823] Downloader queue stats                   receiptTasks=723 blockTasks=756 itemSize=23.07KiB throttle=2841
INFO [04-21|23:08:18.932] Initializing state bloom                 items=2911859 errorrate=0.000 elapsed=2m1.680s
INFO [04-21|23:08:19.257] Imported new block receipts              count=2 elapsed=1.433s    number=9978041  hash="082e5b…59df74" age=11mo3w4d  siz
e=116.12KiB
INFO [04-21|23:08:26.972] Initializing state bloom                 items=3119738 errorrate=0.000 elapsed=2m9.720s
INFO [04-21|23:08:34.979] Initializing state bloom                 items=3332684 errorrate=0.000 elapsed=2m17.727s
INFO [04-21|23:08:43.031] Initializing state bloom                 items=3542104 errorrate=0.000 elapsed=2m25.779s
INFO [04-21|23:08:51.071] Initializing state bloom                 items=3688240 errorrate=0.000 elapsed=2m33.819s
INFO [04-21|23:08:59.410] Initializing state bloom                 items=3879560 errorrate=0.000 elapsed=2m42.158s
WARN [04-21|23:09:04.431] Dropping unsynced node during sync       id=c3263441961e79c0 conn=dyndial addr=176.9.122.147:30503 type=Geth/v1.9.25-unst
abl...
ERROR[04-21|23:09:05.004] Snap peer not registered                 id=302e88cf2fba4a6de2ec13bb056aa5c8b1ea4edb3aeaefdff4da036fcda0ce26
ERROR[04-21|23:09:05.005] Failed to unregister sync peer           peer=302e88cf err="peer is not registered"
ERROR[04-21|23:09:05.005] Ethereum peer removal failed             peer=302e88cf err="peer not registered"
ERROR[04-21|23:09:05.005] Ethereum peer removal failed             peer=302e88cf err="peer not registered"
INFO [04-21|23:09:05.468] Imported new block headers               count=0 elapsed=33.187ms  number=9979191  hash="c5ddc4…f3fce9" age=11mo3w4d  ign
ored=384
INFO [04-21|23:09:05.578] Imported new block headers               count=0 elapsed=106.561ms number=9981239  hash="24e0d8…74b42e" age=11mo3w4d  ign
ored=2048
INFO [04-21|23:09:05.600] Imported new block headers               count=0 elapsed=9.453ms   number=9981303  hash="37fff9…a23ec7" age=11mo3w4d  ign
ored=64
INFO [04-21|23:09:06.995] Imported new block receipts              count=4 elapsed=1.666s    number=9978045  hash="31e199…c2b5af" age=11mo3w4d  siz
e=223.71KiB
INFO [04-21|23:09:07.428] Initializing state bloom                 items=3943066 errorrate=0.000 elapsed=2m50.176s
INFO [04-21|23:09:15.521] Imported new block receipts              count=762 elapsed=8.507s    number=9978807  hash="c4608d…1cc8d1" age=11mo3w4d  s
ize=40.94MiB
INFO [04-21|23:09:15.565] Initializing state bloom                 items=3943157 errorrate=0.000 elapsed=2m58.313s
INFO [04-21|23:09:16.644] Imported new block headers               count=0   elapsed=74.530ms  number=9983351  hash="c24a80…a566b0" age=11mo3w3d  i
gnored=2048
INFO [04-21|23:09:16.735] Imported new block headers               count=0   elapsed=79.898ms  number=9985399  hash="c8b9df…d569b3" age=11mo3w3d  i
gnored=2048
INFO [04-21|23:09:16.832] Imported new block headers               count=0   elapsed=85.071ms  number=9987447  hash="a924ee…fd5c07" age=11mo3w3d  i
gnored=2048
INFO [04-21|23:09:16.944] Imported new block headers               count=0   elapsed=96.671ms  number=9989495  hash="20d03c…54b4f7" age=11mo3w2d  i
gnored=2048
INFO [04-21|23:09:19.919] Imported new block receipts              count=37  elapsed=4.397s    number=9978844  hash="36498b…58f627" age=11mo3w4d  s
ize=1.76MiB
INFO [04-21|23:09:19.920] Downloader queue stats                   receiptTasks=9793 blockTasks=10113 itemSize=142.51KiB throttle=460
INFO [04-21|23:09:22.980] Imported new block headers               count=0   elapsed=6.026s    number=9991543  hash="131f94…5214d8" age=11mo3w2d  i
gnored=2048
INFO [04-21|23:09:23.088] Imported new block headers               count=0   elapsed=97.072ms  number=9993591  hash="842aee…37d97d" age=11mo3w2d  i
gnored=2048
INFO [04-21|23:09:23.153] Imported new block receipts              count=33  elapsed=3.232s    number=9978877  hash="82e42a…b4b3d1" age=11mo3w4d  s
ize=1.80MiB
INFO [04-21|23:09:23.806] Initializing state bloom                 items=3943203 errorrate=0.000 elapsed=3m6.553s
INFO [04-21|23:09:25.452] Imported new block receipts              count=38  elapsed=2.206s    number=9978915  hash="8a8984…279f0a" age=11mo3w4d  s
ize=1.76MiB
INFO [04-21|23:09:29.367] Imported new block receipts              count=32  elapsed=3.913s    number=9978947  hash="093ba2…e07529" age=11mo3w4d  s
ize=1.79MiB
INFO [04-21|23:09:31.569] Imported new block receipts              count=35  elapsed=2.200s    number=9978982  hash="759604…052b99" age=11mo3w4d  s
ize=1.84MiB
INFO [04-21|23:09:32.065] Initializing state bloom                 items=3943223 errorrate=0.000 elapsed=3m14.813s
INFO [04-21|23:09:33.607] Imported new state entries               count=309 elapsed="896.627µs" processed=16086455 pending=896 trieretry=0 coderet
ry=0 duplicate=0 unexpected=0
INFO [04-21|23:09:37.729] Imported new block receipts              count=339 elapsed=6.150s      number=9979321  hash="689124…0d57a3" age=11mo3w4d 
 size=17.84MiB
INFO [04-21|23:09:40.648] Initializing state bloom                 items=3943383 errorrate=0.000 elapsed=3m23.396s
INFO [04-21|23:09:40.991] Imported new block receipts              count=33  elapsed=3.260s      number=9979354  hash="66aa47…a6d961" age=11mo3w4d 
 size=1.68MiB
WARN [04-21|23:09:42.464] Checkpoint challenge timed out, dropping id=ab2b5f969b34c41c conn=dyndial addr=62.182.85.30:30303  type=Geth/v1.10.2-stab
le-...
ERROR[04-21|23:09:42.465] Ethereum peer removal failed             peer=ab2b5f96 err="peer not registered"
INFO [04-21|23:09:46.604] Imported new block headers               count=0   elapsed=113.740ms   number=9995639  hash="4a336b…282f69" age=11mo3w1d 
 ignored=2048
INFO [04-21|23:09:46.701] Imported new block headers               count=0   elapsed=82.279ms    number=9997623  hash="56a7cf…2afe84" age=11mo3w1d 
 ignored=1984
INFO [04-21|23:09:46.749] Imported new block headers               count=0   elapsed=33.353ms    number=9998199  hash="beae62…5469df" age=11mo3w1d 
 ignored=576
WARN [04-21|23:09:46.856] Dropping unsynced node during sync       id=bf799214872f28e7 conn=dyndial addr=139.5.203.2:30303   type=Geth/v1.9.9-stabl
e-0...
INFO [04-21|23:09:46.902] Imported new block headers               count=0   elapsed=84.151ms    number=9999735  hash="739d6c…6683f2" age=11mo3w1d 
 ignored=1536
INFO [04-21|23:09:48.805] Imported new block headers               count=0   elapsed=56.848ms    number=10000695 hash="3a8d28…69ed3a" age=11mo3w1d 
 ignored=960
INFO [04-21|23:09:48.832] Initializing state bloom                 items=3943400 errorrate=0.000 elapsed=3m31.580s
INFO [04-21|23:09:50.860] Imported new block receipts              count=32  elapsed=4.420s      number=9979386  hash="b6c2bd…babf9c" age=11mo3w4d 
 size=1.82MiB
INFO [04-21|23:09:57.183] Initializing state bloom                 items=3943410 errorrate=0.000 elapsed=3m39.931s
INFO [04-21|23:09:57.448] Imported new block receipts              count=345 elapsed=6.578s      number=9979731  hash="ab65fa…a37843" age=11mo3w4d 
 size=18.51MiB
INFO [04-21|23:10:03.270] Imported new block receipts              count=197 elapsed=5.814s      number=9979928  hash="7f33c5…598797" age=11mo3w4d 
 size=10.81MiB
INFO [04-21|23:10:05.254] Initializing state bloom                 items=3943427 errorrate=0.000 elapsed=3m48.002s
INFO [04-21|23:10:08.210] Imported new block receipts              count=30  elapsed=2.252s      number=9979958  hash="91f2a2…722ba1" age=11mo3w4d 
 size=1.83MiB
INFO [04-21|23:10:12.316] Imported new block receipts              count=333 elapsed=4.094s      number=9980291  hash="f25197…ff1c6e" age=11mo3w4d 
 size=19.88MiB
INFO [04-21|23:10:13.422] Initializing state bloom                 items=3943445 errorrate=0.000 elapsed=3m56.170s
INFO [04-21|23:10:15.387] Imported new block receipts              count=166 elapsed=3.065s      number=9980457  hash="b254af…23f635" age=11mo3w4d 
 size=9.64MiB
INFO [04-21|23:10:18.379] Imported new block receipts              count=31  elapsed=2.991s      number=9980488  hash="f55398…3bed5e" age=11mo3w4d 
 size=1.76MiB
INFO [04-21|23:10:21.700] Initializing state bloom                 items=3943462 errorrate=0.000 elapsed=4m4.448s
INFO [04-21|23:10:22.272] Imported new block headers               count=0   elapsed=81.027ms    number=10002615 hash="1ce16f…032fec" age=11mo3w17h
 ignored=1920
INFO [04-21|23:10:25.557] Imported new block receipts              count=159 elapsed=7.173s      number=9980647  hash="a64091…fb109b" age=11mo3w4d 
 size=8.34MiB
INFO [04-21|23:10:25.562] Downloader queue stats                   receiptTasks=20538 blockTasks=21534 itemSize=153.23KiB throttle=428
INFO [04-21|23:10:28.830] Imported new block receipts              count=247 elapsed=3.266s      number=9980894  hash="defd04…b62235" age=11mo3w4d 
 size=13.70MiB
INFO [04-21|23:10:29.781] Initializing state bloom                 items=3943479 errorrate=0.000 elapsed=4m12.529s
INFO [04-21|23:10:31.430] Imported new block receipts              count=113 elapsed=2.597s      number=9981007  hash="25be9c…8eaa52" age=11mo3w4d 
 size=5.79MiB
INFO [04-21|23:10:34.470] Imported new block receipts              count=67  elapsed=3.037s      number=9981074  hash="d30181…083d11" age=11mo3w4d 
 size=3.56MiB
INFO [04-21|23:10:37.834] Initializing state bloom                 items=3943500 errorrate=0.000 elapsed=4m20.582s
ERROR[04-21|23:10:44.204] Ethereum peer removal failed             peer=c01e5f12 err="peer not registered"
ERROR[04-21|23:10:45.267] Ethereum peer removal failed             peer=fa858a4f err="peer not registered"
WARN [04-21|23:10:45.267] Rewinding blockchain                     target=10000566
INFO [04-21|23:10:45.943] Initializing state bloom                 items=3943532 errorrate=0.000 elapsed=4m28.691s
ERROR[04-21|23:10:51.519] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"
INFO [04-21|23:10:51.520] Loaded most recent local header          number=10000566 hash="f6a656…9946d6" td=15274645360393318452534 age=11mo3w1d
INFO [04-21|23:10:51.520] Loaded most recent local full block      number=0        hash="d4e567…cb8fa3" td=<nil>                   age=52y2w4d
INFO [04-21|23:10:51.520] Loaded most recent local fast block      number=9981074  hash="d30181…083d11" td=15229501211164615569028 age=11mo3w4d
INFO [04-21|23:10:51.520] Loaded last fast-sync pivot marker       number=12284456
WARN [04-21|23:10:51.520] Rolled back chain segment                header=10012776->10000566 fast=9981074->9981074 block=0->0 reason=nil
INFO [04-21|23:10:54.045] Initializing state bloom                 items=3943551 errorrate=0.000 elapsed=4m36.793s
INFO [04-21|23:11:02.067] Initializing state bloom                 items=3943565 errorrate=0.000 elapsed=4m44.815s
INFO [04-21|23:11:10.592] Initializing state bloom                 items=3943594 errorrate=0.000 elapsed=4m53.340s
INFO [04-21|23:11:14.596] Imported new state entries               count=113 elapsed="632.158µs" processed=16086568 pending=1127 trieretry=0 codere
try=0 duplicate=0 unexpected=0
WARN [04-21|23:11:14.596] Synchronisation failed, dropping peer    peer=fa858a4f4d1635d47fe61dff84229f8f67ac504673003e4280fb0c0a8cb03228 err="actio
n from bad peer ignored: header request timed out"
ERROR[04-21|23:11:14.596] Ethereum peer removal failed             peer=fa858a4f err="peer not registered"
INFO [04-21|23:11:18.622] Initializing state bloom                 items=3943649 errorrate=0.000 elapsed=5m1.370s
INFO [04-21|23:11:26.636] Initializing state bloom                 items=3943676 errorrate=0.000 elapsed=5m9.383s
INFO [04-21|23:11:34.636] Initializing state bloom                 items=3943720 errorrate=0.000 elapsed=5m17.384s
INFO [04-21|23:11:42.641] Initializing state bloom                 items=3943759 errorrate=0.000 elapsed=5m25.389s
INFO [04-21|23:11:50.928] Initializing state bloom                 items=3943786 errorrate=0.000 elapsed=5m33.676s
INFO [04-21|23:11:59.137] Initializing state bloom                 items=3943816 errorrate=0.000 elapsed=5m41.885s
INFO [04-21|23:12:07.955] Initializing state bloom                 items=3943842 errorrate=0.000 elapsed=5m50.703s
INFO [04-21|23:12:16.126] Initializing state bloom                 items=3943866 errorrate=0.000 elapsed=5m58.874s
INFO [04-21|23:12:16.571] Imported new block headers               count=0   elapsed=36.198ms    number=9981842  hash="7cf0ce…c42685" age=11mo3w3d 
 ignored=768
ERROR[04-21|23:12:19.268] Ethereum peer removal failed             peer=7ea9bb63 err="peer not registered"
WARN [04-21|23:12:19.976] Stalling state sync, dropping peer       peer=14d32bd9cbfba2fc07654189da944d0a64cf346d747be5a3d551804cd653ed0d
ERROR[04-21|23:12:19.976] Ethereum peer removal failed             peer=14d32bd9 err="peer not registered"
INFO [04-21|23:12:20.126] Imported new block headers               count=0   elapsed=101.731ms   number=9983890  hash="6f9fb8…9b2aac" age=11mo3w3d 
 ignored=2048
INFO [04-21|23:12:20.225] Imported new block headers               count=0   elapsed=87.611ms    number=9985938  hash="958657…11d1eb" age=11mo3w3d 
 ignored=2048
INFO [04-21|23:12:20.262] Imported new block headers               count=0   elapsed=24.418ms    number=9986450  hash="1d4b1a…9ac412" age=11mo3w3d 
 ignored=512
INFO [04-21|23:12:23.906] Imported new block headers               count=0   elapsed=92.052ms    number=9988498  hash="779a6f…3104b6" age=11mo3w2d 
 ignored=2048
INFO [04-21|23:12:23.941] Imported new block headers               count=0   elapsed=21.093ms    number=9988946  hash="f30ebe…5ab651" age=11mo3w2d 
 ignored=448
INFO [04-21|23:12:24.260] Initializing state bloom                 items=3943893 errorrate=0.000 elapsed=6m7.008s
ERROR[04-21|23:12:24.848] Ethereum peer removal failed             peer=ab2b5f96 err="peer not registered"
INFO [04-21|23:12:24.849] Imported new state entries               count=124 elapsed="497.279µs" processed=16086692 pending=1111 trieretry=0 codere
try=0 duplicate=0 unexpected=0
INFO [04-21|23:12:24.848] Downloader queue stats                   receiptTasks=7450  blockTasks=7780  itemSize=132.90KiB throttle=494
WARN [04-21|23:12:24.848] Rewinding blockchain                     target=9986897
ERROR[04-21|23:12:24.848] Ethereum peer removal failed             peer=ab2b5f96 err="peer not registered"
INFO [04-21|23:12:32.516] Initializing state bloom                 items=3943960 errorrate=0.000 elapsed=6m15.264s
ERROR[04-21|23:12:38.463] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"
INFO [04-21|23:12:38.463] Loaded most recent local header          number=9986897  hash="2c49ad…eb53b1" td=15243121260163223428153 age=11mo3w3d
INFO [04-21|23:12:38.463] Loaded most recent local full block      number=0        hash="d4e567…cb8fa3" td=<nil>                   age=52y2w4d
INFO [04-21|23:12:38.463] Loaded most recent local fast block      number=9981074  hash="d30181…083d11" td=15229501211164615569028 age=11mo3w4d
INFO [04-21|23:12:38.464] Loaded last fast-sync pivot marker       number=12284474
WARN [04-21|23:12:38.464] Rolled back chain segment                header=10000566->9986897  fast=9981074->9981074 block=0->0 reason="syncing cance
led (requested)"
WARN [04-21|23:12:38.464] Synchronisation failed, dropping peer    peer=ab2b5f969b34c41c8ca72ab2ae783086d6c7c7432b9b2a78238747eb007fd878 err=timeou
t
ERROR[04-21|23:12:38.464] Ethereum peer removal failed             peer=ab2b5f96 err="peer not registered"
INFO [04-21|23:12:40.638] Initializing state bloom                 items=3943985 errorrate=0.000 elapsed=6m23.386s
INFO [04-21|23:12:48.705] Initializing state bloom                 items=3944006 errorrate=0.000 elapsed=6m31.453s
INFO [04-21|23:12:56.713] Initializing state bloom                 items=3944038 errorrate=0.000 elapsed=6m39.461s
INFO [04-21|23:13:02.299] Imported new state entries               count=40  elapsed="11.221µs"  processed=16086732 pending=596  trieretry=0 codere
try=0 duplicate=0 unexpected=0
ERROR[04-21|23:13:02.299] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"
ERROR[04-21|23:13:02.299] Ethereum peer removal failed             peer=057eb01b err="peer not registered"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5455c4]

goroutine 9633 [running]:
math/big.(*Int).Cmp(0xc027cdd3a0, 0x0, 0x67f5d5406ab810c0)
        math/big/int.go:328 +0x44
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).processHeaders(0xc0014a4000, 0x984c93, 0xc027cdd3a0, 0x0, 0x0)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1586 +0x126f
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer.func6(0x41c4b5, 0xc024deac80)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:563 +0x3f
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync.func1(0xc0014a4000, 0xc026c32d80, 0xc038f06ea0)
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:584 +0x5b
created by github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync
        github.com/ethereum/go-ethereum/eth/downloader/downloader.go:584 +0xaf

@pepa65
Copy link
Author

pepa65 commented Apr 23, 2021

FYI, without --cache=512 --lightkdf it has been running for over 12 hours already.

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

Interesting:

ERROR[04-21|23:06:16.642] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"
...
ERROR[04-21|23:10:51.519] Invalid block total difficulty RLP       hash="d4e567…cb8fa3" err="rlp: value size exceeds available input length"
...

That is the genesis. So previously, it was nil, but now it suddenly has some rlp encoding error. There is actually a possibility we can debug this further. I'll be back

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

If you execute geth db get 0x680000000000000000d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa374 , that should spit out the rlp-data stored in the database for the genesis block.

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

For me, it spits out

key 0x680000000000000000d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa374: 0x850400000000

Which is

$ rlpdump -hex 0x850400000000
0400000000

And 0x400000000 is hex(17179869184). I'm curious to see what data has been placed there, where there should be an rlp-encoded number.

@pepa65
Copy link
Author

pepa65 commented Apr 23, 2021

geth db get 0x680000000000000000d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa374 -- do I need to run that while the normal geth node is running, or do I need to stop that first??

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

You will need to shut it down first

@pepa65
Copy link
Author

pepa65 commented Apr 23, 2021

It had just crashed anyway. geth --datadir=/data/Geth db get 0x680000000000000000d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa374:

INFO [04-23|20:59:52.622] Maximum peer count                       ETH=50 LES=0 total=50
INFO [04-23|20:59:52.623] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [04-23|20:59:53.200] Set global gas cap                       cap=25000000
INFO [04-23|20:59:53.200] Allocated cache and file handles         database=/data/Geth/geth/chaindata cache=512.00MiB handles=524288 readonly=true
INFO [04-23|21:00:16.262] Opened ancient database                  database=/data/Geth/geth/chaindata/ancient readonly=true
key 0x680000000000000000d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa374: 0x850400000000

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

The mystery deepens. So, the correct TD is indeed in leveldb. Could you please go to /data/Geth/geth/chaindata/ancient and do xxd diffs.0000.rdat | head -- this will show the first part of the difficulties, which should include the genesis td.

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

(this time it doesn't matter if the node is running or not)

@pepa65
Copy link
Author

pepa65 commented Apr 23, 2021

xxd /data/Geth/geth/chaindata/ancient/diffs.0000.rdat |head:

00000000: 8912 68e1 80ba 1351 00af 8912 68e3 f1ed  ..h....Q....h...
00000010: e6a5 5bac 8912 68e6 637f e074 2134 8912  ..[...h.c..t!4..
00000020: 68e8 d4d3 a803 ace4 8912 68eb 4685 9a0c  h.........h.F...
00000030: 2a85 8912 68ed b7ab 1f98 2608 8912 68f0  *...h.....&...h.
00000040: 28e0 a524 218b 8912 68f2 9a74 5160 ce8d  (..$!...h..tQ`..
00000050: 8912 68f5 0c17 fd9d 7b8f 8912 68f7 7b5a  ..h.....{...h.{Z
00000060: 062d ebe9 8912 68f9 eaf9 f6ff 6e51 8912  .-....h.....nQ..
00000070: 68fc 5af7 dbcf 0ae9 8912 68fe cb53 c05b  h.Z.......h..S.[
00000080: 4174 8912 6901 3ad5 8271 c36d 8912 6903  At..i.:..q.m..i.
00000090: aab5 34c0 8836 8912 6906 19ba f32c 6f77  ..4..6..i....,ow

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

Interesting. Here's how it should look:

00000000: 8504 0000 0000 8507 ff80 0000 850b fe80  ................
00000010: 1000 850f fd00 3ffe 8513 fb00 9ff7 8517  ......?.........
00000020: f980 bffb 851b f781 0ffb 851f f601 2005  .............. .
00000030: 8523 f501 0011 8527 f480 c019 852b f380  .#.....'.....+..
00000040: 9029 852f f300 4033 8533 f2ff e033 8537  .)./..@3.3...3.7
00000050: f27f 803f 853b f27f 103f 853f f2fe a031  ...?.;...?.?...1
00000060: 8543 f3fe 4014 8547 f57d ffea 854b f77d  .C..@..G.}...K.}
00000070: efb7 854f f9fe 1f81 8553 fcfe 9f50 8558  ...O.....S...P.X
00000080: 007f 7f2e 855c 0480 cf27 8560 0902 9f49  .....\...'.`...I
00000090: 8564 0e04 ffa5 8568 1388 004d 856c 198b  .d.....h...M.l..

If I display it with different cols, the diffs are as follows:

$ xxd -c 6 diffs.0000.rdat| head
00000000: 8504 0000 0000  ......
00000006: 8507 ff80 0000  ......
0000000c: 850b fe80 1000  ......
00000012: 850f fd00 3ffe  ....?.
00000018: 8513 fb00 9ff7  ......
0000001e: 8517 f980 bffb  ......
00000024: 851b f781 0ffb  ......
0000002a: 851f f601 2005  .... .
00000030: 8523 f501 0011  .#....
00000036: 8527 f480 c019  .'....

Your difficulty entries, rlp wise, are

index hexdata
0 8912 68e1 80ba
1 1351 00af 8912
2 68e3 f1ed e6a5
3 5bac 8912 68e6
4 637f e074 2134

None of that looks even remotely like it should (rlp-encoded big.Int which are only 6 bytes large in this region). For example, every single item in the whole list should begin with 85. cc @fjl any ideas what that data could be?

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

So, actually, your data is not aligned on 6-bytes per entry, if I rearrange the columnwidth a bit, it's easy to see:

8912 68e1 80ba 1351 00af 
8912 68e3 f1ed e6a5 5bac 
8912 68e6 637f e074 2134 
8912 68e8 d4d3 a803 ace4 
8912 68eb 4685 9a0c 2a85 
8912 68ed b7ab 1f98 2608 
8912 68f0 28e0 a524 218b 
8912 68f2 9a74 5160 ce8d
8912 68f5 0c17 fd9d 7b8f 
8912 68f7 7b5a 062d ebe9 
8912 68f9 eaf9 f6ff 6e51 
8912 68fc 5af7 dbcf 0ae9 
8912 68fe cb53 c05b 4174 
8912 6901 3ad5 8271 c36d 
8912 6903 aab5 34c0 8836 
8912 6906 19ba f32c 6f77

It's actually 10-byte per element. When arranged thus, it can also be rlp decoded:

rlpdump -hex 891268e180ba135100af
1268e180ba135100af

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

This is also how it should be -- but a lot later:

02345fe0: 68d7 bf97 5b9a fedd 8912 68da 2f53 0a98  h...[.....h./S..
02345ff0: 631c 8912 68dc 9f6c b10b a707 8912 68df  c...h..l......h.
02346000: 0fe4 5ab3 b95a 8912 68e1 80ba 1351 00af  ..Z..Z..h....Q..
                         ^^^^^^^^^^^^^^^^^^^^^^^^  < start of your data
02346010: 8912 68e3 f1ed e6a5 5bac 8912 68e6 637f  ..h.....[...h.c.
02346020: e074 2134 8912 68e8 d4d3 a803 ace4 8912  .t!4..h.........
02346030: 68eb 4685 9a0c 2a85 8912 68ed b7ab 1f98  h.F...*...h.....
02346040: 2608 8912 68f0 28e0 a524 218b 8912 68f2  &...h.(..$!...h.

So, as far as I can tell, you're missing 36986902 bytes in the beginning of the file. The index is out of sync with it -- hence why it tries to read 6 bytes, but the rlp-data says "I"m 10 bytes", so the error rlp: value size exceeds available input length is thrown.

This is really interesting case of data corruption. We will investigate what could possible be the cause for something like that to happen.

Some follow-up questions:

  • Is the datadir folder an actual 'regular' folder on the same filesystem as the operating system? (As opposed to, say, a CIFS or NTFS mount)
  • Have you experienced a full computer / OS crash during geth execution at any point? (as in, not just geth crashing, but like disconecting the power supply or hard computer shutdown)
  • Is there anything out of the ordinary with your system? (like, ARM cpu or nonstandard filesystem, or virtualiized linux inside a windows host or something)

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

I have a suspicion that what's causing this is a race between Append and Truncate, where a sync-rollback tries to truncate the items, simultaneously as something else is doing an Append. There's a gap where we release an RLock, and switch to a Lock. Before we get the lock, the Truncate can set the head back to 0. This causes us to open file 0 as the 'next' file, and append the item onto it.

At least, that's the only path I could find where we actually open files in truncation mode, so it's the most likely culprit IMO. A rudimentary fix would be something like below.

I'll try to trigger this behaviour via some testcase or custom binary (maybe a fuzzer), so we have something to test against.

diff --git a/core/rawdb/freezer_table.go b/core/rawdb/freezer_table.go
index b614c10d37..13b2c79903 100644
--- a/core/rawdb/freezer_table.go
+++ b/core/rawdb/freezer_table.go
@@ -465,34 +465,30 @@ func (t *freezerTable) releaseFilesAfter(num uint32, remove bool) {
 // Note, this method will *not* flush any data to disk so be sure to explicitly
 // fsync before irreversibly deleting data from the database.
 func (t *freezerTable) Append(item uint64, blob []byte) error {
+	// Encode the blob before the lock portion
+	if !t.noCompression {
+		blob = snappy.Encode(nil, blob)
+	}
 	// Read lock prevents competition with truncate
-	t.lock.RLock()
+	t.lock.Lock()
+	defer t.lock.Unlock()
 	// Ensure the table is still accessible
 	if t.index == nil || t.head == nil {
-		t.lock.RUnlock()
 		return errClosed
 	}
 	// Ensure only the next item can be written, nothing else
 	if atomic.LoadUint64(&t.items) != item {
-		t.lock.RUnlock()
 		return fmt.Errorf("appending unexpected item: want %d, have %d", t.items, item)
 	}
-	// Encode the blob and write it into the data file
-	if !t.noCompression {
-		blob = snappy.Encode(nil, blob)
-	}
 	bLen := uint32(len(blob))
 	if t.headBytes+bLen < bLen ||
 		t.headBytes+bLen > t.maxFileSize {
 		// we need a new file, writing would overflow
-		t.lock.RUnlock()
-		t.lock.Lock()
 		nextID := atomic.LoadUint32(&t.headId) + 1
 		// We open the next file in truncated mode -- if this file already
 		// exists, we need to start over from scratch on it
 		newHead, err := t.openFile(nextID, openFreezerFileTruncated)
 		if err != nil {
-			t.lock.Unlock()
 			return err
 		}
 		// Close old file, and reopen in RDONLY mode
@@ -503,11 +499,7 @@ func (t *freezerTable) Append(item uint64, blob []byte) error {
 		t.head = newHead
 		atomic.StoreUint32(&t.headBytes, 0)
 		atomic.StoreUint32(&t.headId, nextID)
-		t.lock.Unlock()
-		t.lock.RLock()
 	}
-
-	defer t.lock.RUnlock()
 	if _, err := t.head.Write(blob); err != nil {
 		return err
 	}

@holiman
Copy link
Contributor

holiman commented Apr 23, 2021

A simpler (and more performant) fix would be to just re-check

	if atomic.LoadUint64(&t.items) != item {
		return fmt.Errorf("appending unexpected item: want %d, have %d", t.items, item)
	}

after we obtain the writelock. But I think we need to have a stable repro in order to verify any fix.

@pepa65
Copy link
Author

pepa65 commented Apr 24, 2021

Hope you'll be able to improve the software. But what can it be that this only manifests on my machine/setup??

@holiman
Copy link
Contributor

holiman commented Apr 24, 2021 via email

@pepa65
Copy link
Author

pepa65 commented Apr 24, 2021

I'm happy to help, this is not in production ;-). I'll keep this running unless it needs to be interrupted for explorations. My ancient directory is 166GB right now, so impractical to transfer.

@holiman
Copy link
Contributor

holiman commented Apr 24, 2021

Thanks for your help. I was able to create a repro testcase of the behaviour, and I don't think we need your data any longer, since the root cause is identified.
You will need to do a fresh sync to fix the problems, unfortunately, since the corruption most certainly will not "heal".

@pepa65
Copy link
Author

pepa65 commented Apr 24, 2021

So bad data somehow got written? OK, I will wait for a new binary and start over.

@fjl
Copy link
Contributor

fjl commented Apr 27, 2021

@pepa65 the answer to your question is: yes, bad data was written because of a bug in the code. We will create a new release next couple days.

@miaoqinyang
Copy link

所以我该怎么做?丢掉所有130GB的空间,然后重新开始,还是我还能做些其他事情?

那么坏数据以某种方式被写入了吗?好的,我将等待新的二进制文件并重新开始。

I have the same problem with you. Has your problem been solved

@pepa65
Copy link
Author

pepa65 commented May 1, 2021

@miaoqinyang I started from scratch and have been running ever since like most people. If you build yourself from master, you will get the fix. But since you already have bad data, I think you need to start over, because you probably will hit this again and again with your corrupt data. If you use release, it is very rare to get this again (but theoretically possible).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants