Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth/protocols/snap: reschedule missed deliveries #42

Closed
wants to merge 2 commits into from

Conversation

holiman
Copy link

@holiman holiman commented Dec 4, 2020

This (hopefully) fixes a flaw, where a remote peer rejects the request. Previously, we would seemingly not reschedule the request, until the next sync cycle. For some reason, which I'm not fully clear about, it could get stuck for hours (my NUC was stuck on the same things for >12 hours), requesting (and getting rejected) the same segments from the same peer(s).

This PR reverts the request if the remote side cannot deliver, so it can be rescheduled for another peer.

Example output when running it

DEBUG[12-04|11:28:11.318] Persisted set of healing data            bytes=40.43KiB [134/1839]
DEBUG[12-04|11:28:11.318] Peer rejected trienode heal request      peer=2d1f1ff2 reqid=2436914576115805419
DEBUG[12-04|11:28:11.318] Reverting trienode heal request          peer=2d1f1ff2774352477bfbf6ee10868373b074e7cd678270608eccfb2205d0e874
DEBUG[12-04|11:28:11.500] Persisted set of healing data            bytes=41.12KiB
INFO [12-04|11:28:12.503] Imported new block headers               count=1   elapsed=5.563ms    number=11385599 hash="d69950…88b422"
WARN [12-04|11:28:12.934] Pivot became stale, moving               old=11385473 new=11385537
INFO [12-04|11:28:13.011] Imported new block receipts              count=64  elapsed=76.480ms   number=11385536 hash="d72ac0…fe5a1e" age=12m42s    size=5.50MiB
DEBUG[12-04|11:28:13.011] Terminating snapshot sync cycle          root="a5adab…35513a"  
INFO [12-04|11:28:13.011] State heal in progress                   nodes=51339@13.60MiB codes=11@63.84KiB    pending=52937
DEBUG[12-04|11:28:13.012] Starting snapshot sync cycle             root="5d4bc8…1e0968"  
DEBUG[12-04|11:28:13.198] Persisted set of healing data            bytes=0.00B
DEBUG[12-04|11:28:13.600] Peer rejected trienode heal request      peer=8d1da4c5 reqid=4602343511521169247
DEBUG[12-04|11:28:13.600] Reverting trienode heal request          peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d
DEBUG[12-04|11:28:13.817] Peer rejected trienode heal request      peer=0c625358 reqid=2618974347818703192
DEBUG[12-04|11:28:13.817] Reverting trienode heal request          peer=0c6253589a8d244d57abc1c8ab298e048f24487d6572d7890e0cbb7b70039621
DEBUG[12-04|11:28:14.025] Persisted set of healing data            bytes=0.00B
DEBUG[12-04|11:28:14.935] Persisted set of healing data            bytes=0.00B
DEBUG[12-04|11:28:15.581] Persisted set of healing data            bytes=0.00B
DEBUG[12-04|11:28:16.143] Persisted set of healing data            bytes=0.00B
INFO [12-04|11:28:16.143] State heal in progress                   nodes=52636@14.26MiB codes=11@63.84KiB    pending=20636
INFO [12-04|11:28:16.296] Initializing fast sync bloom             items=581032069 errorrate=0.001 elapsed=11m57.850s

My machine is now in the final stages of healing, so looking good so far

@holiman
Copy link
Author

holiman commented Dec 4, 2020

My node finally seems to have gotten stranded anyway,

DEBUG[12-04|12:58:33.337] Terminating snapshot sync cycle          root="7be343…381f8a"
INFO [12-04|12:58:33.337] State heal in progress                   nodes=2113295@465.40MiB codes=279@1.66MiB    pending=273
DEBUG[12-04|12:58:33.337] Starting snapshot sync cycle             root="591935…366737"
DEBUG[12-04|12:58:33.365] Persisted set of healing data            type=trienodes bytes=0.00B
DEBUG[12-04|12:58:33.538] Peer rejected trienode heal request      peer=0c625358 reqid=7863297954386170715
DEBUG[12-04|12:58:33.538] Reverting trienode heal request          peer=0c6253589a8d244d57abc1c8ab298e048f24487d6572d7890e0cbb7b70039621
INFO [12-04|12:59:18.301] Imported new block headers               count=1    elapsed=12.498ms   number=11385999 hash="219a90…83fe83" age=1m16s
INFO [12-04|12:59:30.424] Imported new block headers               count=1    elapsed=9.491ms    number=11386000 hash="604dc9…0e1f61" age=1m4s
INFO [12-04|12:59:31.355] Downloader queue stats                   receiptTasks=0   blockTasks=0   itemSize=209.19KiB throttle=314
INFO [12-04|13:00:15.868] Imported new block headers               count=1    elapsed=4.053ms    number=11386001 hash="f4f2a1…8afbaa" age=1m43s
INFO [12-04|13:00:46.181] Imported new block headers               count=1    elapsed=7.710ms    number=11386002 hash="6933f0…07c7fe" age=1m31s

@holiman
Copy link
Author

holiman commented Dec 4, 2020

INFO [12-04|14:32:26.073] Imported new block headers               count=1    elapsed=12.121ms   number=11386448 hash="bef081…f712ff"
INFO [12-04|14:32:26.233] Downloader queue stats                   receiptTasks=0   blockTasks=0   itemSize=209.43KiB throttle=313
WARN [12-04|14:32:26.234] Pivot became stale, moving               old=11386322 new=11386386
INFO [12-04|14:32:26.311] Imported new block receipts              count=64   elapsed=75.704ms   number=11386385 hash="1c5ea0…2876a3" age=16m       size=5.44MiB
DEBUG[12-04|14:32:26.311] Terminating snapshot sync cycle          root="0b7941…2d07fa"
INFO [12-04|14:32:26.311] State heal in progress                   nodes=2113331@465.42MiB codes=279@1.66MiB    pending=17
DEBUG[12-04|14:32:26.311] Starting snapshot sync cycle             root="e12856…890602"
DEBUG[12-04|14:32:26.541] Peer rejected trienode heal request      peer=8d1da4c5 reqid=9061876384210047615
DEBUG[12-04|14:32:26.541] Reverting trienode heal request          peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d
INFO [12-04|14:32:38.196] Imported new block headers               count=1    elapsed=7.899ms    number=11386449 hash="0c8c21…a39b16"
INFO [12-04|14:32:50.331] Imported new block headers               count=1    elapsed=7.602ms    number=11386450 hash="faab70…a5bb57"
INFO [12-04|14:32:56.418] Imported new block headers               count=1    elapsed=7.712ms    number=11386451 hash="6ec50f…3426e0"

@holiman
Copy link
Author

holiman commented Dec 4, 2020

Hm, might be an issue with the server - here's the 8d1 bootnode, in Korea:

INFO [12-04|17:07:44.880] Resuming state snapshot generation       root="7c5512…7e6fac" in="3e5afb…a57fb9" at="7e4bb1…24f66c" accounts=26428444 slots=99224168 storage=8.36GiB elapsed=46h44m39.670s eta=3999h48m39.583s
INFO [12-04|17:07:44.900] Imported new chain segment               blocks=1  txs=146  mgas=12.478  elapsed=3.409s        mgasps=3.660   number=11387417 hash="8b2a30…860163" dirty=361.96MiB

So it's still generating the snapshot, and it's gong really slow. Also

/dev/mapper/geth--vg-geth--lv  492G  482G     0 100% /geth

It's going to go OOD shortly. We should probably wipe+resync it. However, it should still be able to serve our healing-requests, as long as it's not more than 64 blocks behind the canon head, right?

@holiman
Copy link
Author

holiman commented Dec 4, 2020

It eventually completed sync!

@holiman
Copy link
Author

holiman commented Dec 9, 2020

rebased

@holiman
Copy link
Author

holiman commented Dec 9, 2020

This time when I ran it, it finished gracefully:

DEBUG[12-09|12:00:17.592] Terminating snapshot sync cycle          root="b190dd…508aba"  
INFO [12-09|12:00:17.592] State heal in progress                   nodes=1818948@450.08MiB codes=163@1.11MiB       pending=0   
INFO [12-09|12:00:17.616] Imported new block receipts              count=1    elapsed=4.675ms      number=11418122 hash="988779…532037" age=32m32s    size=101.72KiB 
INFO [12-09|12:00:17.619] Rebuilding state snapshot
INFO [12-09|12:00:17.622] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed="138.04µs"         
INFO [12-09|12:00:17.622] Committed new head block                 number=11418122 hash="988779…532037"                        
INFO [12-09|12:00:17.720] Deallocated state bloom                  items=704455880 errorrate=0.002                             
INFO [12-09|12:00:19.394] Deleted state snapshot leftovers         kind=accounts wiped=8893 elapsed=1.772s                     
INFO [12-09|12:00:20.687] Deleted state snapshot leftovers         kind=storage  wiped=0    elapsed=1.293s                     
INFO [12-09|12:00:20.687] Compacting snapshot account area                               

But then it started wiping -- which seems a bit odd?

INFO [12-09|12:00:21.846] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=4.223s             
INFO [12-09|12:00:22.145] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=4.523s             
INFO [12-09|12:00:25.500] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=7.878s             
INFO [12-09|12:00:26.469] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=8.847s             
INFO [12-09|12:00:26.471] Imported new chain segment               blocks=11       txs=2268      mgas=137.183 elapsed=8.749s       mgasps=15.679 number=11418133 hash="ef2403…9d4e08" age=29m16s    dirty=18.72MiB                               
INFO [12-09|12:00:27.209] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=9.586s             
INFO [12-09|12:00:27.991] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=10.368s            
INFO [12-09|12:00:29.023] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=11.400s            
INFO [12-09|12:00:30.008] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=12.385s            
INFO [12-09|12:00:30.776] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=13.154s            
INFO [12-09|12:00:31.367] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=13.745s            
INFO [12-09|12:00:32.072] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=14.450s            
INFO [12-09|12:00:32.077] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=14.455s            
INFO [12-09|12:00:32.870] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=15.247s            
INFO [12-09|12:00:33.686] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=16.063s            
INFO [12-09|12:00:34.670] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=17.048s            
INFO [12-09|12:00:34.673] Imported new chain segment               blocks=11       txs=1990      mgas=124.444 elapsed=8.201s       mgasps=15.173 number=11418144 hash="12ccfe…529065" age=26m14s    dirty=36.04MiB                               
INFO [12-09|12:00:35.328] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=17.706s            
INFO [12-09|12:00:36.297] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=18.675s            
INFO [12-09|12:00:36.949] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=19.327s            
INFO [12-09|12:00:37.605] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=19.983s            
INFO [12-09|12:00:38.782] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=21.160s            
INFO [12-09|12:00:39.281] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=21.658s            
INFO [12-09|12:00:40.260] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=22.638s            
INFO [12-09|12:00:41.521] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=23.899s            
INFO [12-09|12:00:41.616] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=23.993s            
INFO [12-09|12:00:42.409] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=24.786s            
INFO [12-09|12:00:43.095] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=25.472s            
INFO [12-09|12:00:43.097] Imported new chain segment               blocks=11       txs=1967      mgas=136.823 elapsed=8.424s       mgasps=16.242 number=11418155 hash="6c9b88…28b2fd" age=24m6s     dirty=54.32MiB                               
INFO [12-09|12:00:43.545] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=25.923s            
INFO [12-09|12:00:43.995] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=26.373s            
INFO [12-09|12:00:44.619] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=26.996s            
INFO [12-09|12:00:45.596] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=27.974s            
INFO [12-09|12:00:46.475] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=28.852s            
INFO [12-09|12:00:47.180] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=29.558s            
INFO [12-09|12:00:48.369] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=30.747s            
INFO [12-09|12:00:49.079] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=31.457s            
INFO [12-09|12:00:49.716] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=32.094s            
INFO [12-09|12:00:50.448] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=32.825s            
INFO [12-09|12:00:51.447] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=33.824s            
INFO [12-09|12:00:51.452] Imported new chain segment               blocks=11       txs=2046      mgas=137.132 elapsed=8.355s       mgasps=16.413 number=11418166 hash="dee4be…b0b107" age=20m37s    dirty=73.09MiB                           

...

INFO [12-09|12:01:32.558] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=1m14.936s
INFO [12-09|12:01:33.240] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=1m15.618s
INFO [12-09|12:01:33.243] Imported new chain segment               blocks=12       txs=2013      mgas=148.518 elapsed=8.337s       mgasps=17.813 number=11418221 hash="eca14c…458c7e" age=8m48s     dirty=165.12MiB
INFO [12-09|12:01:33.922] Wiper running, state snapshotting paused accounts=0                  slots=0                  storage=0.00B     elapsed=1m16.300s

So after wiping for a minute, the snapshot generation started:

INFO [12-09|12:01:34.592] Compacted snapshot area in database      elapsed=1m13.904s
INFO [12-09|12:01:34.592] Resuming state snapshot generation       root="25cf6b…687aff" accounts=0                  slots=0                  storage=0.00B     elapsed="8.781µs"
INFO [12-09|12:01:35.065] Aborting state snapshot generation       root="25cf6b…687aff" at="000049…093c79" accounts=505                slots=116                storage=34.41KiB  elapsed=472.993ms    eta=30h4m7.154s
INFO [12-09|12:01:35.065] Resuming state snapshot generation       root="f585dc…c75d28" at="000049…093c79" accounts=505                slots=116                storage=34.41KiB  elapsed=473.234ms    eta=30h7m56.007s
INFO [12-09|12:01:35.927] Aborting state snapshot generation       root="f585dc…c75d28" at="0000c5…32ed41" accounts=1324               slots=463                storage=103.51KiB elapsed=1.335s       eta=31h35m54.65s
INFO [12-09|12:01:35.927] Resuming state snapshot generation       root="cb6286…06a59e" at="0000c5…32ed41" accounts=1324               slots=463                storage=103.51KiB elapsed=1.335s       eta=31h35m54.65s
INFO [12-09|12:01:36.771] Aborting state snapshot generation       root="cb6286…06a59e" in="00010a…bff440" at="cc6aa2…49feec" accounts=1839               slots=959                storage=165.86KiB elapsed=2.178s       eta=38h2m16.648s
INFO [12-09|12:01:36.772] Resuming state snapshot generation       root="b33640…180f3d" in="00010a…bff440" at="cc6aa2…49feec" accounts=1839               slots=959                storage=165.86KiB elapsed=2.180s       eta=38h4m22.336s
INFO [12-09|12:01:37.326] Aborting state snapshot generation       root="b33640…180f3d" in="000145…5840fd" at="036b63…bb3db0" accounts=2197               slots=1092               storage=193.52KiB elapsed=2.734s       eta=39h7m4.872s
INFO [12-09|12:01:37.327] Resuming state snapshot generation       root="6ec50e…5666b1" in="000145…5840fd" at="036b63…bb3db0" accounts=2197               slots=1092               storage=193.52KiB elapsed=2.734s       eta=39h7m4.872s
INFO [12-09|12:01:37.852] Aborting state snapshot generation       root="6ec50e…5666b1" at="0001ac…a12dc2" accounts=2881               slots=1257               storage=240.92KiB elapsed=3.259s       eta=35h27m55.385s

@holiman
Copy link
Author

holiman commented Dec 14, 2020

Opened against master instead

@holiman holiman closed this Dec 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant