
Update pivot basing on message from CL #5256

Merged 34 commits from feature/pivots_from_cl into master on May 11, 2023

Conversation

@marcindsobczak (Contributor) commented Feb 8, 2023

Closes #4895

Changes

  • add option of updating the pivot based on a message from CL
    MaxAttemptsToUpdatePivot = 900 by default. It will try to update the pivot 900 times (so for about 15 min). It will be activated only on post-merge chains. A rough sketch of the option is shown below.
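A minimal sketch of how the option might be exposed on the sync config, assuming a plain property with that default (the actual PR code may differ):

```csharp
// Sketch only - assumed shape of the new sync option, not the PR's exact code.
public class SyncConfigSketch
{
    // 0 disables pivot updating; a positive value is the number of attempts
    // (roughly one per second, so 900 is about 15 minutes of waiting for the CL).
    public int MaxAttemptsToUpdatePivot { get; set; } = 900;
}
```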

Types of changes

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

@LukaszRozmej (Member)

Why disabled by default?
Can we make it smarter, so it won't depend on a config value?

@marcindsobczak (Contributor, Author)

It is "smart", I think. We could hardcode "MaxAttemptsToUpdatePivot": 180 or something like that, but disabling it in the code and enabling it in the default configs of chains like mainnet, gnosis or goerli makes sense to me, as it is not really needed on small chains, which can sync fast without the added complexity of waiting for a CL message. We could also make it a plain true/false flag, but I like the option of specifying the number of attempts, as it gives more flexibility to advanced users, and less advanced ones would use the default config anyway.

@LukaszRozmej (Member)

So this basically works by delaying the start of normal sync by 180 seconds, during which the CL can send us the pivot. If someone re-syncs the CL, it will probably miss this time window. (Am I right?)

What I would call smart is starting normal sync from a hardcoded pivot and, when the CL gives us another pivot that is far enough into the future for it to make sense, restarting the whole process. Base these conditions on some good heuristic.

This way we would support situations where we get the pivot an hour or so into sync.

@kamilchodola (Contributor)

@marcindsobczak
If we keep that disabled AND put it in the configs, then any existing user who decides to resync (because of free space, corruption, etc.) will have it disabled - are we sure this is a good approach?

@kamilchodola (Contributor)

@marcindsobczak
Assuming we start the node and stop it before it proceeds to the next stage (before it catches up to the head), and keep it stopped for a few hours - how will it behave? Will it keep the info about the old block, or will it refresh that info?

@marcindsobczak (Contributor, Author)

@LukaszRozmej generally you are right, but

  1. CLs have checkpoint sync, and it takes 1-2 minutes for them to be synced and send us a pivot close to the head
  2. If the CL is not synced, there is not much we can do. Our hardcoded pivots are not older than 4 weeks, and downloading the newest blocks takes no more than 1h. If the CL was not able to send us a pivot in the first ~3 min, we just sync the old way
  3. It might make sense to start downloading blocks after the hardcoded pivot and then move the pivot after receiving the message from the CL. There might be a problem with downloading headers/bodies from the new pivot downwards, as the lowest inserted would already be set close to the hardcoded pivot; maybe just overwriting the lowest inserted after the pivot update would be enough. But how would we behave when downloading backwards and hitting headers/bodies that were already downloaded? Need to explore it

In practice, after running Nethermind + a CL with checkpoint sync, it takes no more than 1-2 min to update the pivot and then <1 min to start SnapSync. I expect it to behave like that in the majority of use cases, and then state sync would start much faster (with 3-4 week old pivots, even 30-60 min faster). In the minority of cases where the update is not successful, state sync will just start 3 min later than without this feature. What I could potentially improve is to not postpone downloading of blocks after the hardcoded pivot by the time spent waiting for the CL message; then sync time would not be affected in any scenario. I need to check how much additional complexity this would bring.

@kamilchodola

If we keep that disabled AND put it in the configs, then any existing user who decides to resync (because of free space, corruption, etc.) will have it disabled - are we sure this is a good approach?

I'm not sure. Open to discussion. The advantage of having it disabled by default (even in the default config files) is that it is a new feature and we might release it as a beta which must be manually enabled when running Nethermind. We did that when introducing SnapSync: after a few weeks and many users experimenting with it, it was enabled by default in the config. But SnapSync was way more complex than this small feature, so maybe we don't need that approach? Again, open to discussion.

Assuming we start the node and stop it before it proceeds to the next stage (before it catches up to the head), and keep it stopped for a few hours - how will it behave? Will it keep the info about the old block, or will it refresh that info?

Catching up to the head after updating the pivot takes just <1 min. But right, it is still possible to turn the node off during this period. Once updated, the pivot will not be updated again - it will keep using the data from the first update. So in this scenario we would just continue downloading the newest blocks from the time when the node was disabled, similar to the current implementation.
So let's say the new pivot is Y and the moving head is at number Z.
After start we wait for a potential update. After the update we start downloading blocks from Y+1 upwards. Let's say our newest downloaded block is Y+21 and state sync hasn't started, because Z is at Y+200. We turned the node off and turned it back on when Z was at Y+1000. We will simply continue downloading from Y+21 until we catch Z; a toy illustration follows below.
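A toy illustration of this restart scenario, with made-up numbers and a stand-in downloader (not Nethermind code):

```csharp
// Illustrative only: after a restart the forward downloader simply resumes from the
// newest block it already has, regardless of how far the head (Z) moved meanwhile.
static void ResumeForwardSync()
{
    long pivot = 1_000_000;            // Y - the updated pivot, persisted before the restart
    long lastDownloaded = pivot + 21;  // newest block already downloaded (Y+21)
    long head = pivot + 1_000;         // Z after the node was off for a while

    for (long number = lastDownloaded + 1; number <= head; number++)
    {
        DownloadBlock(number);
    }

    // Stand-in for the real block downloader.
    static void DownloadBlock(long blockNumber) { /* fetch and insert the block */ }
}
```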

@kamilchodola (Contributor)

@marcindsobczak
A question regarding the scenario with the CL turned off on Nethermind startup.
As I understand it, we will now wait 180 seconds to get an FCU from the CL, and if we do not receive it, we will start with the pivot sync.

But on PoS that does not make sense, especially when the pivot is already post-merge - it will not start syncing at all.

Can we make it so that on PoS it always waits for the FCU, no matter how long? If we are not getting it, then the connection can be treated as invalid.

Of course - correct me if I'm wrong :)

@marcindsobczak (Contributor, Author)

@kamilchodola you are right.
We discussed it with @LukaszRozmej @MarekM25 and @benaadams, and an infinite timeout might be dangerous, so we decided to set it to ~15 min: that gives users more time to set up the CL while, on the other hand, not blocking the download of old headers for longer - if it takes more than 15 min, the CL is probably not set up correctly, so most likely we can download old headers (and post-pivot blocks) in the meantime, until it is fixed.
Starting the download of old headers right after client start and updating the pivot later is hard to implement and does not improve much, so it won't be done in this PR - maybe in the future.
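A rough sketch of the waiting loop described above, assuming roughly one attempt per second (names and structure are illustrative, not the PR's actual code):

```csharp
using System;
using System.Threading.Tasks;

// Illustrative only - not the PR's actual implementation.
public static class PivotUpdaterSketch
{
    // Poll for a CL-provided pivot roughly once per second until one arrives
    // or the attempts run out, then fall back to the hardcoded pivot.
    public static async Task<bool> TryUpdatePivotAsync(
        int maxAttempts,                                     // e.g. MaxAttemptsToUpdatePivot = 900
        Func<Task<(long Number, string Hash)?>> getPivotFromCl)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            (long Number, string Hash)? pivot = await getPivotFromCl();
            if (pivot is not null)
            {
                // Persist the new pivot and switch the sync target here.
                return true;
            }

            await Task.Delay(TimeSpan.FromSeconds(1));
        }

        return false; // no usable message from the CL - sync the old way
    }
}
```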

@marcindsobczak marked this pull request as ready for review March 28, 2023 14:25
src/Nethermind/Nethermind.Merge.Plugin/MergePlugin.cs (outdated review comment, resolved)
Comment on lines 16 to 17
public const int UpdatedPivotHash = 9;
public const int UpdatedPivotNumber = 10;
Review comment (Member):

I dislike that there are 2 keys; we should pack this into 1 value, with the first 8 bytes being the number and the later 32 bytes being the hash.

Or just use RLP.
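A minimal sketch of the packing idea, assuming a big-endian layout with the first 8 bytes holding the number and the remaining 32 bytes the hash (illustrative, not Nethermind's actual serialization):

```csharp
using System;
using System.Buffers.Binary;

// Illustrative packing/unpacking of (number, hash) into a single 40-byte metadata value.
public static class PivotPackingSketch
{
    public static byte[] Pack(long number, byte[] hash32)
    {
        byte[] packed = new byte[8 + 32];
        BinaryPrimitives.WriteInt64BigEndian(packed, number); // first 8 bytes: block number
        hash32.AsSpan().CopyTo(packed.AsSpan(8));              // remaining 32 bytes: block hash
        return packed;
    }

    public static (long Number, byte[] Hash) Unpack(byte[] packed)
    {
        long number = BinaryPrimitives.ReadInt64BigEndian(packed);
        byte[] hash = packed.AsSpan(8, 32).ToArray();
        return (number, hash);
    }
}
```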

@marcindsobczak (Contributor, Author):

There would be a really small advantage in the number of writes/reads - we write it only once per node lifetime and read it only when there was a restart during sync. So performance is definitely not an issue here. And after this change there would be a lack of consistency - we would have:

        public const int TerminalPoWHash = 1;
        public const int TerminalPoWNumber = 2;
(...)
        public const int BeaconSyncPivotHash = 5;
        public const int BeaconSyncPivotNumber = 6;
(...)
        public const int UpdatedPivotData = 9;

so hashes, numbers and one combined value of number+hash.
For the other values we use separate numbers and hashes. I can change it if you want, but IMO it is better in its current form.

return updateRequestedAndNotFinished &&
       FastSyncEnabled &&
       isPostMerge &&
       stateSyncNotFinished;
Review comment (Member):

I think if we have even started state sync, there is no point in updating the pivot, as we have already FastSynced to the head?

Review comment (Member):

please add logger.Trace like for other methods

@marcindsobczak (Contributor, Author), Mar 30, 2023:

We need it to have the correct pivot when downloading headers/bodies/receipts, to avoid gaps. After a restart the pivot is set back to the value from the config. We go to PivotUpdator, which checks the db for updated values, finds them, sets the pivot back to the updated value, and that's all. We just have the logic for reading an already-updated pivot from the db in the PivotUpdator class.
Actually, I think it can be problematic if the restart happens during old bodies sync. Then we will not read it from the db, and the sync of old receipts would start from the old pivot. I think it is a bug and I need to change stateSyncNotFinished to oldReceiptsNotFinished, as we stop caring about the pivot only when we are fully synced (not only state). Or maybe even change it to always go to PivotUpdator and read the updated pivot from the db? Need to think about it a bit more.

@marcindsobczak (Contributor, Author):

and logger added

@marcindsobczak (Contributor, Author), May 8, 2023:

I think if we have even started state sync, there is no point in updating the pivot, as we have already FastSynced to the head?

Right. We try to read an already-updated pivot from the DB during initialization (in the InitializeNetwork step). If that is successful, the updateRequestedAndNotFinished = _syncConfig.MaxAttemptsToUpdatePivot > 0 check will return false and the mode will not change to UpdatingPivot. If not, we want to update the pivot whenever we are post-merge and fast sync is enabled. No need to check state here.

@marcindsobczak (Contributor, Author):

Actually we do need this state check - it is for already-synced nodes which will be updated to a version with PivotUpdator. We don't want to update the pivot in that case.
Looks like it was implemented the right way from the very beginning, I just forgot the reason during the long gap since writing it 😄

@marcindsobczak (Contributor, Author):

So to sum up - right about

I think if we have even started state sync, there is no point in updating the pivot, as we have already FastSynced to the head?

but this check is needed for backward compatibility with nodes synced before updating to a Nethermind version with PivotUpdator.

In general, the pivot can be updated only once in a node's lifetime. It is then saved to the DB and loaded during initialization.
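A simplified sketch of that once-only flow, using an in-memory stand-in for the metadata DB (assumed names; the real code writes to the metadata DB under the key discussed above):

```csharp
using System.Collections.Generic;

// Illustrative only - a stand-in for the metadata DB, not Nethermind's storage API.
public class UpdatedPivotStoreSketch
{
    private const int UpdatedPivotKey = 9; // mirrors the UpdatedPivotData metadata key
    private readonly Dictionary<int, (long Number, string Hash)> _db = new();

    // Called at most once per node lifetime, right after the CL gives us a pivot.
    public void Save(long number, string hash) => _db[UpdatedPivotKey] = (number, hash);

    // Called during initialization; if a value exists, the pivot-updating step is skipped.
    public bool TryLoad(out (long Number, string Hash) pivot) =>
        _db.TryGetValue(UpdatedPivotKey, out pivot);
}
```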


_syncConfig.PivotNumber = updatedPivotBlockNumber.ToString();
_syncConfig.PivotHash = updatedPivotBlockHash.ToString();
_syncConfig.MaxAttemptsToUpdatePivot = 0;
Review comment (Member):

Can we not modify the config value?


private bool ShouldBeInUpdatingPivot()
{
    bool updateRequestedAndNotFinished = _syncConfig.MaxAttemptsToUpdatePivot > 0;
Review comment (Member):

Can we not modify the config value?

Review comment (Member):

please add logger.Trace like for other methods


public Keccak GetFinalizedHash()
{
    return _blockCacheService.FinalizedHash;
}
Review comment (Member):

This cannot come from BlockTree, right? Because before we sync, it won't give us a correct answer?

@marcindsobczak (Contributor, Author):

FinalizedHash is needed in MultiSyncModeSelector. We could potentially pass it through SyncProgressResolver, but BeaconSync fits perfectly for both MultiSyncModeSelector and PivotUpdator (BeaconSync is needed in PivotUpdator anyway because of GetTargetBlockHeight()).

@LukaszRozmej requested a review from asdacap March 30, 2023 08:16
@asdacap (Contributor) commented Mar 30, 2023

Have you double-checked the other places that use the config? StartupBlockTreeFixer, BlockTree, ReceiptFixMigration, etc. - and FastHeaderSyncFeed does not have the same modification as BodiesSyncFeed?

# Conflicts:
#	src/Nethermind/Nethermind.Blockchain/Synchronization/SyncConfig.cs
#	src/Nethermind/Nethermind.Synchronization/ParallelSync/MultiSyncModeSelector.cs
@marcindsobczak requested a review from a team as a code owner May 5, 2023 16:39
@marcindsobczak (Contributor, Author)

@asdacap FastHeadersSyncFeed already has _pivotNumber = _syncConfig.PivotNumberParsed; in InitializeFeed() -> ResetPivot()

If the pivot was already updated, it is read from the DB in the InitializeNetwork step.

StartupBlockTreeFixer is a part of ReviewBlockTree, which has InitializeNetwork among its dependencies.
ReceiptFixMigration is a part of DatabaseMigrations, which also has InitializeNetwork among its dependencies.
In BlockTree the pivots are read at the time the methods are called, so we are fine there as well.
I don't see a part of the code where we are not fine.

@marcindsobczak (Contributor, Author)

@luk About modifying config values - I agree it is not the best practice.

But looking at the usages, if we don't want to overwrite SyncConfig, we would need to create a new class and pass it to:

  • BeaconPivot
  • BlockTree
  • BodiesSyncFeed
  • HeadersSyncFeed
  • MultiSyncModeSelector
  • NewPayloadHandler
  • Pivot
  • ReceiptFixMigration
  • ReceiptsSyncFeed
  • StartupBlockTreeFixer
  • SyncReport
  • SyncServer

I'm not sure what is worse here - overwriting config data, or creating a lot of code and additional complexity which can easily be avoided. For reference, a sketch of that alternative is shown below.
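A hedged sketch of what that alternative could look like - a small shared pivot holder injected into those components instead of mutating SyncConfig (hypothetical names, not what the PR does):

```csharp
// Hypothetical alternative, shown only to illustrate the trade-off discussed above.
public class SyncPivotHolder
{
    public long PivotNumber { get; private set; }
    public string? PivotHash { get; private set; }

    // Components would read the pivot from here instead of from SyncConfig,
    // so the config object itself would never need to be overwritten.
    public void Update(long number, string hash)
    {
        PivotNumber = number;
        PivotHash = hash;
    }
}
```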

@marcindsobczak merged commit 2368df4 into master May 11, 2023
@marcindsobczak deleted the feature/pivots_from_cl branch May 11, 2023 13:33
Linked issue: Remove hard-coded pivot and set it basing on CL message
4 participants