
Update pivot basing on message from CL #5256

Merged 34 commits from feature/pivots_from_cl into master on May 11, 2023

Conversation

@marcindsobczak (Contributor) commented Feb 8, 2023

Closes #4895

Changes

  • add option of updating the pivot based on a message from CL
    MaxAttemptsToUpdatePivot = 900 by default. It will try to update the pivot 900 times (so for about 15 min). It will be activated only on post-merge chains. A rough sketch of the option is shown below.
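A minimal sketch of how the option might be exposed on the sync config, assuming a plain property with that default (the actual PR code may differ):

```csharp
// Sketch only - assumed shape of the new sync option, not the PR's exact code.
public class SyncConfigSketch
{
    // 0 disables pivot updating; a positive value is the number of attempts
    // (roughly one per second, so 900 is about 15 minutes of waiting for the CL).
    public int MaxAttemptsToUpdatePivot { get; set; } = 900;
}
```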

Types of changes

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

@LukaszRozmej (Member)

Why disabled by default?
Can we make it smarter, so it won't depend on a config value?

@marcindsobczak (Contributor, Author)

It is "smart", I think. We could hardcode "MaxAttemptsToUpdatePivot": 180 or something like that, but disabling it in the code and enabling it in the default configs of chains like mainnet, gnosis or goerli makes sense to me, as it is not really needed on small chains, which can sync fast without the added complexity of waiting for a CL message. We could also make it a plain true/false flag, but I like the option of specifying the number of attempts, as it gives more flexibility to advanced users, and less advanced ones would use the default config anyway.

@LukaszRozmej (Member)

So this basically works by delaying the start of normal sync by 180 seconds, during which the CL can send us the pivot. If someone re-syncs the CL, it will probably miss this time window. (Am I right?)

What I would call smart is starting normal sync from a hardcoded pivot and, when the CL gives us another pivot that is far enough into the future for it to make sense, restarting the whole process. Base these conditions on some good heuristic.

This way we would support situations where we get the pivot an hour or so into sync.

@kamilchodola (Contributor)

@marcindsobczak
If we keep that disabled AND put it in the configs, then any existing user who decides to resync (because of free space, corruption, etc.) will have it disabled - are we sure this is a good approach?

@kamilchodola (Contributor)

@marcindsobczak
Assuming we start the node and stop it before it proceeds to the next stage (before it catches up to the head), and keep it stopped for a few hours - how will it behave? Will it keep the info about the old block, or will it refresh that info?

@marcindsobczak (Contributor, Author)

@LukaszRozmej generally you are right, but

  1. CLs have checkpoint sync, and it takes 1-2 minutes for them to be synced and send us a pivot close to the head
  2. If the CL is not synced, there is not much we can do. Our hardcoded pivots are not older than 4 weeks, and downloading the newest blocks takes no more than 1h. If the CL was not able to send us a pivot in the first ~3 min, we just sync the old way
  3. It might make sense to start downloading blocks after the hardcoded pivot and then move the pivot after receiving the message from the CL. There might be a problem with downloading headers/bodies from the new pivot downwards, as the lowest inserted would already be set close to the hardcoded pivot; maybe just overwriting the lowest inserted after the pivot update would be enough. But how would we behave when downloading backwards and hitting headers/bodies that were already downloaded? Need to explore it

In practice, after running Nethermind + a CL with checkpoint sync, it takes no more than 1-2 min to update the pivot and then <1 min to start SnapSync. I expect it to behave like that in the majority of use cases, and then state sync would start much faster (with 3-4 week old pivots, even 30-60 min faster). In the minority of cases where the update is not successful, state sync will just start 3 min later than without this feature. What I could potentially improve is to not postpone downloading of blocks after the hardcoded pivot by the time spent waiting for the CL message; then sync time would not be affected in any scenario. I need to check how much additional complexity this would bring.

@kamilchodola

If we keep that disabled AND put it in the configs, then any existing user who decides to resync (because of free space, corruption, etc.) will have it disabled - are we sure this is a good approach?

I'm not sure. Open to discussion. The advantage of having it disabled by default (even in the default config files) is that it is a new feature and we might release it as a beta which must be manually enabled when running Nethermind. We did that when introducing SnapSync: after a few weeks and many users experimenting with it, it was enabled by default in the config. But SnapSync was way more complex than this small feature, so maybe we don't need that approach? Again, open to discussion.

Assuming we start the node and stop it before it proceeds to the next stage (before it catches up to the head), and keep it stopped for a few hours - how will it behave? Will it keep the info about the old block, or will it refresh that info?

Catching up to the head after updating the pivot takes just <1 min. But right, it is still possible to turn the node off during this period. Once updated, the pivot will not be updated again - it will keep using the data from the first update. So in this scenario we would just continue downloading the newest blocks from the time when the node was disabled, similar to the current implementation.
So let's say the new pivot is Y and the moving head is at number Z.
After start we wait for a potential update. After the update we start downloading blocks from Y+1 upwards. Let's say our newest downloaded block is Y+21 and state sync hasn't started, because Z is at Y+200. We turned the node off and turned it back on when Z was at Y+1000. We will simply continue downloading from Y+21 until we catch Z; a toy illustration follows below.
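A toy illustration of this restart scenario, with made-up numbers and a stand-in downloader (not Nethermind code):

```csharp
// Illustrative only: after a restart the forward downloader simply resumes from the
// newest block it already has, regardless of how far the head (Z) moved meanwhile.
static void ResumeForwardSync()
{
    long pivot = 1_000_000;            // Y - the updated pivot, persisted before the restart
    long lastDownloaded = pivot + 21;  // newest block already downloaded (Y+21)
    long head = pivot + 1_000;         // Z after the node was off for a while

    for (long number = lastDownloaded + 1; number <= head; number++)
    {
        DownloadBlock(number);
    }

    // Stand-in for the real block downloader.
    static void DownloadBlock(long blockNumber) { /* fetch and insert the block */ }
}
```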

@kamilchodola (Contributor)

@marcindsobczak
A question regarding the scenario with the CL turned off on Nethermind startup.
As I understand it, we will now wait 180 seconds to get an FCU from the CL, and if we do not receive it, we will start with the pivot sync.

But on PoS that does not make sense, especially when the pivot is already post-merge - it will not start syncing at all.

Can we make it so that on PoS it always waits for the FCU, no matter how long? If we are not getting it, then the connection can be treated as invalid.

Of course - correct me if I'm wrong :)

@marcindsobczak (Contributor, Author)

@kamilchodola you are right.
We discussed it with @LukaszRozmej @MarekM25 and @benaadams, and an infinite timeout might be dangerous, so we decided to set it to ~15 min: that gives users more time to set up the CL while, on the other hand, not blocking the download of old headers for longer - if it takes more than 15 min, the CL is probably not set up correctly, so most likely we can download old headers (and post-pivot blocks) in the meantime, until it is fixed.
Starting the download of old headers right after client start and updating the pivot later is hard to implement and does not improve much, so it won't be done in this PR - maybe in the future.
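A rough sketch of the waiting loop described above, assuming roughly one attempt per second (names and structure are illustrative, not the PR's actual code):

```csharp
using System;
using System.Threading.Tasks;

// Illustrative only - not the PR's actual implementation.
public static class PivotUpdaterSketch
{
    // Poll for a CL-provided pivot roughly once per second until one arrives
    // or the attempts run out, then fall back to the hardcoded pivot.
    public static async Task<bool> TryUpdatePivotAsync(
        int maxAttempts,                                     // e.g. MaxAttemptsToUpdatePivot = 900
        Func<Task<(long Number, string Hash)?>> getPivotFromCl)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            (long Number, string Hash)? pivot = await getPivotFromCl();
            if (pivot is not null)
            {
                // Persist the new pivot and switch the sync target here.
                return true;
            }

            await Task.Delay(TimeSpan.FromSeconds(1));
        }

        return false; // no usable message from the CL - sync the old way
    }
}
```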

@marcindsobczak marked this pull request as ready for review March 28, 2023 14:25
src/Nethermind/Nethermind.Merge.Plugin/MergePlugin.cs (outdated review comment, resolved)
Comment on lines 16 to 17
public const int UpdatedPivotHash = 9;
public const int UpdatedPivotNumber = 10;
Review comment (Member):

I dislike that there are 2 keys; we should pack this into 1 value, with the first 8 bytes being the number and the later 32 bytes being the hash.

Or just use RLP.
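A minimal sketch of the packing idea, assuming a big-endian layout with the first 8 bytes holding the number and the remaining 32 bytes the hash (illustrative, not Nethermind's actual serialization):

```csharp
using System;
using System.Buffers.Binary;

// Illustrative packing/unpacking of (number, hash) into a single 40-byte metadata value.
public static class PivotPackingSketch
{
    public static byte[] Pack(long number, byte[] hash32)
    {
        byte[] packed = new byte[8 + 32];
        BinaryPrimitives.WriteInt64BigEndian(packed, number); // first 8 bytes: block number
        hash32.AsSpan().CopyTo(packed.AsSpan(8));              // remaining 32 bytes: block hash
        return packed;
    }

    public static (long Number, byte[] Hash) Unpack(byte[] packed)
    {
        long number = BinaryPrimitives.ReadInt64BigEndian(packed);
        byte[] hash = packed.AsSpan(8, 32).ToArray();
        return (number, hash);
    }
}
```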

@marcindsobczak (Contributor, Author):

There would be a really small advantage in the number of writes/reads - we write it only once per node lifetime and read it only when there was a restart during sync. So performance is definitely not an issue here. And after this change there would be a lack of consistency - we would have:

        public const int TerminalPoWHash = 1;
        public const int TerminalPoWNumber = 2;
(...)
        public const int BeaconSyncPivotHash = 5;
        public const int BeaconSyncPivotNumber = 6;
(...)
        public const int UpdatedPivotData = 9;

so hashes, numbers and one combined value of number+hash.
For the other values we use separate numbers and hashes. I can change it if you want, but IMO it is better in its current form.

return updateRequestedAndNotFinished &&
       FastSyncEnabled &&
       isPostMerge &&
       stateSyncNotFinished;
Review comment (Member):

I think if we have even started state sync, there is no point in updating the pivot, as we have already FastSynced to the head?

Review comment (Member):

please add logger.Trace like for other methods

@marcindsobczak (Contributor, Author), Mar 30, 2023:

We need it to have the correct pivot when downloading headers/bodies/receipts, to avoid gaps. After a restart the pivot is set back to the value from the config. We go to PivotUpdator, which checks the db for updated values, finds them, sets the pivot back to the updated value, and that's all. We just have the logic for reading an already-updated pivot from the db in the PivotUpdator class.
Actually, I think it can be problematic if the restart happens during old bodies sync. Then we will not read it from the db, and the sync of old receipts would start from the old pivot. I think it is a bug and I need to change stateSyncNotFinished to oldReceiptsNotFinished, as we stop caring about the pivot only when we are fully synced (not only state). Or maybe even change it to always go to PivotUpdator and read the updated pivot from the db? Need to think about it a bit more.

@marcindsobczak (Contributor, Author):

and logger added

@marcindsobczak (Contributor, Author), May 8, 2023:

I think if we have even started state sync, there is no point in updating the pivot, as we have already FastSynced to the head?

Right. We try to read an already-updated pivot from the DB during initialization (in the InitializeNetwork step). If that is successful, the updateRequestedAndNotFinished = _syncConfig.MaxAttemptsToUpdatePivot > 0 check will return false and the mode will not change to UpdatingPivot. If not, we want to update the pivot whenever we are post-merge and fast sync is enabled. No need to check state here.

@marcindsobczak (Contributor, Author):

Actually we do need this state check - it is for already-synced nodes which will be updated to a version with PivotUpdator. We don't want to update the pivot in that case.
Looks like it was implemented the right way from the very beginning, I just forgot the reason during the long gap since writing it 😄

@marcindsobczak (Contributor, Author):

So to sum up - right about

I think if we have even started state sync, there is no point in updating the pivot, as we have already FastSynced to the head?

but this check is needed for backward compatibility with nodes synced before updating to a Nethermind version with PivotUpdator.

In general, the pivot can be updated only once in a node's lifetime. It is then saved to the DB and loaded during initialization.
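A simplified sketch of that once-only flow, using an in-memory stand-in for the metadata DB (assumed names; the real code writes to the metadata DB under the key discussed above):

```csharp
using System.Collections.Generic;

// Illustrative only - a stand-in for the metadata DB, not Nethermind's storage API.
public class UpdatedPivotStoreSketch
{
    private const int UpdatedPivotKey = 9; // mirrors the UpdatedPivotData metadata key
    private readonly Dictionary<int, (long Number, string Hash)> _db = new();

    // Called at most once per node lifetime, right after the CL gives us a pivot.
    public void Save(long number, string hash) => _db[UpdatedPivotKey] = (number, hash);

    // Called during initialization; if a value exists, the pivot-updating step is skipped.
    public bool TryLoad(out (long Number, string Hash) pivot) =>
        _db.TryGetValue(UpdatedPivotKey, out pivot);
}
```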


_syncConfig.PivotNumber = updatedPivotBlockNumber.ToString();
_syncConfig.PivotHash = updatedPivotBlockHash.ToString();
_syncConfig.MaxAttemptsToUpdatePivot = 0;
Review comment (Member):

Can we not modify the config value?


private bool ShouldBeInUpdatingPivot()
{
    bool updateRequestedAndNotFinished = _syncConfig.MaxAttemptsToUpdatePivot > 0;
Review comment (Member):

Can we not modify the config value?

Review comment (Member):

please add logger.Trace like for other methods


public Keccak GetFinalizedHash()
{
    return _blockCacheService.FinalizedHash;
}
Review comment (Member):

This cannot come from BlockTree, right? Because before we sync, it won't give us a correct answer?

@marcindsobczak (Contributor, Author):

FinalizedHash is needed in MultiSyncModeSelector. We could potentially pass it through SyncProgressResolver, but BeaconSync fits perfectly for both MultiSyncModeSelector and PivotUpdator (BeaconSync is needed in PivotUpdator anyway because of GetTargetBlockHeight()).

@LukaszRozmej requested a review from asdacap March 30, 2023 08:16
@asdacap (Contributor) commented Mar 30, 2023

Have you double-checked the other places that use the config? StartupBlockTreeFixer, BlockTree, ReceiptFixMigration, etc. - and FastHeaderSyncFeed does not have the same modification as BodiesSyncFeed?

# Conflicts:
#	src/Nethermind/Nethermind.Blockchain/Synchronization/SyncConfig.cs
#	src/Nethermind/Nethermind.Synchronization/ParallelSync/MultiSyncModeSelector.cs
@marcindsobczak requested a review from a team as a code owner May 5, 2023 16:39
@marcindsobczak (Contributor, Author)

@asdacap FastHeadersSyncFeed already has _pivotNumber = _syncConfig.PivotNumberParsed; in InitializeFeed() -> ResetPivot()

If the pivot was already updated, it is read from the DB in the InitializeNetwork step.

StartupBlockTreeFixer is a part of ReviewBlockTree, which has InitializeNetwork among its dependencies.
ReceiptFixMigration is a part of DatabaseMigrations, which also has InitializeNetwork among its dependencies.
In BlockTree the pivots are read at the time the methods are called, so we are fine there as well.
I don't see a part of the code where we are not fine.

@marcindsobczak (Contributor, Author)

@luk About modifying config values - I agree it is not the best practice.

But looking at the usages, if we don't want to overwrite SyncConfig, we would need to create a new class and pass it to:

  • BeaconPivot
  • BlockTree
  • BodiesSyncFeed
  • HeadersSyncFeed
  • MultiSyncModeSelector
  • NewPayloadHandler
  • Pivot
  • ReceiptFixMigration
  • ReceiptsSyncFeed
  • StartupBlockTreeFixer
  • SyncReport
  • SyncServer

I'm not sure what is worse here - overwriting config data, or creating a lot of code and additional complexity which can easily be avoided. For reference, a sketch of that alternative is shown below.
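A hedged sketch of what that alternative could look like - a small shared pivot holder injected into those components instead of mutating SyncConfig (hypothetical names, not what the PR does):

```csharp
// Hypothetical alternative, shown only to illustrate the trade-off discussed above.
public class SyncPivotHolder
{
    public long PivotNumber { get; private set; }
    public string? PivotHash { get; private set; }

    // Components would read the pivot from here instead of from SyncConfig,
    // so the config object itself would never need to be overwritten.
    public void Update(long number, string hash)
    {
        PivotNumber = number;
        PivotHash = hash;
    }
}
```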

@marcindsobczak merged commit 2368df4 into master May 11, 2023
@marcindsobczak deleted the feature/pivots_from_cl branch May 11, 2023 13:33
Linked issue: Remove hard-coded pivot and set it basing on CL message
4 participants