
[QUESTION] Curious case of state cache #4502

Closed
vgorkavenko opened this issue Jul 13, 2023 · 4 comments

@vgorkavenko

Description

We would like to clarify the behaviour of the state cache introduced in this PR.

Please take a look at this case:

```
     unfinalized              reorg event       finalized
     epoch 100000                  v           epoch 100000
----------|------------------------|----------------|-------------> time
          ^                                         ^
  state is requested                          state is requested
response cached by slot number        response is equal to cached (0x123...00)
 with state data 0x123...00          but should be 0x123...01 after finalization
```

Is this possible, even if we request the state by hash?
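
To make the concern concrete, here is a minimal sketch (hypothetical types and names, not Lighthouse's actual cache) of the failure mode we are asking about: a cache keyed by slot alone and populated before finalization keeps serving the pre-reorg state.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only.
type Slot = u64;
type StateRoot = [u8; 32];

struct SlotKeyedCache {
    by_slot: HashMap<Slot, StateRoot>,
}

impl SlotKeyedCache {
    fn new() -> Self {
        Self { by_slot: HashMap::new() }
    }

    /// Return the cached root for `slot`, loading and caching it on a miss.
    fn get_or_load(&mut self, slot: Slot, load: impl FnOnce() -> StateRoot) -> StateRoot {
        *self.by_slot.entry(slot).or_insert_with(load)
    }
}

fn main() {
    let mut cache = SlotKeyedCache::new();
    let slot: Slot = 3_200_000;

    // Before finalization: the canonical state at this slot is 0x123...00.
    let before = cache.get_or_load(slot, || [0x00; 32]);

    // After the reorg the canonical state at the same slot is 0x123...01,
    // but a cache keyed only by slot still serves the stale entry.
    let after = cache.get_or_load(slot, || [0x01; 32]);

    assert_eq!(before, after); // stale hit: the reorg is invisible to the cache
}
```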

@michaelsproul
Member

Shouldn't be possible. That cache only holds states from the freezer DB, which are finalized and can't be reorged. Did you see this in the wild?
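
To put that invariant in code (a minimal sketch with a hypothetical `split_slot` marking the hot/cold database boundary, not the actual implementation):

```rust
type Slot = u64;

/// Only states at or below the finalized split are ever admitted to the
/// freezer-backed cache, so a cached entry can never be reorged out.
fn is_cacheable(state_slot: Slot, split_slot: Slot) -> bool {
    state_slot <= split_slot
}

fn main() {
    let split_slot: Slot = 3_194_880; // example finalized boundary
    assert!(is_cacheable(3_190_000, split_slot)); // finalized: safe to cache
    assert!(!is_cacheable(3_200_000, split_slot)); // unfinalized: never cached
}
```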

@vgorkavenko
Author

But it depends on slots_per_restore_point, am I right?

We observed inconsistent finalized-state responses across multiple hosts (four responded with the same data, and only one got it wrong) and are trying to figure out the cause. As soon as we have the details, I'll share them.

@michaelsproul
Member

> But it depends on slots_per_restore_point, am I right?

Yeah, the layout of states on disk depends on slots-per-restore-point. The restore points determine how many blocks get replayed.
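
As a rough sketch of that relationship (hypothetical helper names, and assuming the long-standing default of 8192 slots per restore point), loading a state at an arbitrary slot means loading the nearest prior restore point and replaying up to the difference in slots:

```rust
type Slot = u64;

/// Slot of the most recent restore point at or before `slot`.
fn restore_point_slot(slot: Slot, slots_per_restore_point: Slot) -> Slot {
    (slot / slots_per_restore_point) * slots_per_restore_point
}

/// Upper bound on the number of blocks replayed on top of the restore
/// point to reach `slot` (skipped slots have no block to apply).
fn slots_to_replay(slot: Slot, slots_per_restore_point: Slot) -> Slot {
    slot - restore_point_slot(slot, slots_per_restore_point)
}

fn main() {
    let sprp: Slot = 8192; // default --slots-per-restore-point
    let slot: Slot = 3_194_900;
    assert_eq!(restore_point_slot(slot, sprp), 3_194_880);
    assert_eq!(slots_to_replay(slot, sprp), 20);
}
```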

> We observed inconsistent finalized-state responses across multiple hosts (four responded with the same data, and only one got it wrong)

Ah, that sounds like it might be this bug: #3011. We've looked long and hard for the root cause of that bug without finding anything, and to be honest we'll probably never find it. We are in the process of overhauling our database and replacing it with something better, and have an alpha of that here: https://github.com/sigp/lighthouse/releases/tag/v4.2.990-exp. Even though it's experimental, it currently has fewer known bugs than stable (which is still affected by #3011), so it might be worth adding to your infra.

@michaelsproul
Member

Closing as stale, and soon-to-be-resolved by tree-states
