Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade in-memory and on-disk state representation with tree states #3206

Open
wants to merge 84 commits into
base: unstable
Choose a base branch
from

Conversation

michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented May 23, 2022

This is an umbrella PR for fundamental changes to Lighthouse's state representation.

The plan is to merge smaller PRs into this one, and then merge it to unstable without squashing.

TODO List

There are few things that need fixing on this branch before it's ready for primetime:

  • Remove recursion from get_state. The recursion is liable to overflow the stack, so we should switch to an iterative algorithm.
  • Fix the validator monitor
  • Clean up the state processing optimisations
  • Fuzz the state processing optimisations
  • Add or fix state processing optimisations for phase0 so that block replay isn't too slow
  • Add new metrics for jemalloc
  • Resolve all remaining FIXME(sproul)s in the code
  • Re-do the optimisation that was reverted in [Merged by Bors] - Revert "Optimise HTTP validator lookups" #3658

Issues Addressed

Closes #2806
Closes #2805
Closes #2944
Closes #2371
Closes #3208
Closes #3505

@michaelsproul michaelsproul added do-not-merge work-in-progress PR is a work-in-progress backwards-incompat Backwards-incompatible API change labels May 23, 2022
bors bot pushed a commit that referenced this pull request Nov 11, 2022
## Issue Addressed

Part of #3651.

## Proposed Changes

Add a flag for enabling the light client server, which should be checked before gossip/RPC traffic is processed (e.g. #3693, #3711). The flag is available at runtime from `beacon_chain.config.enable_light_client_server`.

Additionally, a new method `BeaconChain::with_mutable_state_for_block` is added which I envisage being used for computing light client updates. Unfortunately its performance will be quite poor on average because it will only run quickly with access to the tree hash cache. Each slot the tree hash cache is only available for a brief window of time between the head block being processed and the state advance at 9s in the slot. When the state advance happens the cache is moved and mutated to get ready for the next slot, which makes it no longer useful for merkle proofs related to the head block. Rather than spend more time trying to optimise this I think we should continue prototyping with this code, and I'll make sure `tree-states` is ready to ship before we enable the light client server in prod (cf. #3206).

## Additional Info

I also fixed a bug in the implementation of `BeaconState::compute_merkle_proof` whereby the tree hash cache was moved with `.take()` but never put back with `.restore()`.
macladson pushed a commit to macladson/lighthouse that referenced this pull request Jan 5, 2023
## Issue Addressed

Part of sigp#3651.

## Proposed Changes

Add a flag for enabling the light client server, which should be checked before gossip/RPC traffic is processed (e.g. sigp#3693, sigp#3711). The flag is available at runtime from `beacon_chain.config.enable_light_client_server`.

Additionally, a new method `BeaconChain::with_mutable_state_for_block` is added which I envisage being used for computing light client updates. Unfortunately its performance will be quite poor on average because it will only run quickly with access to the tree hash cache. Each slot the tree hash cache is only available for a brief window of time between the head block being processed and the state advance at 9s in the slot. When the state advance happens the cache is moved and mutated to get ready for the next slot, which makes it no longer useful for merkle proofs related to the head block. Rather than spend more time trying to optimise this I think we should continue prototyping with this code, and I'll make sure `tree-states` is ready to ship before we enable the light client server in prod (cf. sigp#3206).

## Additional Info

I also fixed a bug in the implementation of `BeaconState::compute_merkle_proof` whereby the tree hash cache was moved with `.take()` but never put back with `.restore()`.
@michaelsproul michaelsproul added the tree-states Upcoming state and database overhaul label Jun 19, 2023
paulhauner and others added 13 commits June 20, 2023 11:47
* Address some clippy lints

* Box errors to fix error size lint

* Add Default impl for Validator

* Address more clippy lints

* Re-implement `check_state_diff`

* Fix misc test compile errors
* Use "rebasing" to minimise BeaconState mem usage

* Update metastruct

* Use upstream milhouse, update cargo lock

* Rebase caches for extra memory savings
* Get tests passing

* Get benchmarks compiling

* Fix EF withdrawals test

* Remove unused deps

* Fix tree_hash panic in tests

* Fix slasher compilation

* Fix ssz_generic test

* Get more tests passing

* Fix EF tests for real

* Fix local testnet scripts
* Fix db-migration-period default

* Fix version regex
* Relocate epoch cache to BeaconState

* Optimize per block processing by pulling previous epoch & current epoch calculation up.

* Revert `get_cow` change (no performance improvement)

* Initialize `EpochCache` in epoch processing and load it from state when getting base rewards.

* Initialize `EpochCache` at start of block processing if required.

* Initialize `EpochCache` in `transition_blocks` if `exclude_cache_builds` is enabled

* Fix epoch cache initialization logic

* Remove FIXME comment.

* Cache previous & current epochs in `consensus_context.rs`.

* Move `get_base_rewards` from `ConsensusContext` to `BeaconState`.

* Update Milhouse version
database_manager/src/lib.rs Outdated Show resolved Hide resolved
database_manager/src/lib.rs Outdated Show resolved Hide resolved
@@ -377,131 +378,3 @@ fn slot_of_prev_restore_point<E: EthSpec>(current_slot: Slot) -> Slot {
let slots_per_historical_root = E::SlotsPerHistoricalRoot::to_u64();
(current_slot - 1) / slots_per_historical_root * slots_per_historical_root
}

#[cfg(test)]
mod test {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional delete? BlockRootsIterator still exists

@@ -26,6 +26,7 @@ impl<E: EthSpec> LevelDB<E> {
let mut options = Options::new();

options.create_if_missing = true;
options.write_buffer_size = Some(512 * 1024 * 1024);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment justifying the increased value

let epoch_participation = state.get_epoch_participation_mut(
data.target.epoch,
ctxt.previous_epoch,
ctxt.current_epoch,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this optimization make a difference? Introduced in 2df714e

.map_or(false, VList::has_pending_updates)
|| self
.inactivity_scores()
.map_or(false, VList::has_pending_updates)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgetting to add a field in a new fork here may cause either a panic or data corruption. This function could be generated by a macro?

base_inner,
|_, self_field, base_field| { self_field.rebase_on(base_field) }
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not using Deneb's bimap_beacon_state_deneb_tree_list_fields

@dapplion dapplion mentioned this pull request Feb 7, 2024
4 tasks
* Delete unused epoch processing code

* Compare total deltas

* Remove unnecessary apply_pending

* cargo fmt

* Remove newline
"Updating historic state config";
"previous_config" => ?disk_config.hierarchy_config,
"new_config" => ?db.config.hierarchy_config,
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelsproul I recall you commenting changing the hierarchy_config is not possible without a re-sync. Should this comment state so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well if you see this comment then it has already worked, so not sure if it's helpful at this stage.

You can change it by running prune-states and then restarting with the new config. I think it probably makes sense to just have that in the docs.

michaelsproul and others added 5 commits February 14, 2024 11:55
* Fix beta compiler lints

* Fix fork choice tests

* Fix op_pool tests

---------

Co-authored-by: dapplion <35266934+dapplion@users.noreply.github.com>
* Fix CLI help text

* Tree states release v5.0.111-exp
@michaelsproul michaelsproul mentioned this pull request Apr 8, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards-incompat Backwards-incompatible API change do-not-merge tree-states Upcoming state and database overhaul work-in-progress PR is a work-in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants