
[Merged by Bors] - Reduce bandwidth over the VC<>BN API using dependant roots #4170

Closed · wants to merge 11 commits into unstable

Conversation

jimmygchen (Member)

Issue Addressed

#4157

Proposed Changes

See description in #4157.

In diagram form:

![reduce-attestation-bandwidth](https://user-images.githubusercontent.com/742762/230277084-f97301c1-0c5d-4fb3-92f9-91f99e4dc7d4.png)

@jimmygchen jimmygchen self-assigned this Apr 6, 2023
@jimmygchen jimmygchen added the optimization Something to make Lighthouse run more efficiently. label Apr 6, 2023
@jimmygchen jimmygchen marked this pull request as ready for review April 6, 2023 05:29
@jimmygchen jimmygchen added the work-in-progress PR is a work-in-progress label Apr 6, 2023
@jimmygchen (Member, Author)

I've pushed a first implementation; it seems to work fine on a local testnet with 4 Lighthouse nodes. I'm going to do a bit more testing and verify the bandwidth improvements.

@jimmygchen (Member, Author) commented Apr 6, 2023

Some initial test numbers below from running a local testnet; everything seems to be working fine and the chain is finalizing.

Test Setup

  • local testnet, 4 beacon nodes
  • 5000 validators split across 4 validator clients
  • Recorded for 1 minute
  • Measured all POST requests to /eth/v1/validator/duties/attester/{epoch}
| version | number of requests | total req/res size (KB) | initialization request size (KB) | subsequent request size (KB) |
|---|---|---|---|---|
| unstable | 160 | 51,222 | ~320 | ~320 |
| optimized | 160 | 1,398 | ~320 | ~0.75 |
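That works out to a ~36× reduction in total traffic over the minute (51,222 KB ÷ 1,398 KB ≈ 36.6), driven almost entirely by subsequent requests shrinking from ~320 KB to ~0.75 KB (≈ 427×) once duties have been initialized.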

@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Apr 6, 2023
@jimmygchen (Member, Author)

Looking pretty good so far on a 5000-validator VC (see the drop below, before vs after the deployment):

[Image: bandwidth metrics showing the drop after deployment]

Attestations look fine; will keep monitoring.

@jimmygchen (Member, Author) commented Apr 26, 2023

Short update: I've been running some tests on this branch, and the VC seems to be querying attester duties correctly (on a local testnet) when the dependent_root is updated.

Test Setup

It is very difficult to verify this on the Goerli testnet because re-orgs that change the dependent_root slot are very rare, since intentional late-block re-orgs are not allowed at epoch boundaries (spec). I wasn't able to find one in the data between 14 and 21 April, during which I had the updated validator client deployed on Goerli.

However, it is possible to verify this on a local testnet using @michaelsproul's suggestion:

> we could probably make Lighthouse produce late blocks in slot 31 and re-org them at slot 0 if that would be useful

This test branch contains code that achieves the above, allowing us to verify that attestations work as expected when dependent_root is updated. I was able to observe the VC requesting a full update from the BN when the dependent_root slot got re-org'd.
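A minimal sketch of the VC-side check this implies (the names and types below are illustrative assumptions, not Lighthouse's exact API): duties are re-requested for any validator whose cached (epoch, dependent_root) pair no longer matches the latest dependent root returned by the BN.

```rust
use std::collections::HashMap;

type Epoch = u64;
type Root = [u8; 32];

/// Return the validator indices that need a full duties re-request: those with
/// no cached duties, or whose cached (epoch, dependent_root) pair differs from
/// the latest dependent root reported by the BN.
fn validators_needing_update(
    cached: &HashMap<u64, (Epoch, Root)>, // validator index -> (epoch, dependent_root)
    epoch: Epoch,
    latest_dependent_root: Root,
    all_indices: &[u64],
) -> Vec<u64> {
    all_indices
        .iter()
        .copied()
        .filter(|idx| cached.get(idx) != Some(&(epoch, latest_dependent_root)))
        .collect()
}
```

When nothing has been re-org'd this returns an empty list and no full duties request is sent, which is where the bandwidth saving comes from.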

Additional Info

Assume the current epoch is N:

  • For duties in the current epoch N to change, the proposer would have to re-org the last slot in epoch N-2 (a re-org distance > 32; see the sketch below)
  • For duties in the next epoch N+1 to change, the last slot in the previous epoch N-1 would have to be re-org'd, which means a re-org distance of at least 1 (if we're on slot 0 of the current epoch).
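
A toy calculation of those two distances, assuming 32 slots per epoch (this helper is illustrative, not Lighthouse's actual code):

```rust
const SLOTS_PER_EPOCH: u64 = 32;

/// The slot whose block root acts as the attester-duty dependent root for
/// `epoch`: the last slot of `epoch - 2`, i.e. one slot before the first slot
/// of `epoch - 1`.
fn attester_dependent_slot(epoch: u64) -> u64 {
    (epoch.saturating_sub(1) * SLOTS_PER_EPOCH).saturating_sub(1)
}

fn main() {
    let n = 10; // any epoch >= 2 behaves the same way
    let slot_0_of_n = n * SLOTS_PER_EPOCH; // slot 320

    // Current-epoch duties: dependent slot is 287, so the re-org distance
    // from slot 320 is 33 (> 32 slots).
    assert_eq!(slot_0_of_n - attester_dependent_slot(n), 33);

    // Next-epoch (N + 1) duties: dependent slot is 319, a distance of just 1.
    assert_eq!(slot_0_of_n - attester_dependent_slot(n + 1), 1);
}
```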

@jimmygchen (Member, Author) commented Apr 27, 2023

In addition to the above re-org scenario, I've also observed two other scenarios that change dependent_root, and in both cases the attester duties were updated correctly:

  1. If the block at slot 31 arrives late, the block root of slot 30 is used as the dependent root, and the dependent root is updated once slot 31 is imported.
  2. If the beacon node goes out of sync due to network issues, the latest imported block in the epoch is used as the dependent root until the BN is sync'd up again.

More details on this HackMD page.

@paulhauner paulhauner added the v4.2.0 Q2 2023 label Apr 27, 2023
@paulhauner (Member) left a comment

I'm impressed at how small this change ended up, nice work!

I have a few suggestions but nothing major!

```rust
// request for extra data unless necessary in order to save on network bandwidth.
let uninitialized_validators =
    get_uninitialized_validators(duties_service, &epoch, local_pubkeys);
let indices_to_request = if uninitialized_validators.len() >= INITIAL_DUTIES_QUERY_SIZE {
```
paulhauner (Member):

In the (hypothetical) case where INITIAL_DUTIES_QUERY_SIZE == 2 and uninitialized_validators.len() == 1, I think we could fail to initialize that one uninitialized validator in the first call.

To avoid this, I think we'd want something like:

```rust
if !uninitialized_validators.is_empty() {
    uninitialized_validators
} else {
    &local_indices[0..min(INITIAL_DUTIES_QUERY_SIZE, local_indices.len())]
}
```

My suggestion removes the guarantee that we'll always request INITIAL_DUTIES_QUERY_SIZE validators, but it does ensure that we always query for all uninitialized validators. I think it's probably OK to sometimes query for fewer than INITIAL_DUTIES_QUERY_SIZE, especially since we currently have it set to 1.
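
Putting the two branches together, a self-contained sketch of the intended selection logic (simplified index types; illustrative rather than the exact Lighthouse code):

```rust
use std::cmp::min;

/// Lighthouse currently sets this to 1.
const INITIAL_DUTIES_QUERY_SIZE: usize = 1;

/// Pick the validator indices to include in the duties request: always all
/// uninitialized validators if there are any, otherwise a small fixed-size
/// probe of the local indices.
fn indices_to_request<'a>(
    uninitialized_validators: &'a [u64],
    local_indices: &'a [u64],
) -> &'a [u64] {
    if !uninitialized_validators.is_empty() {
        // Guarantees every uninitialized validator is covered by the first
        // call, even when there are fewer of them than INITIAL_DUTIES_QUERY_SIZE.
        uninitialized_validators
    } else {
        // Otherwise probe with a small fixed-size slice to detect duty changes.
        &local_indices[0..min(INITIAL_DUTIES_QUERY_SIZE, local_indices.len())]
    }
}
```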

validator_client/src/duties_service.rs: two review threads (outdated, resolved)
```rust
);

if duties_service.per_validator_metrics() {
    update_per_validator_duty_metrics::<E>(current_slot, &new_duties);
```
paulhauner (Member):

Previously we were running this function for all duties, and now we're only running it for new duties. I think that might pose a problem where we don't update the metrics to show the next-epoch duties once that next epoch arrives (assuming they haven't changed since the previous epoch).

I think we could solve this by iterating over duties_service.attesters instead; a sketch follows below.
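
A rough sketch of what that looks like, with a simplified stand-in for duties_service.attesters (the map shape here is an assumption):

```rust
use std::collections::HashMap;

type Epoch = u64;
type Slot = u64;

/// Simplified stand-in for duties_service.attesters:
/// pubkey (hex) -> (epoch -> attestation slot).
fn update_per_validator_duty_metrics(
    attesters: &HashMap<String, HashMap<Epoch, Slot>>,
    current_epoch: Epoch,
) {
    // Iterate over all known duties, not just newly fetched ones, so the
    // metrics roll over to next-epoch duties even when nothing has changed.
    for (pubkey, duties) in attesters {
        if let Some(slot) = duties.get(&current_epoch) {
            // The real code would set a per-validator gauge; a print stands in.
            println!("{pubkey} attests at slot {slot}");
        }
    }
}
```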

jimmygchen (Member, Author):

Thanks, didn't think about this scenario! Will push a fix.

@paulhauner paulhauner added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels May 2, 2023
@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels May 15, 2023
@paulhauner (Member) left a comment

Nice! We're almost there! Just a minor comment request and also something I missed earlier about updating the attestation slot metrics 🙏

validator_client/src/duties_service.rs: review thread (outdated, resolved)

```rust
if validators_to_update.is_empty() {
    // No validators have conflicting (epoch, dependent_root) values or missing duties for the epoch.
    return Ok(());
```
paulhauner (Member):

I think we need to make sure we run update_per_validator_duty_metrics so that we start reporting the next-epoch duties as the current_slot progresses past the current-epoch slots.

Since we've changed update_per_validator_duty_metrics, perhaps it makes sense to hoist it up into poll_beacon_attesters (perhaps after each call to poll_beacon_attesters_for_epoch in there)? That way we can exit this function whenever we like and still be confident that update_per_validator_duty_metrics is being called.
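
Roughly this shape, as a sketch (synchronous stand-ins for the real async functions):

```rust
fn poll_beacon_attesters(current_slot: u64, current_epoch: u64) -> Result<(), String> {
    for epoch in [current_epoch, current_epoch + 1] {
        poll_beacon_attesters_for_epoch(epoch)?;
        // Hoisted here: an early `return Ok(())` inside the per-epoch poll
        // (e.g. when validators_to_update is empty) can no longer skip the
        // metrics update.
        update_per_validator_duty_metrics(current_slot);
    }
    Ok(())
}

fn poll_beacon_attesters_for_epoch(_epoch: u64) -> Result<(), String> {
    Ok(()) // stand-in for the real duties poll
}

fn update_per_validator_duty_metrics(_current_slot: u64) {}
```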

jimmygchen (Member, Author):

Nice catch 🙏 , I've moved this to where you suggested!

@jimmygchen jimmygchen added work-in-progress PR is a work-in-progress and removed ready-for-review The code is ready for review labels May 15, 2023
@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels May 15, 2023
@paulhauner (Member) left a comment

Perfect, approved!

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels May 15, 2023
@paulhauner (Member)

bors r+

bors bot pushed a commit that referenced this pull request May 15, 2023
@bors bot commented May 15, 2023

Pull request successfully merged into unstable.

Build succeeded!


@bors bors bot changed the title Reduce bandwidth over the VC<>BN API using dependant roots [Merged by Bors] - Reduce bandwidth over the VC<>BN API using dependant roots May 15, 2023
@bors bors bot closed this May 15, 2023
ghost pushed a commit to oone-world/lighthouse that referenced this pull request Jul 13, 2023
Woodpile37 pushed two commits to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024