Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Reduce attestation subscription spam from VC #4806

Closed

Conversation

michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented Oct 5, 2023

Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

  • Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
  • Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.

Additional Info

@michaelsproul michaelsproul added val-client Relates to the validator client binary optimization Something to make Lighthouse run more efficiently. labels Oct 5, 2023
@michaelsproul michaelsproul changed the base branch from unstable to stable October 5, 2023 03:27
@michaelsproul michaelsproul changed the base branch from stable to unstable October 5, 2023 03:27
@michaelsproul
Copy link
Member Author

Initial data from Holesky looks good.

No more errors about subscriptions failing. Previously we had lots of:

Oct 05 04:08:48.079 ERRO Failed to subscribe validators error: Some endpoints failed, num_failed: 1 http://beacon-node:5052 => RequestFailed(ServerMessage(ErrorMessage { code: 500, message: "INTERNAL_SERVER_ERROR: unable to queue subscription, host may be overloaded or shutting down", stacktraces: [] })), service: duties

After the change was deployed this is gone:

no_more_errors

Additionally upload bandwidth from the VC is reduced by about half:

reduced_upload_bandwidth

And the time for the subscription service to run is also reduced:

faster_subscriptions

These numbers are from a Holesky VC with 10k validators, connected to a single BN.

@michaelsproul michaelsproul added the ready-for-review The code is ready for review label Oct 5, 2023
Copy link
Member

@AgeManning AgeManning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

The main reason for sending every slot was just to handle edge cases of just learning about the duties and on start up.

The decaying subscription time makes a lot more sense.

I noticed the queue you have which records when we have sent a sub can have gaps/holes indicating we haven't sent subscriptions, but this doesn't matter because your searches are short circuiting on the latest ones.

So lgtm.

let num_expected_subscriptions = std::cmp::max(
1,
5 * local_pubkeys.len() * ATTESTATION_SUBSCRIPTION_OFFSETS.len()
/ (4 * E::slots_per_epoch() as usize),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really that important, but I didn't follow this logic. Why 5x keys? Also the division by 4?
Naively i would expect 50% of keys to do ATTESTATION_SUBSCRIPTION_OFFSET.len() -1 and they lie in one half of the epoch.

It doesn't really matter tho, we're just trying to save some allocation, the larger the better, probably

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 5/4 is so the vec is 1/4 larger than it should be.

Other than the 5/4 the bound is quite tight because we are looking at 2 epochs total (current and next), there are 2 * local_pubkeys.len() attestation duties, and each duty gets sent at ATTESTATION_SUBSCRIPTION_OFFSETS.len() slots. So there are 2 * local_pubkeys.len() * ATTESTATION_SUBSCRIPTION_OFFSETS.len() subscription-slots to be distributed. They're approximately distributed over 2 epochs (current and next) but with a slight offset from the epoch start, e.g. some of the subscription-slots for the current epoch were in the previous epoch, but this is balanced out by some of the subscription-slots for the N + 2 epoch being in the next epoch (we just don't know it yet). Therefore the total number of subscription-slots per slot is: (2 * local_pubkeys.len() * ATTESTATION_SUBSCRIPTION_OFFSETS.len()) / (2 * E::slots_per_epoch()), and we can cancel the factor of 2.

Experimentally the number of subscriptions per slot is very close to this. For 10k validators, we'd expect 2.5k subscriptions without the 5/4 factor, and in practice we're very close to this:

DEBG Sent attestation subscriptions expected: 3124, count: 2512
DEBG Sent attestation subscriptions expected: 3124, count: 2560
DEBG Sent attestation subscriptions expected: 3124, count: 2535
DEBG Sent attestation subscriptions expected: 3124, count: 2483

@michaelsproul
Copy link
Member Author

LFG

bors r+

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Oct 6, 2023
bors bot pushed a commit that referenced this pull request Oct 6, 2023
## Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.

## Additional Info

- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on #4774 for testing.
@bors
Copy link

bors bot commented Oct 6, 2023

Build failed:

@michaelsproul
Copy link
Member Author

bors retry

bors bot pushed a commit that referenced this pull request Oct 6, 2023
## Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.

## Additional Info

- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on #4774 for testing.
@bors
Copy link

bors bot commented Oct 6, 2023

Build failed:

@michaelsproul
Copy link
Member Author

bors retry

bors bot pushed a commit that referenced this pull request Oct 6, 2023
## Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.

## Additional Info

- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on #4774 for testing.
@bors
Copy link

bors bot commented Oct 6, 2023

Pull request successfully merged into unstable.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title Reduce attestation subscription spam from VC [Merged by Bors] - Reduce attestation subscription spam from VC Oct 6, 2023
@bors bors bot closed this Oct 6, 2023
@michaelsproul michaelsproul deleted the reduce-subscription-spam branch October 6, 2023 08:01
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.

## Additional Info

- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on sigp#4774 for testing.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.

## Additional Info

- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on sigp#4774 for testing.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.

## Additional Info

- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on sigp#4774 for testing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Something to make Lighthouse run more efficiently. ready-for-merge This PR is ready to merge. val-client Relates to the validator client binary
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants