Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't store full pubkey in pubkey2index map #3039

Closed
dapplion opened this issue Aug 30, 2021 · 6 comments
Closed

Don't store full pubkey in pubkey2index map #3039

dapplion opened this issue Aug 30, 2021 · 6 comments
Assignees
Labels
meta-investigate Issues found that require further investigation and may not have a specific resolution/fix prio-low This is nice to have. scope-memory Issues to reduce and improve memory usage.

Comments

@dapplion
Copy link
Contributor

Describe the feature

We don't need to store the full pubkey since the collision probability is low enough with just a few bytes. Note that ethereum addresses use 20 bytes only.

Check a safe minimum byte count and create a map that can extend the size of the key if it detects a collision. Keep an array of key bytes, starting at [10] for example, and if there's a collision extend to [11] and write new keys with 11 bytes. Then to check slice to 11 check, slice to 10 check.

@dapplion dapplion added the scope-memory Issues to reduce and improve memory usage. label Aug 30, 2021
@dapplion
Copy link
Contributor Author

dapplion commented Sep 7, 2021

pubkey2index map is retaining 320MB in Prater (220_000 validators) which averages 1454 bytes per pubkey 🤯

Screenshot from 2021-09-07 12-15-05

@q9f q9f added the prio-medium Resolve this some time soon (tm). label Sep 14, 2021
@philknows philknows added prio-high Resolve issues as soon as possible. and removed prio-medium Resolve this some time soon (tm). labels Nov 29, 2021
@dapplion
Copy link
Contributor Author

Proposal:

  • Use only the first N bytes of the pubkey (i.e. 4) and encoded as base64url to have a very short key. Use base64url since it has no padding and saves 2 characters
  • The Map value will be an array of indices that start with that pubkey prefix.

Since the goal of this Map is to speed up lookup by pubkeys narrowing down from 250_000 to ~10 instead of 1 is good enough.

Analysis of probabilities should pick an appropriate byte length. Also must check that BLS pubkeys have a sufficient random distribution of the first N bytes.

@dapplion
Copy link
Contributor Author

dapplion commented Jan 3, 2022

After doing some research today:

  • The bulk of the memory usage is cause by this awful concatenated strings. With Use memory efficient toHex in pubkey2index map #3561 the memory usage of the map is reduced by 10x from 350MB to 35MB.

  • With non-concatenated strings, using less bytes has marginal benefits being able to reduce 2.5x extra. However that's an absolute reduction of ~19MB for the entire application so not a lot. Also, using shorter keys adds non-trivial complexity to ensure keys are not known. To do that indexed sync committees MUST not rely of the pubkey2index map but for the first indexation.

  • Using hex, base64 or base64url has not effect of the total memory usage of the Map

WIP branch: https://github.com/ChainSafe/lodestar/tree/dapplion/pubkey2index-short-key

@dapplion dapplion added prio-low This is nice to have. and removed prio-high Resolve issues as soon as possible. labels Jan 3, 2022
@dapplion
Copy link
Contributor Author

dapplion commented Feb 4, 2022

The real cause of the memory inefficiency is the concatenated strings as shown here #3446

The current size of the Pubkey2Index map is 45.5MB for 275496 validators. We could consider reducing the pubkey length but now it's not that prioritary

Screenshot from 2022-02-04 17-33-07

@philknows
Copy link
Member

Is this a worthy optimisation for memory? Need to investigate further.

@philknows philknows added the meta-investigate Issues found that require further investigation and may not have a specific resolution/fix label Nov 5, 2023
@wemeetagain
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-investigate Issues found that require further investigation and may not have a specific resolution/fix prio-low This is nice to have. scope-memory Issues to reduce and improve memory usage.
Projects
None yet
Development

No branches or pull requests

5 participants