BN: Use Bloom filter for heavy REST validators requests. #5212

cheatfate · 2023-07-25T21:34:03Z

No description provided.

github-actions · 2023-07-26T03:34:56Z

Unit Test Results

        9 files   1 077 suites 38m 6s ⏱️
  3 710 tests   3 431 ✔️ 279 💤 0 ❌
15 826 runs 15 521 ✔️ 305 💤 0 ❌

Results for commit 735566f.

♻️ This comment has been updated with latest results.

tersec · 2023-07-28T23:47:47Z

In general, "Bloom" is a person's name in this context:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970

So all the references to lowercase "bloom filters" are a bit suboptimal -- the proper name of this data structure is the "Bloom filter".

tersec · 2023-07-28T23:58:40Z

beacon_chain/rpc/rest_utils.nim

+proc getBitsCount*(itemsCount: int): int =
+  ## We are using 8 hashes and we want to get 0.0001 false positive rate, so we
+  ## should use `m/n == 21` according to Table 3 of
+  ## https://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html


Alternatively, given the expected number of items (defined elsewhere in this PR), both the optimal number of hash functions and the optimal m/n can be computed. Not sure how worthwhile that is: 8 hash functions is typically pretty good, but figure 4 ("Probability of false positives (log scale). The top curve is for 4 hash functions. The bottom curve is for the optimum (integral) number of hash functions.") does show potentially useful improvements there.
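To make the trade-off concrete, here is a minimal sketch using the standard approximation p = (1 - e^(-k/(m/n)))^k, with k close to (m/n) * ln 2 as the near-optimal number of hashes. The names `falsePositiveRate` and `nearOptimalHashCount` are hypothetical and not part of this PR.

```nim
import std/math

func falsePositiveRate(bitsPerItem: float, hashCount: int): float =
  ## Approximate false positive rate for m/n == bitsPerItem and k == hashCount.
  let k = float(hashCount)
  pow(1.0 - exp(-k / bitsPerItem), k)

func nearOptimalHashCount(bitsPerItem: float): int =
  ## k = (m/n) * ln 2, rounded to the nearest integer (at least 1).
  ## Rounding gives a near-optimal k, not a proven integral optimum.
  max(1, int(round(bitsPerItem * ln(2.0))))

when isMainModule:
  let k = nearOptimalHashCount(21.0)
  echo "hashes: 8, rate: ", falsePositiveRate(21.0, 8)
  echo "hashes: ", k, ", rate: ", falsePositiveRate(21.0, k)
```

With m/n == 21 this should show how much the rate drops when going from 8 hashes to the near-optimal count, at the cost of computing more hashes per lookup.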

tersec · 2023-07-29T00:03:31Z

beacon_chain/rpc/rest_utils.nim

+
+func checkKey*(bf: BloomFilter, pubkey: ValidatorPubKey): bool =
+  let hashes = pubkey.toHashes()
+  if not(bf.getBit(hashes[0])): return false


Would this not be better as a loop? If it has compile-time constant bounds, I'd expect a C compiler to unroll it where that helps performance, since compilers already need to handle this for auto-vectorization.
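A rough sketch of the loop form, under assumptions: the `BloomFilter` here is a stripped-down stand-in with 64-bit words, `HashCount` is a hypothetical constant, and the hashes are passed in as an array of bit positions because the return type of `toHashes` is not visible in this hunk.

```nim
type
  BloomFilter = object
    words: seq[uint64]  # simplified stand-in for the PR's type

const HashCount = 8  # hypothetical name; the PR mentions 8 hashes

func getBit(bf: BloomFilter, pos: Natural): bool =
  let p = uint64(pos)
  (bf.words[int(p shr 6)] and (1'u64 shl (p and 63))) != 0

func checkHashes(bf: BloomFilter, hashes: array[HashCount, int]): bool =
  ## Fixed-size array, so the loop bounds are known at compile time.
  for h in hashes:
    if not bf.getBit(h):
      return false  # one clear bit proves the key was never registered
  true
```

Whether the C compiler actually unrolls this is up to the optimizer; the early return keeps the short-circuit behaviour of the hand-written version.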

tersec · 2023-07-29T00:04:56Z

beacon_chain/rpc/rest_utils.nim

+
+proc registerKey*(bf: var BloomFilter, pubkey: ValidatorPubKey) =
+  let hashes = pubkey.toHashes()
+  bf.raiseBit(hashes[0])


Same comment/question about the loop.
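For the registration side, the same idea might look like the sketch below. As before, `BloomFilter` is a simplified stand-in, and `registerHashes` with its `openArray[int]` parameter is a hypothetical helper rather than the PR's API.

```nim
type
  BloomFilter = object
    words: seq[uint64]  # simplified stand-in for the PR's type

func raiseBit(bf: var BloomFilter, pos: Natural) =
  let
    p = uint64(pos)
    index = int(p shr 6)
  bf.words[index] = bf.words[index] or (1'u64 shl (p and 63))

func registerHashes(bf: var BloomFilter, hashes: openArray[int]) =
  ## Sets one bit per hash position; no early exit is needed here.
  for h in hashes:
    bf.raiseBit(h)
```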

tersec · 2023-07-29T00:09:00Z

beacon_chain/rpc/rest_beacon_api.nim

+    # running. If number of validators will exceed this value we going
+    # to lose bloom filter's efficiency only.
+    itemsCount = validatorsCount + (validatorsCount div 3)
+    bitsCount = getBitsCount(int(itemsCount))


It's slightly distasteful to have to convert to int here in any explicit way -- functionally, this is something like bitsCount = itemsCount * 21 but refracted through Nim's type system. But the int(foo) conversion is a pure artifact here: either it is unnecessary to begin with or, if Nim requires it, it is potentially unsafe because it can raise a runtime defect.
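One possible shape, as a sketch only: keep the arithmetic in `uint64` and convert to `int` once, at the boundary, with an explicit check. The `* 21` body follows the approximation above and is an assumption about what `getBitsCount` does; `BitsPerItem` and `toIntChecked` are hypothetical names.

```nim
const BitsPerItem = 21'u64  # m/n from the doc comment on getBitsCount

func getBitsCount(itemsCount: uint64): uint64 =
  ## Assumed behaviour: a plain multiplication (unsigned, so it wraps on
  ## overflow instead of raising).
  itemsCount * BitsPerItem

func toIntChecked(value: uint64): int =
  ## Converts once, raising a catchable ValueError instead of a defect.
  if value > uint64(high(int)):
    raise newException(ValueError, "bit count does not fit in int")
  int(value)
```

This keeps the single remaining conversion visible and turns an out-of-range value into an error the caller can handle.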

tersec · 2023-07-29T00:14:13Z

beacon_chain/rpc/rest_utils.nim

+  when sizeof(uint) == 8: 6'u else: 5'u
+
+template modMask(): uint =
+  when sizeof(uint) == 8: 63'u else: 31'u


divShift and modMask need to be consistent with each other. I wonder if it's better to define modMask explicitly as something like (1'u shl divShift()) - 1, which keeps the two in sync automatically and makes the relationship explicit.
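Spelled out, that could look like the following sketch. The `divShift` template is copied from the hunk; the compile-time check is an illustrative addition.

```nim
template divShift(): uint =
  when sizeof(uint) == 8: 6'u else: 5'u

template modMask(): uint =
  # Derived from divShift, so the two cannot drift apart.
  (1'u shl divShift()) - 1'u

static:
  # The mask must cover exactly the bit positions inside one word.
  doAssert modMask() == uint(sizeof(uint) * 8 - 1)
```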

tersec · 2023-07-29T00:17:31Z

beacon_chain/rpc/rest_beacon_api.nim

+const
+  MINIMAL_VALIDATORS_BF_COUNT = 1_000_000
+    ## Minimal size of validators bloom filter. Current mainnet number of
+    ## validators is near 700k, so you could update this number if it exceeds.


Suggested change

-    ## validators is near 700k, so you could update this number if it exceeds.
+    ## validators is near 700k; this default could be increased when useful.

tersec · 2024-02-19T18:08:34Z

beacon_chain/rpc/rest_utils.nim

+template modMask(): uint =
+  when sizeof(uint) == 8: 63'u else: 31'u
+
+func raiseBit*(bf: var BloomFilter, pos: Natural) {.inline.} =


Use setBit from stew/bitops2?

tersec · 2024-02-19T18:09:01Z

beacon_chain/rpc/rest_utils.nim

+  let index = uint(pos) shr divShift()
+  bf.words[index] = bf.words[index] or (1'u shl (uint(pos) and modMask()))
+
+func getBit*(bf: BloomFilter, pos: Natural): bool {.inline.} =


Use getBit from stew/bitops2?
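To illustrate the shape of that change without depending on stew, the sketch below swaps in `setBit` and `testBit` from Nim's standard `std/bitops` module; these are stand-ins for the stew/bitops2 helpers, not the same functions. `BloomFilter` is again a simplified 64-bit-word stand-in.

```nim
import std/bitops

type
  BloomFilter = object
    words: seq[uint64]  # simplified stand-in for the PR's type

func raiseBit(bf: var BloomFilter, pos: Natural) =
  # Word index is pos div 64, bit index is pos mod 64.
  bf.words[pos shr 6].setBit(pos and 63)

func getBit(bf: BloomFilter, pos: Natural): bool =
  bf.words[pos shr 6].testBit(pos and 63)
```

The manual shift-and-mask expressions disappear, leaving only the index arithmetic in the filter code.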

tersec · 2024-02-19T18:12:02Z

beacon_chain/rpc/rest_utils.nim

+    words: seq[uint]
+    length: int
+    mask: uint
+    used: int


What is the utility of used? It's maintained in a few places, but aside from letting this be presented as a more or less normal Nim data structure, it seems to be pure overhead that isn't needed for the Bloom filter itself to function.

tersec · 2024-02-19T18:17:42Z

beacon_chain/rpc/rest_utils.nim

+  bf.used
+
+template divShift(): uint =
+  when sizeof(uint) == 8: 6'u else: 5'u


Semi-arbitrarily asking here: all of these separate codepaths for 32-bit and 64-bit uints seem to be predicated on there being a large enough improvement (in performance, memory usage, cache locality, etc.) over simply picking one of byte/uint8, uint32 (native on some machines, though none the BN will run well on), or uint64 (letting 32-bit machines get suboptimal code) and using it consistently.

Debugging these issues is unfortunate, because anything that is 32-bit only requires setting up a 32-bit environment first. But there doesn't seem to be a fundamental reason not to use uint64 or byte as the leaf size (each has advantages; uint32 is an awkward middle ground, optimal for nothing) and so avoid, e.g., toHashes doubling or tripling in size.
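As a sketch of the single-leaf-type option, assuming uint64 is chosen: the `when sizeof(uint)` branches collapse into constants that hold on every platform. `Word`, `WordBits`, `DivShift`, `ModMask`, `wordIndex`, and `bitMask` are hypothetical names.

```nim
type
  Word = uint64  # one leaf type everywhere, including 32-bit targets

const
  WordBits = sizeof(Word) * 8  # 64
  DivShift = 6                 # log2(WordBits)
  ModMask = WordBits - 1       # 63

static:
  doAssert (1 shl DivShift) == WordBits

func wordIndex(pos: Natural): int =
  ## Index of the word holding bit `pos`.
  pos shr DivShift

func bitMask(pos: Natural): Word =
  ## Single-bit mask for `pos` within its word.
  Word(1) shl (pos and ModMask)
```

On a 32-bit target the compiler would emulate the 64-bit operations, which is the suboptimal-but-uniform code mentioned above.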

tersec · 2024-02-19T18:21:13Z

beacon_chain/rpc/rest_utils.nim

@@ -24,6 +26,15 @@ type
  ValidatorIndexError* {.pure.} = enum
    UnsupportedValue, TooHighValue

+  BloomFilter* = object
+    words: seq[uint]
+    length: int


length is used in a couple of asserts, and there's a bitsCount function which returns it. But in general, the words seq has all the length information the Bloom filter functionally needs, so, like used, this is extra overhead in both code and memory.
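A minimal sketch of the slimmer object, assuming 64-bit words: `bitsCount` is derived from `words.len` rather than stored, and `used` is dropped entirely. The `init` constructor shown here is hypothetical and may not match the PR's.

```nim
type
  BloomFilter = object
    words: seq[uint64]  # the only state the filter needs

func init(T: type BloomFilter, bitsCount: Positive): T =
  ## Rounds the requested size up to a whole number of 64-bit words.
  T(words: newSeq[uint64]((bitsCount + 63) div 64))

func bitsCount(bf: BloomFilter): int =
  ## Derived from the seq, so it cannot disagree with the storage.
  bf.words.len * 64
```

One consequence of deriving the value is that `bitsCount` reports the rounded-up capacity, which may be slightly larger than the size originally requested.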

Initial commit.

735566f

cheatfate force-pushed the rest-bloom-filter branch from b149d1b to 735566f Compare July 26, 2023 20:21

tersec reviewed Jul 28, 2023

tersec reviewed Jul 29, 2023

tersec mentioned this pull request Jul 31, 2023

Revert "generalize ShufflingRef acceleration logic" #5223

Merged

cheatfate changed the title ~~BN: Use bloom filter for heavy REST validators requests.~~ BN: Use Bloom filter for heavy REST validators requests. Jul 31, 2023

tersec reviewed Feb 19, 2024

tersec mentioned this pull request Feb 27, 2024

Bloom filter acceleration for deposit processing #5982

Merged

cheatfate closed this Mar 25, 2024
