Stream server side filtering #8207
Merged
acogoluegnes force-pushed the stream-chunk-filtering branch 2 times, most recently from ff4fbc2 to c1e1353 (May 25, 2023 08:01)
acogoluegnes force-pushed the stream-chunk-filtering branch from c1e1353 to 5c6cb07 (June 7, 2023 11:40)
acogoluegnes force-pushed the stream-chunk-filtering branch 2 times, most recently from f8ecc65 to a1f1371 (June 19, 2023 12:28)
acogoluegnes force-pushed the stream-chunk-filtering branch 2 times, most recently from 7ab35ce to f8dc1ed (June 20, 2023 13:56)
To set filter values in the stream protocol.
as a queue arg and policy
The feature should not be used during an upgrade, because it must be enabled on all nodes. The test will always fail.
acogoluegnes force-pushed the stream-chunk-filtering branch from c62b0d9 to b89976a (July 10, 2023 13:22)
michaelklishin added a commit that referenced this pull request (Jul 22, 2023)
This PR implements server-side consumer filtering for streams. It lets consumers that are interested in only a subset of the data in a given stream reduce the amount of unwanted data sent to the client application.
The filtering is done at the chunk level and is probabilistic: false positives are possible, but false negatives are not. This means a consumer still needs to filter on the client side in addition to the broker-side filtering. This can be a client library feature or be done by each application.
The broker never interprets or filters individual messages; filtering happens only at the chunk level.
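Because the broker delivers whole chunks and a Bloom filter match can be a false positive, the consumer has to re-check each message it receives. The following is a minimal Python sketch of that client-side pass, assuming messages are modelled by a hypothetical `Message` type carrying an optional filter value; `Message` and `client_side_filter` are illustrative names, not part of any RabbitMQ client API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Message:
    # Hypothetical message model: the filter value the publisher attached,
    # or None if it attached nothing.
    filter_value: Optional[str]
    body: bytes


def client_side_filter(chunks: Iterable[List[Message]], wanted: str) -> Iterator[Message]:
    """Yield only the messages whose filter value matches exactly.

    The broker filters per chunk, so a delivered chunk can still contain
    messages for other filter values (false positives) or none at all.
    """
    for chunk in chunks:
        for msg in chunk:
            if msg.filter_value == wanted:
                yield msg


# A chunk delivered because its Bloom filter matched "emea", although it
# also carries an "apac" message and an unfiltered one.
delivered = [[Message("emea", b"order-1"), Message("apac", b"order-2"), Message(None, b"order-3")]]
print([m.body for m in client_side_filter(delivered, "emea")])  # [b'order-1']
```

Whether this check lives in the client library or in application code is left open by the description above; the sketch works the same either way.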
To do the filtering, the stream calculates a Bloom filter for each chunk (batch of messages) written, over all the messages in that chunk that include a filter value. The Bloom filter data is written immediately after the chunk header. If a chunk has a Bloom filter and the consumer specifies a filter value, the broker calculates the Bloom filter hash of the consumer's filter value, matches it against the chunk's filter, and delivers the chunk to the client only if they match.
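The build-then-match flow described above can be sketched in Python as follows. This is a minimal illustration, assuming two hash functions derived from a SHA-256 digest (`hashlib`). The actual hash functions, bit layout, and on-disk format used by osiris are not described here, so `ChunkBloomFilter` and `build_chunk_filter` are hypothetical.

```python
import hashlib
from typing import Iterable, List, Optional


class ChunkBloomFilter:
    """Hypothetical per-chunk Bloom filter of `size_bytes` bytes with k hashes."""

    def __init__(self, size_bytes: int = 16, num_hashes: int = 2) -> None:
        if not 16 <= size_bytes <= 255:
            raise ValueError("size must be between 16 and 255 bytes")
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bytes)

    def _positions(self, value: str) -> List[int]:
        # Assumption: take k bit positions from one SHA-256 digest, 4 bytes
        # each. This is NOT necessarily the hashing osiris uses.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return [
            int.from_bytes(digest[4 * i : 4 * i + 4], "big") % self.num_bits
            for i in range(self.num_hashes)
        ]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def build_chunk_filter(filter_values: Iterable[Optional[str]], size_bytes: int = 16) -> Optional[ChunkBloomFilter]:
    """Build a chunk's filter from its messages' filter values.

    Returns None when no message in the chunk carries a filter value.
    """
    present = [v for v in filter_values if v is not None]
    if not present:
        return None
    bloom = ChunkBloomFilter(size_bytes)
    for v in present:
        bloom.add(v)
    return bloom


chunk = build_chunk_filter(["emea", "apac", None, "emea"])
print(chunk.might_contain("emea"))  # True: no false negatives
```

Note that `might_contain` can return `True` for a value that was never added; that is the false positive case, which the client-side check has to absorb.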
The per-chunk Bloom filter size can be set anywhere between 16 and 255 bytes, depending on the expected number of overlapping filter values a stream will contain. The osiris default is 16 bytes (the smallest size).
A 16-byte Bloom filter can filter around 50 different filter values with a false positive probability of ~30%.
A 255-byte Bloom filter can filter around 800 values with the same false positive probability.
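The sizing figures above can be sanity-checked with the standard Bloom filter approximation p ≈ (1 − e^(−kn/m))^k, for a filter of m bits holding n distinct values with k hash functions. The sketch assumes k = 2, since the number of hash functions osiris uses is not stated here. With that choice it reproduces roughly 30% for both configurations. `false_positive_rate` and `smallest_size_for` are hypothetical helpers.

```python
import math
from typing import Optional


def false_positive_rate(size_bytes: int, num_values: int, num_hashes: int = 2) -> float:
    """Approximate Bloom filter false positive probability (1 - e^(-kn/m))^k."""
    m = size_bytes * 8  # filter size in bits
    return (1.0 - math.exp(-num_hashes * num_values / m)) ** num_hashes


def smallest_size_for(num_values: int, target: float, num_hashes: int = 2) -> Optional[int]:
    """Smallest size in the 16..255 byte range meeting `target`, else None."""
    for size in range(16, 256):
        if false_positive_rate(size, num_values, num_hashes) <= target:
            return size
    return None


print(f"{false_positive_rate(16, 50):.2f}")    # 0.29
print(f"{false_positive_rate(255, 800):.2f}")  # 0.30
```

Under the same k = 2 assumption, `smallest_size_for(800, 0.3)` lands just below the 255-byte upper bound, and larger value counts at that target fall outside the supported range and return `None`.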
Note that a chunk in which no message includes a filter value will not be delivered to a consumer that specifies a filter. Hence a consumer that is interested in messages that may lack a filter value must not specify a filter either. This is an important consideration for existing streams after an upgrade: there is no way to retrofit a filter value onto existing data.
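The delivery rule just described, including the case of chunks written without any filter value, fits in a small decision function. This is a minimal Python sketch, assuming the chunk and consumer filters are represented as hypothetical sets of set-bit positions, with `None` meaning "no filter". `should_deliver` is an illustrative name, not broker code.

```python
from typing import FrozenSet, Optional

BitSet = FrozenSet[int]


def should_deliver(chunk_filter: Optional[BitSet], consumer_filter: Optional[BitSet]) -> bool:
    """Decide whether a chunk is sent to a consumer (hypothetical model).

    chunk_filter: bits set in the chunk's Bloom filter, or None if no
        message in the chunk carried a filter value (e.g. pre-upgrade data).
    consumer_filter: bits for the consumer's filter value, or None if the
        consumer did not specify a filter.
    """
    if consumer_filter is None:
        # Unfiltered consumers receive every chunk.
        return True
    if chunk_filter is None:
        # Filtered consumers skip chunks without filter data, so they never
        # see messages that lack a filter value.
        return False
    # Bloom match: every bit of the consumer's value must be set in the chunk.
    return consumer_filter <= chunk_filter
```

For example, `should_deliver(None, frozenset({3, 17}))` is `False`: a filtering consumer skips a pre-upgrade chunk entirely, which is why such a consumer must omit its filter to read that data.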
- Perhaps using a custom header `x-stream-filter-value`
- Perhaps using a custom consumer argument `x-stream-filter`
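If these proposals are adopted, an AMQP 0.9.1 publisher would attach its filter value as a message header, and a consumer would pass its filter as a consumer argument. The sketch below only builds the two dictionaries. The helper names, the stream name `my-stream`, and the value `"emea"` are hypothetical. The commented pika calls (`pika.BasicProperties(headers=...)`, `basic_publish(..., properties=...)`, `basic_consume(..., arguments=...)`) show where the dictionaries would be passed. Other settings a stream consumer needs are outside the scope of this sketch.

```python
from typing import Dict


def publish_headers(filter_value: str) -> Dict[str, str]:
    # Proposed custom header carrying the message's filter value.
    return {"x-stream-filter-value": filter_value}


def consumer_arguments(filter_value: str) -> Dict[str, str]:
    # Proposed custom consumer argument carrying the consumer's filter.
    return {"x-stream-filter": filter_value}


# Where these would go with pika (not executed here; on_message is a
# hypothetical callback):
#   channel.basic_publish(exchange="", routing_key="my-stream", body=b"...",
#                         properties=pika.BasicProperties(headers=publish_headers("emea")))
#   channel.basic_consume(queue="my-stream", on_message_callback=on_message,
#                         arguments=consumer_arguments("emea"))
print(publish_headers("emea"), consumer_arguments("emea"))
```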