Stream server side filtering #8207

Merged
merged 9 commits into from
Jul 10, 2023

Conversation

kjnilsson
Contributor

@kjnilsson kjnilsson commented May 16, 2023

This PR implements an approach to server-side consumer filtering of streams. It lets consumers that are only interested in a subset of the data in a given stream reduce the amount of unwanted data that is sent to the client application.

The filtering is done at the chunk level and is probabilistic. False positives are possible but false negatives are not, so a consumer still needs to do client-side filtering in addition to the broker-side filtering. This can be a client feature or done per application.

The broker will never interpret or filter on individual messages. It can only be done at the chunk level.

To do the filtering, the stream calculates, for each chunk (batch of messages) written, a bloom filter over all the messages that include a filter value. This bloom filter data is written immediately after the header. If it is present and the consumer includes a filter value, the broker calculates a bloom filter hash for the consumer filter, matches it against the chunk filter, and only delivers the chunk to the client if it is a match.
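
The write and read paths described above can be sketched as follows. This is a minimal illustration in Python, not the osiris implementation: the hash function (SHA-256), the number of bit positions per value (two), and the helper names `bit_positions`, `build_chunk_filter` and `chunk_matches` are all assumptions made for the example.

```python
import hashlib

FILTER_SIZE_BYTES = 16  # smallest size mentioned in this PR description


def bit_positions(value: str, size_bytes: int = FILTER_SIZE_BYTES, hashes: int = 2):
    """Map a filter value to `hashes` bit positions (assumed hashing scheme)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    total_bits = size_bytes * 8
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % total_bits
        for i in range(hashes)
    ]


def build_chunk_filter(filter_values, size_bytes: int = FILTER_SIZE_BYTES) -> bytes:
    """Write path: one bloom filter per chunk over its messages' filter values."""
    bits = bytearray(size_bytes)
    for value in filter_values:
        if value is None:  # messages without a filter value contribute nothing
            continue
        for pos in bit_positions(value, size_bytes):
            bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)


def chunk_matches(chunk_filter: bytes, consumer_value: str) -> bool:
    """Read path: match only if every bit for the consumer value is set."""
    return all(
        chunk_filter[pos // 8] & (1 << (pos % 8))
        for pos in bit_positions(consumer_value, len(chunk_filter))
    )
```

A value that was written into the chunk always matches. A value that was not written may still match by chance, which is why client-side filtering remains necessary.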

The per chunk bloom filter size can be varied between 16 and 255 bytes depending on the expected number of overlapping filter values a stream will contain. The osiris default is 16 bytes (the smallest size).

A 16 byte bloom filter can filter around 50 different filter values with a false positive probability of ~30%.
A 255 byte bloom filter can filter around 800 values at the same false positive probability.
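
These figures are consistent with the standard bloom filter approximation p ≈ (1 - e^(-k*n/m))^k for m bits, n distinct values and k hash functions. The PR description does not state k, so the sketch below evaluates the formula for k = 1 and k = 2 as an assumption; `false_positive_rate` is a hypothetical helper name.

```python
import math


def false_positive_rate(size_bytes: int, values: int, hashes: int) -> float:
    """Standard bloom filter approximation: (1 - e^(-k*n/m))^k."""
    bits = size_bytes * 8
    return (1.0 - math.exp(-hashes * values / bits)) ** hashes


for size_bytes, values in [(16, 50), (255, 800)]:
    for hashes in (1, 2):  # k is not given in the PR; both are assumptions
        p = false_positive_rate(size_bytes, values, hashes)
        print(f"{size_bytes:>3} bytes, {values:>3} values, k={hashes}: {p:.0%}")
```

Under either assumption, both sizes come out close to the ~30% quoted above, because the ratio of values to bits is nearly the same (50/128 versus 800/2040).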

It is worth noting that a chunk without any messages that included a filter value will not be delivered to a consumer that includes a filter. Hence a consumer that is interested in any messages that may not contain a filter value must not include a filter value either. This is an important consideration for existing streams after an upgrade: there is no way to retrofit a filter value onto existing data.
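
The delivery rule in the paragraph above can be summarised as a small decision function. This is a sketch of the described behaviour only: `should_deliver` and its parameters are hypothetical names, and the bloom filter comparison is abstracted into a `matches` callback.

```python
from typing import Callable, Optional


def should_deliver(
    chunk_filter: Optional[bytes],
    consumer_filter: Optional[str],
    matches: Callable[[bytes, str], bool],
) -> bool:
    """Decide whether a chunk is sent to a consumer.

    chunk_filter is None when no message in the chunk carried a filter value,
    which is the case for all data written before the upgrade.
    """
    if consumer_filter is None:
        return True  # consumers without a filter receive every chunk
    if chunk_filter is None:
        return False  # consumers with a filter never see unfiltered chunks
    return matches(chunk_filter, consumer_filter)
```

The second branch is the upgrade consideration: a consumer that sets a filter on a stream holding pre-upgrade data will not receive any of that older data.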

  • Osiris API and implementation draft
  • Osiris finalisation
  • Stream bloom filter size configuration
  • Stream protocol consumer API
  • Stream protocol publisher API
  • AMQP Publisher API
    - Perhaps using a custom header x-stream-filter-value
  • (AMQP consumer API)
    - Perhaps using a custom consumer arg x-stream-filter
  • Feature flag

@kjnilsson kjnilsson changed the title osiris branch update Stream server side filtering May 16, 2023
@acogoluegnes acogoluegnes force-pushed the stream-chunk-filtering branch 2 times, most recently from ff4fbc2 to c1e1353 Compare May 25, 2023 08:01
@acogoluegnes acogoluegnes force-pushed the stream-chunk-filtering branch from c1e1353 to 5c6cb07 Compare June 7, 2023 11:40
@kjnilsson kjnilsson added this to the 3.13.0 milestone Jun 7, 2023
@acogoluegnes acogoluegnes force-pushed the stream-chunk-filtering branch 2 times, most recently from f8ecc65 to a1f1371 Compare June 19, 2023 12:28
@acogoluegnes acogoluegnes force-pushed the stream-chunk-filtering branch 2 times, most recently from 7ab35ce to f8dc1ed Compare June 20, 2023 13:56
@acogoluegnes acogoluegnes force-pushed the stream-chunk-filtering branch from c62b0d9 to b89976a Compare July 10, 2023 13:22
@acogoluegnes acogoluegnes marked this pull request as ready for review July 10, 2023 14:00
@acogoluegnes acogoluegnes merged commit a9eb8b5 into main Jul 10, 2023
@acogoluegnes acogoluegnes deleted the stream-chunk-filtering branch July 10, 2023 14:01