re_datastore: store component rows larger than a threshold behind an indirection #487

Closed
2 tasks
teh-cmc opened this issue Dec 7, 2022 · 0 comments · Fixed by #1727
Labels
⛃ re_datastore affects the datastore itself

Comments

@teh-cmc
Member

teh-cmc commented Dec 7, 2022

Let's say you have an arbitrarily large component (e.g. an Image, whose contents we have no control over).
As the contents of the Image get larger and larger, the number of rows we can fit in a single bucket gets smaller and smaller.
In the worst case, you get all of the overhead of bucketing, for none of the benefits.

Not only that, but it might also be the case that the contents of the Image are actually identical across rows, e.g. every other frame.
So not only are we paying bucketing overhead for nothing, we're wasting space too.

In both situations, hashing the contents of the image once it grows beyond a certain threshold, and storing that hash in the row instead of the actual contents, fixes the issue.
Of course, we need to store an extra map to resolve the hashes into actual data at query time.
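
A rough sketch of the idea, not the actual re_datastore implementation. Every name here (`Store`, `Cell`, `INDIRECTION_THRESHOLD`, `hash_bytes`) is hypothetical, the 1 KiB threshold is arbitrary, and the std `DefaultHasher` only stands in for whatever hash a real store would pick:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical threshold (in bytes) above which a payload is stored out of line.
const INDIRECTION_THRESHOLD: usize = 1024;

/// What a row holds: either the payload itself, or a hash that points to it.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Inline(Vec<u8>),
    Hashed(u64),
}

/// Toy store: rows live in `rows`, large payloads live once in `blobs`.
#[derive(Default)]
struct Store {
    rows: Vec<Cell>,
    /// The extra map that resolves hashes into actual data at query time.
    blobs: HashMap<u64, Vec<u8>>,
}

fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

impl Store {
    /// Inserts a payload and returns its row index.
    fn insert(&mut self, payload: Vec<u8>) -> usize {
        let cell = if payload.len() > INDIRECTION_THRESHOLD {
            let hash = hash_bytes(&payload);
            // Identical payloads hash to the same key, so they are stored once.
            // Assumption: collisions are ignored here; a real store would need
            // a wider hash or an explicit equality check.
            self.blobs.entry(hash).or_insert(payload);
            Cell::Hashed(hash)
        } else {
            Cell::Inline(payload)
        };
        self.rows.push(cell);
        self.rows.len() - 1
    }

    /// Resolves a row back to its bytes at query time.
    fn get(&self, row: usize) -> Option<&[u8]> {
        match self.rows.get(row)? {
            Cell::Inline(bytes) => Some(bytes.as_slice()),
            Cell::Hashed(hash) => self.blobs.get(hash).map(Vec::as_slice),
        }
    }
}
```

With this shape, a row never grows past the size of a hash once the payload crosses the threshold, and inserting the same large image twice adds two small rows but only one entry in `blobs`.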

TODO:

  • implement
  • test