Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LogDb: dont split on index bucket size #1558

Merged
merged 1 commit into from
Mar 10, 2023
Merged

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Mar 10, 2023

This disables index size limits when instantiating datastores in LogDb, which can result in a lot of bucket splits for literally no benefits.

The complete rationale is explained in #1545 (comment).
Here's the most important excerpt from it:

Now, when it comes to index buckets, row limits are what actually matters as they put an upper bound on the cost of sorting the bucket.

Size limits don't matter at all OTOH, since we don't even GC index buckets anymore at the moment (because of the MsgId mismatch problem, which is the exact same issue described in #1535 and is generally the root of all our issues of that nature).

The fix here should be to remove the index data size limit.

Closes #1545

@teh-cmc teh-cmc added ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Mar 10, 2023
@teh-cmc
Copy link
Member Author

teh-cmc commented Mar 10, 2023

image

@teh-cmc teh-cmc merged commit fbcf009 into main Mar 10, 2023
@teh-cmc teh-cmc deleted the cmc/fix_too_many_buckets branch March 10, 2023 14:11
// We do not garbage collect index buckets at the moment, and so the size of
// individual index buckets is irrelevant, only their total number of rows
// matter.
// See <pr-link> for details.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i thought I pushed the commit with the link, but turns out i didnt press enter in the terminal 🤦‍♂️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll slip it in one the upcoming GC PRs, it's fiiiiiiiine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

latest_at very slow (O(N)?)
2 participants