
reduce_runs may execute too long time #5030

Closed
MichaelScofield opened this issue Nov 20, 2024 · 6 comments
Labels
C-bug Category Bugs

Comments

@MichaelScofield
Collaborator

MichaelScofield commented Nov 20, 2024

What type of bug is this?

Performance issue

What subsystems are affected?

Datanode

Minimal reproduce step

The method reduce_runs (in

pub(crate) fn reduce_runs<T: Item>(runs: Vec<SortedRun<T>>, target: usize) -> Vec<Vec<T>> {

) iterates over the combinations of runs: Vec<SortedRun<T>>, taking k = runs.len() + 1 - target runs at a time. The number of such combinations is C(n, k), which can be enormous; for example, C(10, 4) = 330791175. Iterating over that many combinations, the code may run for far too long.
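For a feel of how fast C(n, k) grows, here is a quick standalone computation. This is only an illustration; binom is a hypothetical helper, not part of the GreptimeDB code:

```rust
/// Binomial coefficient C(n, k) via the multiplicative formula.
/// Each intermediate division is exact (acc is always C(n, i) times
/// an integer factor), so no rounding error occurs.
fn binom(n: u128, k: u128) -> u128 {
    let k = k.min(n - k); // C(n, k) == C(n, n - k)
    (0..k).fold(1u128, |acc, i| acc * (n - i) / (i + 1))
}

fn main() {
    // Sanity check on a tiny input.
    assert_eq!(binom(5, 2), 10);
    // The case reported in this issue: roughly 1.65e15 combinations.
    assert_eq!(binom(2883, 5), 1_653_997_252_262_256);
    println!("C(2883, 5) = {}", binom(2883, 5));
}
```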

I've observed this situation at one of our customers. They have too many SST files whose time ranges overlap, so the runs passed to reduce_runs are large. See this log (not in our code; I temporarily added it for debugging) emitted right before executing reduce_runs:

mito2::compaction::twcs region: 15964393439232(3717, 0), found_runs: 2883, max_runs: 4, max_files: 4

So the number of combinations over his sorted runs is C(2883, 5) = 1653997252262256, which is far too many to iterate over. The code appears to hang forever.

What did you expect to see?

Code execution proceeds.

What did you see instead?

Code execution hangs.

What operating system did you use?

Ubuntu

What version of GreptimeDB did you use?

main

Relevant log output and stack trace

No response

@MichaelScofield MichaelScofield added the C-bug Category Bugs label Nov 20, 2024
@killme2008
Contributor

Where is reduce_runs? I think it would be better to add a link to the code in the issue.

@MichaelScofield
Collaborator Author

MichaelScofield commented Nov 20, 2024

Right now I have a quick-and-dirty workaround that takes only the first 1024 combinations, like this:

...
runs.into_iter()
    .combinations(k)
    .take(1024) // here
    .map(|runs_to_merge| merge_all_runs(runs_to_merge))
...

It may not produce the minimal-penalty result, but code execution can proceed. I can polish it a bit, but WDYT about the idea? @evenyag @v0y4g3r
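The capping idea can be sketched standalone, without the itertools dependency, by enumerating index combinations lexicographically and stopping at the cap. Names here are illustrative; the actual patch just adds .take(1024) to the existing itertools chain:

```rust
/// Enumerate k-element index combinations of 0..n in lexicographic
/// order, but stop after `cap` results so pathological inputs (e.g.
/// thousands of runs) cannot hang the caller.
fn capped_combinations(n: usize, k: usize, cap: usize) -> Vec<Vec<usize>> {
    let mut out = Vec::new();
    if k > n || cap == 0 {
        return out;
    }
    let mut idx: Vec<usize> = (0..k).collect(); // first combination
    loop {
        out.push(idx.clone());
        if out.len() >= cap {
            return out; // cap reached: give up exploring the full space
        }
        // Find the rightmost position that can still be incremented.
        let mut i = k;
        while i > 0 && idx[i - 1] == i - 1 + n - k {
            i -= 1;
        }
        if i == 0 {
            return out; // exhausted all C(n, k) combinations
        }
        // Increment it and reset every position to its right.
        idx[i - 1] += 1;
        for j in i..k {
            idx[j] = idx[j - 1] + 1;
        }
    }
}

fn main() {
    // Small space: all C(5, 2) = 10 combinations are produced.
    assert_eq!(capped_combinations(5, 2, 1024).len(), 10);
    // Large space: C(30, 10) > 30 million, but we stop at the cap.
    assert_eq!(capped_combinations(30, 10, 1024).len(), 1024);
}
```

Because the enumeration is lexicographic, the first 1024 combinations are biased toward merging the earliest runs, which is why the result may not have minimal penalty.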

@v0y4g3r
Contributor

v0y4g3r commented Nov 21, 2024

Normally the number of combinations won't be that large; C(10, 4) is 210. The problem is why there are 2883 runs. Can you provide the time ranges of all the files?

@MichaelScofield
Collaborator Author

@v0y4g3r The SSTs were ingested directly, in the early days before this reduce_runs feature was added to our code.

@v0y4g3r
Contributor

v0y4g3r commented Nov 21, 2024

> @v0y4g3r The ssts were ingested directly, from early days before this reduce_runs feature was added to our codes.

Looks like ingesting a large number of files with overlapping time ranges was not the original use case of any existing compaction strategy; they all expect overlapping files to gradually get merged into one or several files as they age. We should come up with a simpler strategy that always merges them into one file if their total size does not exceed some threshold.
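The proposed fallback could look roughly like this. This is a hypothetical sketch; should_merge_all and the threshold parameter are illustrative, not existing GreptimeDB code:

```rust
/// Hypothetical fallback: when run reduction would be too expensive,
/// merge every overlapping file into one output, provided the total
/// input size stays under a configured threshold.
fn should_merge_all(file_sizes: &[u64], max_output_bytes: u64) -> bool {
    let total: u64 = file_sizes.iter().sum();
    total <= max_output_bytes
}

fn main() {
    let sizes = [64 << 20, 128 << 20, 32 << 20]; // 64 + 128 + 32 MiB
    assert!(should_merge_all(&sizes, 512 << 20)); // fits under 512 MiB
    assert!(!should_merge_all(&sizes, 128 << 20)); // exceeds 128 MiB
}
```

This sidesteps the combinatorial search entirely for the pathological many-small-overlapping-files case, at the cost of a single large rewrite.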

@MichaelScofield
Collaborator Author

Only early GreptimeDB deployments that didn't run compaction after ingesting SSTs are affected, so it's not a big issue.
