Compact: Fixed issue with no-compact-mark.json being ignored #6229
base: main
Conversation
Possible fix for #5603
Thanks @xBazilio. Can you please elaborate on how this fixes that issue?
Sometimes blocks overlap. The reason doesn't matter now. To keep compaction from halting, we mark those blocks with no-compact-mark.json, but this mark is ignored by the compactor and compaction keeps halting. I described how to reproduce the problem in a comment on #5603. There is a method … When the planner is created, it receives a method to exclude marked blocks. However there, in the beginning of …
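For illustration, here is a minimal Go sketch of the idea described above (the types and function names are hypothetical, not the actual Thanos API): blocks that carry a no-compact mark are dropped from the set of metas before planning happens, so an overlapping-but-marked block cannot halt compaction.

```go
// Hypothetical sketch (not the real Thanos API): drop blocks that carry a
// no-compact mark before the metas reach the grouper/planner, so that
// overlapping-but-marked blocks cannot halt compaction.
package main

import "fmt"

// blockMeta stands in for the block metadata the compactor works with.
type blockMeta struct {
	ID      string
	MinTime int64
	MaxTime int64
}

// filterNoCompactMarked returns only the metas whose ID is not in the set of
// blocks marked with no-compact-mark.json.
func filterNoCompactMarked(metas []blockMeta, marked map[string]struct{}) []blockMeta {
	out := make([]blockMeta, 0, len(metas))
	for _, m := range metas {
		if _, ok := marked[m.ID]; ok {
			continue // marked block: leave it out of planning entirely
		}
		out = append(out, m)
	}
	return out
}

func main() {
	metas := []blockMeta{
		{ID: "blockA", MinTime: 0, MaxTime: 100},
		{ID: "blockB", MinTime: 50, MaxTime: 150}, // overlaps blockA, but is marked
	}
	marked := map[string]struct{}{"blockB": {}}
	fmt.Println(filterNoCompactMarked(metas, marked)) // only blockA remains
}
```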
I'll look into it more, because it's not very clear what …
Force-pushed from 99cd0fe to 4f5ab9b
Some comments on what I did: according to #3409, blocks marked with no-compact-mark.json should not be compacted. But in fact they still meddle in the process and even cause compaction to halt. I have added checks to the Grouper and ApplyRetention so that they skip marked blocks. I had to add one more test block to compact_e2e_test.go; without it there weren't enough blocks to compact once the marked blocks were excluded.
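A rough sketch of the Grouper-side skip this comment describes (illustrative names only, not the real pkg/compact code): while building compaction groups, any block found in the no-compact set is simply left out.

```go
// Illustrative sketch of the Grouper-side check: when bucketing blocks into
// compaction groups, blocks present in the no-compact set are skipped instead
// of being grouped (and later halting compaction).
package main

import "fmt"

type meta struct {
	ID         string
	Resolution int64
}

// groupForCompaction buckets blocks by resolution, leaving marked blocks out.
func groupForCompaction(metas []meta, noCompact map[string]struct{}) map[int64][]meta {
	groups := map[int64][]meta{}
	for _, m := range metas {
		if _, marked := noCompact[m.ID]; marked {
			continue // the Grouper-side skip this PR introduces
		}
		groups[m.Resolution] = append(groups[m.Resolution], m)
	}
	return groups
}

func main() {
	metas := []meta{
		{ID: "a", Resolution: 0},
		{ID: "b", Resolution: 0}, // marked, must not end up in any group
		{ID: "c", Resolution: 300000},
	}
	fmt.Println(groupForCompaction(metas, map[string]struct{}{"b": {}}))
}
```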
Force-pushed from 70fa331 to 7a91e9a
@yeya24 hello! Would you be so kind as to look into my PR again, please, or tag anyone who can? I struggled with the e2e tests and fixed the one directly related to the compactor, but there are other failing tests.
Force-pushed from bd019a2 to b771377
All tests passed 🥹
cmd/thanos/compact.go (Outdated)
sy.Metas(),
retentionByResolution,
compactMetrics.blocksMarked.WithLabelValues(metadata.DeletionMarkFilename, ""),
noCompactMarkerFilter); err != nil {
I understand the grouper change, but why should we ignore no-compact blocks here? We still want to delete old blocks even if they have that mark, right? Otherwise, the block storage usage will grow inexorably.
My point is: overlap is something out of the ordinary. One block can be small enough that it will never be downsampled. Thus, if we delete it via the retention policy, we would get a gap in the metrics.
I suppose we won't be marking blocks in huge numbers, so storage usage shouldn't grow uncontrollably.
But no-compact-mark.json can appear even without the user knowing: https://github.com/thanos-io/thanos/blob/main/pkg/compact/planner.go#L282-L287. I have some blocks with no-compact-mark.json too because of this. So, with your change they would never get deleted, because they would be filtered out by the retention apply function, right?
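For context, here is a simplified sketch of the kind of automatic marking the linked planner code performs (the threshold, reason string, and helper names below are illustrative assumptions, not the real implementation): when a block's index grows past a limit, a no-compact-mark.json is written for it so later compaction runs leave the block alone.

```go
// Hypothetical sketch of automatic no-compact marking; see the linked
// planner.go for the real condition and mark format.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

type noCompactMark struct {
	ID     string `json:"id"`
	Reason string `json:"reason"`
}

// markNoCompactIfTooLarge writes <dir>/<blockID>/no-compact-mark.json when the
// index size crosses the limit, and reports whether the block was marked.
func markNoCompactIfTooLarge(dir, blockID string, indexSizeBytes, limitBytes int64) (bool, error) {
	if indexSizeBytes <= limitBytes {
		return false, nil
	}
	mark := noCompactMark{ID: blockID, Reason: "index-size-exceeding"}
	data, err := json.Marshal(mark)
	if err != nil {
		return false, err
	}
	path := filepath.Join(dir, blockID, "no-compact-mark.json")
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return false, err
	}
	return true, os.WriteFile(path, data, 0o644)
}

func main() {
	marked, err := markNoCompactIfTooLarge(os.TempDir(), "example-block", 70<<30, 64<<30)
	fmt.Println(marked, err) // true <nil>
}
```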
Right. Just today I stumbled on this case.
It's tricky. Maybe there should be a flag to control whether to prevent the retention policy from deleting marked blocks.
Maybe the retention-policy algorithm should check whether a block with a lower resolution exists before deleting a block. We prefer to store the 1h resolution forever, while keeping raw (0s) data for 12 days and 5m data for 60 days. Also, we had compaction halted for several months before we noticed. If retention had been allowed to remove blocks, it would have deleted all the raw data that wasn't compacted because of the halting. For now we solved it by turning off the retention policy.
What do you think? Can I add a flag, or should I revert the changes to retention?
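A purely illustrative sketch of the improvement suggested here (not Thanos code; all names are made up): retention only deletes a block once a coarser-resolution block covers the same time range, so deleting raw data cannot leave a gap.

```go
// Illustrative only: before retention deletes a block, require that a block of
// lower resolution (i.e. more downsampled) covers the same time range.
package main

import "fmt"

type block struct {
	ID               string
	MinTime, MaxTime int64 // milliseconds
	Resolution       int64 // 0 = raw, 300000 = 5m, 3600000 = 1h
}

// coveredByLowerResolution reports whether b's time range is fully inside a
// block with a coarser resolution.
func coveredByLowerResolution(b block, all []block) bool {
	for _, other := range all {
		if other.Resolution > b.Resolution &&
			other.MinTime <= b.MinTime && other.MaxTime >= b.MaxTime {
			return true
		}
	}
	return false
}

// applyRetention returns the blocks that are safe to delete: past their
// retention cutoff AND still represented at a lower resolution.
func applyRetention(blocks []block, deleteBefore int64) []block {
	var toDelete []block
	for _, b := range blocks {
		if b.MaxTime >= deleteBefore {
			continue // still within retention
		}
		if !coveredByLowerResolution(b, blocks) {
			continue // deleting would create a gap, keep it
		}
		toDelete = append(toDelete, b)
	}
	return toDelete
}

func main() {
	blocks := []block{
		{ID: "raw", MinTime: 0, MaxTime: 1000, Resolution: 0},
		{ID: "5m", MinTime: 0, MaxTime: 2000, Resolution: 300000},
	}
	fmt.Println(applyRetention(blocks, 1500)) // only the raw block qualifies
}
```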
Applying retention could definitely be improved as you've suggested; however, retention only gets applied after compaction/downsampling, so it shouldn't ever delete data that hasn't been compacted.
All in all, could you please remove the changes to the retention part? I believe this kind of change should be discussed and then done separately. Then we could merge your PR.
Done. Tests passed too.
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Force-pushed from b771377 to dd203bd
Why is it not enough to exclude marked blocks from the planner? 🤔
https://github.com/thanos-io/thanos/blob/main/pkg/compact/compact.go#L983 There is a check before the planning is done. I guess the compact group should already contain non-overlapping blocks.
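For reference, a simplified sketch of such a pre-planning overlap check (not the actual compact.go implementation): sort the group's metas by start time and fail if any block starts before the previous one ends.

```go
// Simplified sketch of a pre-planning overlap check: sort metas by MinTime and
// report an error if any block starts before the previous one ends.
package main

import (
	"fmt"
	"sort"
)

type meta struct {
	ID               string
	MinTime, MaxTime int64
}

func checkNoOverlap(metas []meta) error {
	sorted := append([]meta(nil), metas...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].MinTime < sorted[j].MinTime })
	for i := 1; i < len(sorted); i++ {
		prev, cur := sorted[i-1], sorted[i]
		if cur.MinTime < prev.MaxTime {
			return fmt.Errorf("blocks %s and %s overlap: [%d, %d) vs [%d, %d)",
				prev.ID, cur.ID, prev.MinTime, prev.MaxTime, cur.MinTime, cur.MaxTime)
		}
	}
	return nil
}

func main() {
	err := checkNoOverlap([]meta{
		{ID: "a", MinTime: 0, MaxTime: 100},
		{ID: "b", MinTime: 50, MaxTime: 150}, // overlaps a, so compaction would halt here
	})
	fmt.Println(err)
}
```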
I believe the correct fix is somewhere in the planner or in the filters? This sounds like a workaround. Groups should just give us the groups, without any filtering; it's the planner's responsibility to do the right thing. Digging a bit more into this, one potential fix is here: #6319
I will try your #5603 (comment) instructions here to see what's happening, because I believe I'm hitting this too.
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Changes
Verification