Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compact: Fixed issue with no-compact-mark.json being ignored #6229

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

xBazilio
Copy link
Contributor

@xBazilio xBazilio commented Mar 20, 2023

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@xBazilio
Copy link
Contributor Author

Possible fix for #5603

@yeya24
Copy link
Contributor

yeya24 commented Mar 20, 2023

Thanks @xBazilio. Can you please elaborate more, how this fixed that issue?

@xBazilio
Copy link
Contributor Author

xBazilio commented Mar 20, 2023

Sometimes blocks are overlapping. The reason doesn't matter now. For compaction not to halt we mark those block with no-compact-mark.json

But this mark is being ignored by compactor and compactor keeps halting.

I described how to reproduce the problem in comment to #5603

There is a method areBlocksOverlapping, which has 2 arguments. It is used later in compact with result of planner.Plan as second argument.

When planner is created it reseived a method to exclude marked blocks.

However there, in the beginning of compact method, areBlocksOverlapping is used without argument, so it would always report about overlapping blocks and halt compaction, regardless of blocks being marked for no compaction.

@xBazilio
Copy link
Contributor Author

I'll look into it more, because it's not very clear what areBlocksOverlapping is actualy doing with its 2 arguments.

@xBazilio
Copy link
Contributor Author

Some comments to what I did

According to #3409, blocks marked with no-compaction-mark.json should not be compacted. But in fact they still meddle in the proccess and even cause compaction to halt.

I have added checks to Groupper and ApplyRetention so that they skip marked blocks.
This allows compactor to continue to work as intended, marked blocks won't be accidentally deleted by ApplyRetention.

I had to add one more test block to compact_e2e_test.go, without it there wasn't enough blocks to compact with excluded marked blocks.

@xBazilio xBazilio force-pushed the fix_ignored_no_compact_mark branch from 70fa331 to 7a91e9a Compare March 27, 2023 07:48
@xBazilio
Copy link
Contributor Author

@yeya24 hello! Would you be so kind as to look into my PR again, please, or tag anyone who can?

I struggled with e2e tests, fixed one that directly releated to compactor, but there are other failing tests.

@xBazilio xBazilio force-pushed the fix_ignored_no_compact_mark branch from bd019a2 to b771377 Compare April 10, 2023 11:44
@xBazilio
Copy link
Contributor Author

All tests passed 🥹

sy.Metas(),
retentionByResolution,
compactMetrics.blocksMarked.WithLabelValues(metadata.DeletionMarkFilename, ""),
noCompactMarkerFilter); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the grouper change but why we should ignore no compact blocks in here? We still want to delete old blocks even if they have that mark, right? Otherwise, the block storage usage will grow inexorably.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is: overlap is somthing out of ordinary. One block can be small enough that it will never be downsampled. Thus, if we delete it with retention policy, we would get a gap in metrics.
I suppose that we won't be marking blocks in huge numbers, so storage usage shouldn't grow uncontrollably.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no-compact-mark.json can appear even without the user knowing: https://github.com/thanos-io/thanos/blob/main/pkg/compact/planner.go#L282-L287. I have some blocks with no-compact-mark.json too due to this. So, with your change they would never get deleted because they would be filtered out by the retention apply function, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Just today I stumbled on this case.
It's tricky. Maybe there should be a flag to control wether to prevent retention policy from deleting marked blocks.

Maybe retention policy alhorithm should check if there is a block with lower resolution before deleting block. We prefer to store 1h resolutoin forever, while having 0s for 12 days and 5m for 60 days. Also we had compaction halted for several months, before we noticed. If we would allow retention to remove blocks it would just delete all raw data that wasn't compacted due to halting. We solved it for now by turning off retention policy.

What do you think? Can I make a flag, or should I revert changes to retention?

Copy link
Member

@GiedriusS GiedriusS Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applying retention could definitely be improved as you've suggested however retention only gets applied after compaction/downsampling so it shouldn't ever delete data that hasn't been compacted.

All in all, could you please remove the changes to the retention part? I believe this kind of change should be discussed and then done separately. Then we could merge your PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Tests passed too.

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
@xBazilio xBazilio force-pushed the fix_ignored_no_compact_mark branch from b771377 to dd203bd Compare April 13, 2023 08:42
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not enough to exclude marked blocks from the planner? 🤔

@xBazilio
Copy link
Contributor Author

Why is it not enough to exclude marked blocks from the planner? thinking

https://github.com/thanos-io/thanos/blob/main/pkg/compact/compact.go#L983

There is a check before the planning is done. I gues compact group should already contain non overlapping blocks.

@GiedriusS
Copy link
Member

I believe that the correct fix is somewhere in the planner or in the filters? This sounds like a workaround. Groups should just give us the groups, without any filtering. It's the planner's responsibility to do the right thing. Digging a bit into this more and one potential fix is here: #6319

@GiedriusS
Copy link
Member

I will try your #5603 (comment) instructions here to see what's happening because I believe I'm hitting this too.

@stale
Copy link

stale bot commented Jun 18, 2023

Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next week, this issue will be closed (we can always reopen a PR if you get back to this!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants