Compact: Apply retention before compaction #5766

Closed
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#5641](https://github.com/thanos-io/thanos/pull/5641) Query: Inject unshardable le label in query analyzer.
- [#5685](https://github.com/thanos-io/thanos/pull/5685) Receive: Make active/head series limiting configuration per tenant by adding it to new limiting config.
- [#5411](https://github.com/thanos-io/thanos/pull/5411) Tracing: Change Jaeger exporter from OpenTracing to OpenTelemetry. *Options `RPC Metrics`, `Gen128Bit` and `Disabled` are now deprecated and won't have any effect when set :warning:.*
- [#5766](https://github.com/thanos-io/thanos/pull/5766) Compact: Apply retention before compaction.

### Removed

18 changes: 9 additions & 9 deletions cmd/thanos/compact.go
@@ -420,6 +420,15 @@ func runCompact(
}

compactMainFn := func() error {
// TODO(bwplotka): Find a way to avoid syncing if no op was done.
if err := sy.SyncMetas(ctx); err != nil {
return errors.Wrap(err, "sync before retention")
}

if err := compact.ApplyRetentionPolicyByResolution(ctx, logger, bkt, sy.Metas(), retentionByResolution, compactMetrics.blocksMarked.WithLabelValues(metadata.DeletionMarkFilename, "")); err != nil {
yeya24 (Contributor) commented:
One concern I have is that some raw blocks might be beyond the raw retention period, yet they could still be downsampled into 5m blocks that remain within the 5m retention period.
Changing the order forces users to configure a longer raw retention just to get those blocks downsampled.

PR author (Contributor) replied:
@yeya24 Yeah, this makes sense. So basically for this to be safe, ApplyRetentionPolicyByResolution would need to also check if we have downsampled a block already. And only apply retention if it's been downsampled.

However, this defeats the point (with the current implementation), as downsampling only occurs after compaction. So we are where we started again, unless we also perform downsampling before compaction, and have a check in the downsampler that ensures we are at the right compaction level for downsampling.

yeya24 (Contributor) replied — Oct 6, 2022:
Moving retention to the beginning could also mistakenly delete blocks that are expected to be compacted. Retention is currently calculated from a block's max time, so if retention runs after compaction we may keep the merged block, whereas running it before compaction would delete the small source blocks.
Correctness-wise, I don't think this is safe to do.

PR author (Contributor) replied:
I wonder if we could handle this case by simulating all compaction + downsampling operations, and if the resulting block would be outside of retention, mark it for deletion immediately.

The specific case that this would help with: Lowering retention to purge the backlog. It would come at the cost of additional complexity though, so I'm not sure it's worth it.

yeya24 (Contributor) replied:
If it is for purging the backlog, we already have the bucket tool retention command. Does that satisfy your use case?
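For reference, the tool mentioned here is the `thanos tools bucket retention` subcommand. A sketch of an invocation follows; the flag names are assumed to mirror the compactor's per-resolution retention flags, so verify them against `thanos tools bucket retention --help` for your Thanos version.

```shell
# Hypothetical invocation: applies per-resolution retention directly against
# the bucket, outside the compactor's normal sync/compact/downsample loop.
thanos tools bucket retention \
  --objstore.config-file=bucket.yml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=1y
```

As the thread below notes, this carries the same downsampling caveat as the patch itself.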

PR author (Contributor) replied:
Ahh, TIL about bucket tool retention. Though I suspect it suffers from the same issue we discussed here: it does not take into account downsampling, so we run the risk of purging blocks that have not yet been downsampled.

In other words, using that tool is unsafe for the same reason that this proposed patch is unsafe.

yeya24 (Contributor) replied — Oct 6, 2022:
Right, it is the same as this patch, so you use it at your own risk.
If you agree it is the same, then do we still need this patch? If you have other things to improve, we can discuss them on issue #1708.

PR author (Contributor) replied:
Agreed, I think we can close this. 👍

return errors.Wrap(err, "retention failed")
}

if err := compactor.Compact(ctx); err != nil {
return errors.Wrap(err, "compaction")
}
@@ -454,15 +463,6 @@ func runCompact(
level.Info(logger).Log("msg", "downsampling was explicitly disabled")
}

// TODO(bwplotka): Find a way to avoid syncing if no op was done.
if err := sy.SyncMetas(ctx); err != nil {
return errors.Wrap(err, "sync before retention")
}

if err := compact.ApplyRetentionPolicyByResolution(ctx, logger, bkt, sy.Metas(), retentionByResolution, compactMetrics.blocksMarked.WithLabelValues(metadata.DeletionMarkFilename, "")); err != nil {
return errors.Wrap(err, "retention failed")
}

return cleanPartialMarked()
}
