feat: Update retention and concurrency for Thanos #461
There is some disagreement between the PR description and the commit message (and do we want to include any of those doc links in the commit message?). There is also a typo in the commit message ("information").
5324b7a to 8934026
@larsks, thanks for the feedback. I was too quick and dirty on this, and it was too clear in my head, not so much for the uninvolved reader, though :-)
Looks good!
This PR addresses the retention rate issues discussed in nerc-project/operations#618 (comment) (having more than 30d raw etc.).

The changes update the retention and concurrency settings for the Thanos Compactor to enhance observability and metrics performance. We will stay with the defaults where possible, adding remarks with the defaults to better understand the next changes or possible errors.

The changes focus on the needs of class, cost, and invoice analysis, as well as future predictions:

- Updated `retentionResolutionRaw` from 30d to 90d (quarterly high detail for deep analysis, especially GPUs)
- Updated `retentionResolution5m` from 90d to 360d (for cost, usage, and invoices; 15 minutes could be enough, but is not a default option)
- Set `retentionResolution1h` to 0d (retain forever, following the default and recommendation)
- Added `blockDuration`, `cleanupInterval`, `deleteDelay`, `retentionInLocal`, `consistencyDelay`, `compactConcurrency`, and `downsampleConcurrency` settings (even where they stay at the default, this makes the options visible in case of future changes)

These changes aim to optimize data retention and resolution for the needed use cases and ensure better performance.

References:

1. [Thanos Compact Component](https://thanos.io/tip/components/compact.md/)
2. [Recommendations for Running Thanos and Prometheus](https://zapier.com/blog/five-recommendations-when-running-thanos-and-prometheus/)
3. [Red Hat Advanced Cluster Management Observability](https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/observability/customizing-observability#adding-advanced-config:~:text=is%20not%20displayed.-,4.3.%C2%A0Adding%20advanced%20configuration%20for%20retention,-Add%20the%20advanced)

Signed-off-by: /Thor(sten)?/ Schwesig <89909507+schwesig@users.noreply.github.com>
a494d61 to 02436f5
I rebased this on |
This PR addresses the retention rate issues discussed in nerc-project/operations#618 (comment) (having more than 30d raw etc.).

The changes update the retention and concurrency settings for the Thanos Compactor to enhance observability and metrics performance.

We will stay with the defaults where possible, adding remarks with the defaults to better understand the next changes or possible errors.

The changes focus on the needs of class, cost, and invoice analysis, as well as future predictions:

- Updated `retentionResolutionRaw` from 30d to 90d (quarterly high detail for deep analysis, especially GPUs)
- Updated `retentionResolution5m` from 90d to 360d (for cost, usage, and invoices; 15 minutes could be enough, but is not a default option)
- Set `retentionResolution1h` to 0d (retain forever, following the default and recommendation)
- Added `blockDuration`, `cleanupInterval`, `deleteDelay`, `retentionInLocal`, `consistencyDelay`, `compactConcurrency`, and `downsampleConcurrency` settings (even where they stay at the default, this makes the options visible in case of future changes)

These changes aim to optimize data retention and resolution for the needed use cases and ensure better performance.
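
For readers who have not seen the diff, the three retention changes listed above might look roughly like the fragment below. This is only a sketch: the placement under `spec.advanced.retentionConfig` of a `MultiClusterObservability` resource is an assumption based on the Red Hat ACM observability documentation, the resource name is hypothetical, and other required fields are omitted. The remaining settings named above are left out because their values are not stated here.

```yaml
# Illustrative fragment only, not the actual diff of this PR.
# Assumption: the settings live under spec.advanced.retentionConfig.
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability              # hypothetical resource name
spec:
  advanced:
    retentionConfig:
      retentionResolutionRaw: 90d  # raw samples, previously 30d
      retentionResolution5m: 360d  # 5-minute downsampled data, previously 90d
      retentionResolution1h: 0d    # 1-hour downsampled data, 0d = retain forever
```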
References: