Commit

fix: added delete data flag to fix WAL issues (#1027)
Cherry-pick (and simplify) #947 to resolve common WAL issues.
bwplotka authored Jun 12, 2024
2 parents 2d60e69 + f4b179a commit 2c306f4
Showing 3 changed files with 18 additions and 3 deletions.
7 changes: 7 additions & 0 deletions charts/operator/templates/collector.yaml
@@ -94,6 +94,13 @@ spec:
- --enable-feature=google-kubernetes-secret-provider
- --storage.tsdb.path=/prometheus/data
- --storage.tsdb.no-lockfile
+ # Special Google flag for force-deleting all data on start. We use ephemeral storage in
+ # this manifest, but there are cases where a container restart still reuses potentially
+ # bad data (corrupted, or with high cardinality causing OOMs or slow startups).
+ # Force-deleting makes a container restart consistent with a pod restart.
+ # NOTE: The data has likely already been sent to GCM, and GCM export does not use the
+ # data on disk (WAL).
+ - --gmp.storage.delete-data-on-start
# Keep 30 minutes of data. As we are backed by an emptyDir volume, this will count towards
# the container's memory usage. We could lower it further if this becomes problematic, but
# the window for local data is quite convenient for debugging.
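The comment above explains why the collector wipes its local TSDB directory on start: an `emptyDir` volume lives as long as the pod, so it survives a container restart, and without the flag a restarted container could replay a corrupted or oversized WAL. The real `--gmp.storage.delete-data-on-start` implementation is not part of this diff, so the Python sketch below only illustrates the described behaviour. The function name `delete_data_on_start` is hypothetical, and deleting only the directory's children (keeping the directory itself) is an assumption made so the sketch also works when the data directory is a volume mount point.

```python
import os
import shutil
import tempfile


def delete_data_on_start(data_dir: str) -> int:
    """Hypothetical sketch (not the GMP implementation) of wiping a TSDB
    directory (WAL, head chunks, blocks) before storage is opened.

    Returns the number of top-level entries removed.
    """
    if not os.path.isdir(data_dir):
        # First start: nothing to delete.
        return 0
    removed = 0
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        # Remove children but keep data_dir itself (assumed to possibly be a mount point).
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        removed += 1
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        # Simulate state left behind by a previous container run in the same pod.
        os.makedirs(os.path.join(data_dir, "wal"))
        with open(os.path.join(data_dir, "wal", "00000000"), "wb") as f:
            f.write(b"possibly corrupted segment")
        os.makedirs(os.path.join(data_dir, "chunks_head"))
        print(delete_data_on_start(data_dir))  # 2
        print(os.listdir(data_dir))  # []
```

Because the WAL is discarded, samples still only on local disk are lost on restart; the commit's NOTE argues this is acceptable since GCM export does not read from the on-disk WAL.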
5 changes: 3 additions & 2 deletions charts/operator/values.yaml
@@ -31,8 +31,9 @@ images:
image: gke.gcr.io/prometheus-engine/operator
tag: "v0.9.0-gke.1"
prometheus:
- image: gke.gcr.io/prometheus-engine/prometheus
- tag: "v2.45.3-gmp.1-gke.0"
+ # TODO(bwplotka): Change to "v2.45.3-gmp.4-gke.0" once tags are cloned.
+ image: gke.gcr.io/prometheus-engine/prometheus@sha256
+ tag: 7473d52f4a3e563e6377f8a6183091f25192b1e0705dd0933903e800bd69b7b2
ruleEvaluator:
image: gke.gcr.io/prometheus-engine/rule-evaluator
tag: v0.9.0-gke.1
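The `values.yaml` change pins the Prometheus image by content digest rather than by tag, by moving `@sha256` into `image` and putting the hex digest into `tag`. Assuming the chart template joins the two values as `image:tag` (the template itself is not shown in this commit), the rendered reference matches the one written directly in `manifests/operator.yaml`. The helpers `image_ref` and `is_digest_ref` below are hypothetical and only sketch that join plus a simple check for a `name@sha256:<64 hex>` reference.

```python
HEX_DIGITS = set("0123456789abcdef")


def image_ref(image: str, tag: str) -> str:
    """Join chart values the way an assumed "{{ image }}:{{ tag }}" template would."""
    return f"{image}:{tag}"


def is_digest_ref(ref: str) -> bool:
    """Return True if ref pins content by digest: ...@sha256:<64 lowercase hex chars>."""
    _, sep, digest = ref.rpartition("@")
    if not sep or not digest.startswith("sha256:"):
        return False
    hex_part = digest[len("sha256:"):]
    return len(hex_part) == 64 and set(hex_part) <= HEX_DIGITS


if __name__ == "__main__":
    pinned = image_ref(
        "gke.gcr.io/prometheus-engine/prometheus@sha256",
        "7473d52f4a3e563e6377f8a6183091f25192b1e0705dd0933903e800bd69b7b2",
    )
    tagged = image_ref("gke.gcr.io/prometheus-engine/prometheus", "v2.45.3-gmp.1-gke.0")
    print(pinned)
    print(is_digest_ref(pinned), is_digest_ref(tagged))  # True False
```

A digest reference always resolves to the same image content, whereas a tag can be re-pointed; the TODO indicates the plan to switch back to the `v2.45.3-gmp.4-gke.0` tag once the tags are cloned.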
9 changes: 8 additions & 1 deletion manifests/operator.yaml
@@ -382,14 +382,21 @@ spec:
- all
privileged: false
- name: prometheus
- image: gke.gcr.io/prometheus-engine/prometheus:v2.45.3-gmp.1-gke.0
+ image: gke.gcr.io/prometheus-engine/prometheus@sha256:7473d52f4a3e563e6377f8a6183091f25192b1e0705dd0933903e800bd69b7b2
args:
- --config.file=/prometheus/config_out/config.yaml
- --enable-feature=exemplar-storage
# Special Google flag for authorization using native Kubernetes secrets.
- --enable-feature=google-kubernetes-secret-provider
- --storage.tsdb.path=/prometheus/data
- --storage.tsdb.no-lockfile
+ # Special Google flag for force-deleting all data on start. We use ephemeral storage in
+ # this manifest, but there are cases where a container restart still reuses potentially
+ # bad data (corrupted, or with high cardinality causing OOMs or slow startups).
+ # Force-deleting makes a container restart consistent with a pod restart.
+ # NOTE: The data has likely already been sent to GCM, and GCM export does not use the
+ # data on disk (WAL).
+ - --gmp.storage.delete-data-on-start
# Keep 30 minutes of data. As we are backed by an emptyDir volume, this will count towards
# the container's memory usage. We could lower it further if this becomes problematic, but
# the window for local data is quite convenient for debugging.
