Merge pull request #370 from grafana/treid/not-found-index-details

Clarify the gsutil mv command for moving corrupted blocks
grafana · Aug 17, 2021 · 2250a8f
2 parents f0ed263 + d346fbb
commit 2250a8f

Showing 2 changed files with 7 additions and 8 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -41,6 +41,7 @@
   * "Sharding Initial State Sync" row - information about the initial state sync procedure when sharding is enabled.
   * "Sharding Runtime State Sync" row - information about various state operations which occur when sharding is enabled (replication, fetch, merge, persist).
 * [ENHANCEMENT] Added 256MB memory ballast to querier. #369
+* [ENHANCEMENT] Update gsutil command for `not healthy index found` playbook #370
 * [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
 * [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
 * [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
@@ -372,24 +372,22 @@ How to **investigate**:
 The compactor may fail to compact blocks due to a corrupted block index found in one of the source blocks:
 
 ```
-level=error ts=2020-07-12T17:35:05.516823471Z caller=compactor.go:339 component=compactor msg="failed to compact user blocks" user=REDACTED err="compaction: group 0@6672437747845546250: block with not healthy index found /data/compact/0@6672437747845546250/REDACTED; Compaction level 1; Labels: map[__org_id__:REDACTED]: 1/1183085 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)"
+level=error ts=2020-07-12T17:35:05.516823471Z caller=compactor.go:339 component=compactor msg="failed to compact user blocks" user=REDACTED-TENANT err="compaction: group 0@6672437747845546250: block with not healthy index found /data/compact/0@6672437747845546250/REDACTED-BLOCK; Compaction level 1; Labels: map[__org_id__:REDACTED]: 1/1183085 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)"
 ```
 
 When this happens you should:
 1. Rename the block prefixing it with `corrupted-` so that it will be skipped by the compactor and queriers. Keep in mind that doing so the block will become invisible to the queriers too, so its series/samples will not be queried. If the corruption affects only 1 block whose compaction `level` is 1 (the information is stored inside its `meta.json`) then Cortex guarantees no data loss because all the data is replicated across other blocks. In all other cases, there may be some data loss once you rename the block and stop querying it.
 2. Ensure the compactor has recovered
 3. Investigate the root cause offline (e.g. download the corrupted block and debug it locally)
 
-To rename a block stored on GCS you can use the `gsutil` CLI:
-
+To rename a block stored on GCS you can use the `gsutil` CLI command:
 ```
-# Replace the placeholders:
-# - BUCKET: bucket name
-# - TENANT: tenant ID
-# - BLOCK:  block ID
-
 gsutil mv gs://BUCKET/TENANT/BLOCK gs://BUCKET/TENANT/corrupted-BLOCK
 ```
+Where:
+- `BUCKET` is the GCS bucket name the compactor is using. The cell's bucket name is specified as the `blocks_storage_bucket_name` in the cell configuration.
+- `TENANT` is the tenant ID, reported as `REDACTED-TENANT` in the example error message above.
+- `BLOCK` is the last part of the file path, reported as `REDACTED-BLOCK` in the example error message above.
 
 ### CortexBucketIndexNotUpdated
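
The placeholder substitution described in the playbook change above can be sketched as a small shell helper. This is only an illustration, not part of the PR: the function names `extract_tenant`, `extract_block` and `build_mv_command` are hypothetical, the `sed` patterns assume the log line has the same shape as the example error message, and the bucket name `BUCKET` still has to be filled in by hand. The helper prints the `gsutil mv` command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: derive TENANT and BLOCK from a compactor error line and print
# (not execute) the gsutil mv command from the playbook.
# Assumption: the log line looks like the example in the playbook.

log_line='level=error ts=2020-07-12T17:35:05.516823471Z caller=compactor.go:339 component=compactor msg="failed to compact user blocks" user=REDACTED-TENANT err="compaction: group 0@6672437747845546250: block with not healthy index found /data/compact/0@6672437747845546250/REDACTED-BLOCK; Compaction level 1; Labels: map[__org_id__:REDACTED]: 1/1183085 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)"'

# Hypothetical helper: the tenant ID is the value of the user= field.
extract_tenant() {
  printf '%s\n' "$1" | sed -n 's|.* user=\([^ ]*\) .*|\1|p'
}

# Hypothetical helper: the block ID is the last path component before the
# first ";" that follows "block with not healthy index found".
extract_block() {
  printf '%s\n' "$1" |
    sed -n 's|.*block with not healthy index found [^;]*/\([^;/]*\);.*|\1|p'
}

# Hypothetical helper: print the rename command for bucket, tenant and block.
build_mv_command() {
  printf 'gsutil mv gs://%s/%s/%s gs://%s/%s/corrupted-%s\n' \
    "$1" "$2" "$3" "$1" "$2" "$3"
}

tenant=$(extract_tenant "$log_line")
block=$(extract_block "$log_line")
build_mv_command BUCKET "$tenant" "$block"
```

Printing the command rather than executing it is a deliberate choice here: renaming a block hides it from the queriers, so the generated line is worth checking before it is pasted into a terminal.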