From b75338c19496eaeedf435688306a2793845b308e Mon Sep 17 00:00:00 2001
From: Tyler Reid
Date: Mon, 16 Aug 2021 16:29:34 -0500
Subject: [PATCH 1/3] Clarify the gsutil mv command for moving corrupted blocks

Signed-off-by: Tyler Reid
---
 cortex-mixin/docs/playbooks.md | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index 98e0b94e..085e38dc 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -380,16 +380,14 @@ When this happen you should:
 2. Ensure the compactor has recovered
 3. Investigate offline the root cause (eg. download the corrupted block and debug it locally)
 
-To rename a block stored on GCS you can use the `gsutil` CLI:
-
+To rename a block stored on GCS you can use the `gsutil` CLI command:
 ```
-# Replace the placeholders:
-# - BUCKET: bucket name
-# - TENANT: tenant ID
-# - BLOCK: block ID
-
 gsutil mv gs://BUCKET/TENANT/BLOCK gs://BUCKET/TENANT/corrupted-BLOCK
 ```
+Where:
+- `BUCKET` is the GCS bucket name the compactor is using. The cell's bucket name is specified as `blocks_storage_bucket_name` in the cell configuration.
+- `TENANT` is the tenant ID reported in the example error message above as `REDACTED-TENANT`.
+- `BLOCK` is the last part of the file path reported as `REDACTED-BLOCK` in the example error message above.
 
 ### CortexBucketIndexNotUpdated
 

From be0460522531262fd1503d88f7f600996141e2f8 Mon Sep 17 00:00:00 2001
From: Tyler Reid
Date: Mon, 16 Aug 2021 16:37:53 -0500
Subject: [PATCH 2/3] Update the changelog

Signed-off-by: Tyler Reid
---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a3bd9325..a61ebcf7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -41,6 +41,7 @@
 * "Sharding Initial State Sync" row - information about the initial state sync procedure when sharding is enabled.
 * "Sharding Runtime State Sync" row - information about various state operations which occur when sharding is enabled (replication, fetch, marge, persist).
 * [ENHANCEMENT] Added 256MB memory ballast to querier. #369
+* [ENHANCEMENT] Update gsutil command for `not healthy index found` playbook. #370
 * [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
 * [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
 * [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335

From d346fbb74c84f616d065b5575178ec28ae5e9cff Mon Sep 17 00:00:00 2001
From: Tyler Reid
Date: Tue, 17 Aug 2021 09:52:07 -0500
Subject: [PATCH 3/3] Modify log message to fit example command

Signed-off-by: Tyler Reid
---
 cortex-mixin/docs/playbooks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index 085e38dc..5dcf8d52 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -372,7 +372,7 @@ How to **investigate**:
 
 The compactor may fail to compact blocks due a corrupted block index found in one of the source blocks:
 ```
-level=error ts=2020-07-12T17:35:05.516823471Z caller=compactor.go:339 component=compactor msg="failed to compact user blocks" user=REDACTED err="compaction: group 0@6672437747845546250: block with not healthy index found /data/compact/0@6672437747845546250/REDACTED; Compaction level 1; Labels: map[__org_id__:REDACTED]: 1/1183085 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)"
+level=error ts=2020-07-12T17:35:05.516823471Z caller=compactor.go:339 component=compactor msg="failed to compact user blocks" user=REDACTED-TENANT err="compaction: group 0@6672437747845546250: block with not healthy index found /data/compact/0@6672437747845546250/REDACTED-BLOCK; Compaction level 1; Labels: map[__org_id__:REDACTED]: 1/1183085 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)"
 ```
 When this happen you should:
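
As a sketch of the rename these patches document, the placeholder substitution can be scripted. All values below are hypothetical examples, not real bucket, tenant, or block IDs; a real run would take the bucket from `blocks_storage_bucket_name` in the cell configuration and the tenant and block IDs from the compactor error message. The script echoes the `gsutil mv` command rather than executing it, so it can be reviewed before being run against a production bucket:

```shell
# Hypothetical example values -- substitute your own:
BUCKET="example-cortex-blocks"   # from blocks_storage_bucket_name in the cell configuration
TENANT="12345"                   # tenant ID from the compactor error message (user=...)
BLOCK="01F8ZEXAMPLEBLOCKID"      # block ID: the last segment of the reported block path

SRC="gs://${BUCKET}/${TENANT}/${BLOCK}"
DST="gs://${BUCKET}/${TENANT}/corrupted-${BLOCK}"

# Print the command for review; remove the echo to actually move the block:
echo gsutil mv "${SRC}" "${DST}"
```

Prefixing the destination with `corrupted-` (rather than deleting the block) keeps the data available for the offline debugging step while moving it out of the compactor's path.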