Skip to content

Commit

Permalink
Merge pull request #370 from grafana/treid/not-found-index-details
Browse files Browse the repository at this point in the history
Clarify the gsutil mv command for moving corrupted blocks
  • Loading branch information
Tyler Reid authored Aug 17, 2021
2 parents f0ed263 + d346fbb commit 2250a8f
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 8 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@
* "Sharding Initial State Sync" row - information about the initial state sync procedure when sharding is enabled.
* "Sharding Runtime State Sync" row - information about various state operations which occur when sharding is enabled (replication, fetch, marge, persist).
* [ENHANCEMENT] Added 256MB memory ballast to querier. #369
* [ENHANCEMENT] Update gsutil command for `not healthy index found` playbook #370
* [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
* [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
* [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335
Expand Down
14 changes: 6 additions & 8 deletions cortex-mixin/docs/playbooks.md
Original file line number Diff line number Diff line change
Expand Up @@ -372,24 +372,22 @@ How to **investigate**:
The compactor may fail to compact blocks due a corrupted block index found in one of the source blocks:
```
level=error ts=2020-07-12T17:35:05.516823471Z caller=compactor.go:339 component=compactor msg="failed to compact user blocks" user=REDACTED err="compaction: group 0@6672437747845546250: block with not healthy index found /data/compact/0@6672437747845546250/REDACTED; Compaction level 1; Labels: map[__org_id__:REDACTED]: 1/1183085 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)"
level=error ts=2020-07-12T17:35:05.516823471Z caller=compactor.go:339 component=compactor msg="failed to compact user blocks" user=REDACTED-TENANT err="compaction: group 0@6672437747845546250: block with not healthy index found /data/compact/0@6672437747845546250/REDACTED-BLOCK; Compaction level 1; Labels: map[__org_id__:REDACTED]: 1/1183085 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)"
```
When this happen you should:
1. Rename the block prefixing it with `corrupted-` so that it will be skipped by the compactor and queriers. Keep in mind that doing so the block will become invisible to the queriers too, so its series/samples will not be queried. If the corruption affects only 1 block whose compaction `level` is 1 (the information is stored inside its `meta.json`) then Cortex guarantees no data loss because all the data is replicated across other blocks. In all other cases, there may be some data loss once you rename the block and stop querying it.
2. Ensure the compactor has recovered
3. Investigate offline the root cause (eg. download the corrupted block and debug it locally)
To rename a block stored on GCS you can use the `gsutil` CLI:
To rename a block stored on GCS you can use the `gsutil` CLI command:
```
# Replace the placeholders:
# - BUCKET: bucket name
# - TENANT: tenant ID
# - BLOCK: block ID

gsutil mv gs://BUCKET/TENANT/BLOCK gs://BUCKET/TENANT/corrupted-BLOCK
```
Where:
- `BUCKET` is the gcs bucket name the compactor is using. The cell's bucket name is specified as the `blocks_storage_bucket_name` in the cell configuration
- `TENANT` is the tenant id reported in the example error message above as `REDACTED-TENANT`
- `BLOCK` is the last part of the file path reported as `REDACTED-BLOCK` in the example error message above
### CortexBucketIndexNotUpdated
Expand Down

0 comments on commit 2250a8f

Please sign in to comment.