Doc improvements (#909)
* Updated to match querier poll cycle

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Removed incorrect sentence in runbook

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Added notes

Signed-off-by: Joe Elliott <number101010@gmail.com>
joe-elliott committed Aug 24, 2021
1 parent aed734c commit 1ff3a59
Showing 4 changed files with 12 additions and 8 deletions.
9 changes: 7 additions & 2 deletions docs/tempo/website/configuration/polling.md
@@ -37,11 +37,16 @@ ingester:

The compactor `compacted_block_retention` is used to keep a block in the backend for a given period of time
after it has been compacted and the data is no longer needed. This allows queriers with a stale blocklist to access
these blocks successfully until they complete their polling cycles and have up to date blocklists.
these blocks successfully until they complete their polling cycles and have up to date blocklists. Like the
`complete_block_timeout`, this should be at a minimum 2x the configured `blocklist_poll` duration.

```
compactor:
compaction:
# How long to leave a block in the backend after it has been compacted successfully. Default is 1h
[compacted_block_retention: <duration>]
```

Additionally, it is important that the querier `blocklist_poll` duration is greater than or equal to the compactor
`blocklist_poll` duration. Otherwise a querier may not correctly check all assigned blocks and incorrectly return 404.
It is recommended to simply set both components to use the same poll duration.
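
The two rules above (retention of at least 2x the poll duration, and a querier poll duration no shorter than the compactor's) can be sketched as a small pre-deploy check. This is only an illustration and not part of Tempo: the function name `check_polling_config` and its parameters are hypothetical, and comparing the retention against the longer of the two poll durations is an assumption made here, not something the docs state.

```python
from datetime import timedelta
from typing import List


def check_polling_config(
    querier_poll: timedelta,
    compactor_poll: timedelta,
    compacted_block_retention: timedelta,
) -> List[str]:
    """Return warnings for settings that contradict the polling guidance.

    Hypothetical helper for illustration; Tempo does not ship this function.
    """
    warnings = []

    # Rule 1: the querier blocklist_poll should be greater than or equal to
    # the compactor blocklist_poll.
    if querier_poll < compactor_poll:
        warnings.append(
            "querier blocklist_poll is shorter than compactor blocklist_poll"
        )

    # Rule 2: compacted_block_retention should be at least 2x blocklist_poll.
    # Assumption: when the two poll durations differ, use the longer one.
    longest_poll = max(querier_poll, compactor_poll)
    if compacted_block_retention < 2 * longest_poll:
        warnings.append(
            "compacted_block_retention is less than 2x blocklist_poll"
        )

    return warnings


if __name__ == "__main__":
    # 5m polls on both components with the 1h default retention.
    for warning in check_polling_config(
        querier_poll=timedelta(minutes=5),
        compactor_poll=timedelta(minutes=5),
        compacted_block_retention=timedelta(hours=1),
    ):
        print(warning)
```

With both components polling every 5m and the default 1h retention, the check returns no warnings. Setting both components to the same poll duration, as recommended above, satisfies the first rule by construction.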
4 changes: 3 additions & 1 deletion docs/tempo/website/operations/polling.md
@@ -14,7 +14,9 @@ what's called a tenant index. The tenant index is a gzip'ed json file located at
an entry for every block and compacted block for that tenant. This is done once every `blocklist_poll` duration.

All other compactors and all queriers then rely on downloading this file, unzipping it and using the contained list.
Again this is done once every `blocklist_poll` duration.
Again this is done once every `blocklist_poll` duration. **NOTE** It is important that the querier `blocklist_poll` duration
is greater than or equal to the compactor `blocklist_poll` duration. Otherwise a querier may not correctly check
all assigned blocks and incorrectly return 404.

Due to this behavior a given compactor or querier will often have an out of date blocklist. During normal operation
it will be stale by at most 2x the configured `blocklist_poll`. See [configuration]({{< relref "../configuration/polling" >}})
2 changes: 1 addition & 1 deletion operations/jsonnet/microservices/configmap.libsonnet
@@ -73,7 +73,7 @@
},
storage+: {
trace+: {
blocklist_poll: '10m',
blocklist_poll: '5m',
},
},
},
5 changes: 1 addition & 4 deletions operations/tempo-mixin/runbook.md
@@ -6,10 +6,7 @@ This document should help with remediating operational issues in Tempo.
## TempoRequestLatency

Aside from obvious errors in the logs the only real lever you can pull here is scaling. Use the Reads or Writes dashboard
to identify the component that is struggling and scale it up. It should be noted that right now quickly scaling the
Ingester component can cause 404s on traces until they are flushed to the backend. For safety you may only want to
scale one per hour. However, if Ingesters are falling over, it's better to scale fast, ingest successfully and throw 404s
on query than to have an unstable ingest path. Make the call!
to identify the component that is struggling and scale it up.

The Query path is instrumented with tracing (!) and this can be used to diagnose issues with higher latency. View the logs of
the Query Frontend, where you can find an info level message for every request. Filter for requests with high latency and view traces.