Doc improvements (#909)

* Updated to match querier poll cycle
  Signed-off-by: Joe Elliott <number101010@gmail.com>
* Removed incorrect sentence in runbook
  Signed-off-by: Joe Elliott <number101010@gmail.com>
* Added notes
  Signed-off-by: Joe Elliott <number101010@gmail.com>
grafana · Aug 24, 2021 · 1ff3a59
1 parent aed734c
commit 1ff3a59
Showing 4 changed files with 12 additions and 8 deletions.
diff --git a/docs/tempo/website/configuration/polling.md b/docs/tempo/website/configuration/polling.md
@@ -37,11 +37,16 @@ ingester:
 
 The compactor `compacted_block_retention` is used to keep a block in the backend for a given period of time
 after it has been compacted and the data is no longer needed. This allows queriers with a stale blocklist to access
-these blocks successfully until they complete their polling cycles and have up to date blocklists.
+these blocks successfully until they complete their polling cycles and have up to date blocklists. Like the 
+`complete_block_timeout`, this should be at least 2x the configured `blocklist_poll` duration.
 
 ```
 compactor:
   compaction:
     # How long to leave a block in the backend after it has been compacted successfully.  Default is 1h
     [compacted_block_retention: <duration>]
-```
+```
+
+Additionally, it is important that the querier `blocklist_poll` duration is greater than or equal to the compactor 
+`blocklist_poll` duration. Otherwise a querier may not correctly check all assigned blocks and incorrectly return 404. 
+It is recommended to simply set both components to use the same poll duration.
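
The two rules stated in this hunk can be checked mechanically. The following is a minimal sketch, not part of Tempo: the function `check_poll_settings` and its parameter names are hypothetical. It also assumes the 2x rule should be measured against the longer of the two poll durations, which the text does not state explicitly.

```python
from datetime import timedelta


def check_poll_settings(querier_poll, compactor_poll, compacted_block_retention):
    """Return a list of problems; an empty list means both rules hold.

    Hypothetical helper for illustration only. All arguments are timedelta values.
    """
    problems = []

    # Rule 1: the querier poll duration must be >= the compactor poll duration.
    if querier_poll < compactor_poll:
        problems.append("querier blocklist_poll is shorter than compactor blocklist_poll")

    # Rule 2: compacted blocks should be retained for at least 2x blocklist_poll.
    # Assumption: use the longer poll duration as the reference.
    longest_poll = max(querier_poll, compactor_poll)
    if compacted_block_retention < 2 * longest_poll:
        problems.append("compacted_block_retention is less than 2x blocklist_poll")

    return problems


if __name__ == "__main__":
    # Values from this commit: a 5m poll on both components and the default 1h retention.
    print(check_poll_settings(timedelta(minutes=5), timedelta(minutes=5), timedelta(hours=1)))
```

With the same poll duration on both components, as the text recommends, the first rule is satisfied trivially and only the retention check remains.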
diff --git a/docs/tempo/website/operations/polling.md b/docs/tempo/website/operations/polling.md
@@ -14,7 +14,9 @@ what's called a tenant index. The tenant index is a gzip'ed json file located at
 an entry for every block and compacted block for that tenant. This is done once every `blocklist_poll` duration.
 
 All other compactors and all queriers then rely on downloading this file, unzipping it and using the contained list. 
-Again this is done once every `blocklist_poll` duration.
+Again this is done once every `blocklist_poll` duration. **NOTE** It is important that the querier `blocklist_poll` duration 
+is greater than or equal to the compactor `blocklist_poll` duration. Otherwise a querier may not correctly check
+all assigned blocks and incorrectly return 404.
 
 Due to this behavior a given compactor or querier will often have an out of date blocklist. During normal operation
 it will be stale by at most 2x the configured `blocklist_poll`. See [configuration]({{< relref "../configuration/polling" >}})

diff --git a/operations/jsonnet/microservices/configmap.libsonnet b/operations/jsonnet/microservices/configmap.libsonnet
@@ -73,7 +73,7 @@
     },
     storage+: {
       trace+: {
-        blocklist_poll: '10m',
+        blocklist_poll: '5m',
       },
     },
   },

diff --git a/operations/tempo-mixin/runbook.md b/operations/tempo-mixin/runbook.md
@@ -6,10 +6,7 @@ This document should help with remediating operational issues in Tempo.
 ## TempoRequestLatency
 
 Aside from obvious errors in the logs the only real lever you can pull here is scaling.  Use the Reads or Writes dashboard
-to identify the component that is struggling and scale it up.  It should be noted that right now quickly scaling the
-Ingester component can cause 404s on traces until they are flushed to the backend.  For safety you may only want to
-scale one per hour.  However, if Ingesters are falling over, it's better to scale fast, ingest successfully and throw 404s
-on query than to have an unstable ingest path.  Make the call!
+to identify the component that is struggling and scale it up.
 
 The Query path is instrumented with tracing (!) and this can be used to diagnose issues with higher latency. View the logs of
 the Query Frontend, where you can find an info level message for every request. Filter for requests with high latency and view traces.