
chunk-max-stale / metrics-max-stale per retention #614

Closed
Dieterbe opened this issue Apr 24, 2017 · 7 comments
@Dieterbe
Contributor

Dieterbe commented Apr 24, 2017

(originally suggested here #557 (comment))

Currently, for every new deployment, we have to figure out the largest interval at which a customer will send data and tweak the max-stale settings accordingly; otherwise we run the risk of closing chunks prematurely and dropping valid points. At the same time, low resolutions benefit from lower max-stale settings.

We could make our lives easier and automate this step of the provisioning by simply tying these settings to the retention policy.
One approach could be making retention policies look like:

series-interval:retention[:chunkspan:numchunks:ready:chunk-max-stale:metric-max-stale]

OTOH, this would make our schema definitions even noisier. We could also just introduce extra attributes, in addition to the pattern and retentions fields, describing the value as a number of chunkspans.
e.g.

[apache_busyWorkers]
pattern = ^servers\.www.*\.workers\.busyWorkers$
retentions = 1s:1d:10min:1,1m:21d,15m:5y:2h:1:false
chunk-max-stale = 5 # persist chunk after 5x the chunkspan has passed
metric-max-stale = 6 # purge from memory after 6x the chunkspan has passed

But I don't think large and small chunkspans should be tied to the same factor, for two reasons:

  1. If a sender experiences an interruption in metric sending, after which it processes the backlog, the time it takes humans to resolve such an interruption is usually independent of the interval or chunkspan of the data. GC'd chunks also cause metricpersist messages, meaning incomplete chunks won't be overwritten if the full data comes in later. So the factor approach disadvantages high-res, small-chunkspan data; such data probably deserves proportionally higher max-stale settings.
  2. It also depends on Kafka retention. If Kafka retention is 12h and the chunkspan is 1h, then we should be able to wait roughly 11h before sealing a chunk and saving it, to allow as much time as possible to complete the chunk while still not waiting so long that data becomes unrecoverable in case of a primary crash.
    And while the max-stale here would correspond to the chunkspan, it's not as a factor.
@Dieterbe
Contributor Author

@woodsaj said we can simply base it off the chunkspan, but I don't think that'll work. E.g. if the chunkspan is 1h but we retain data in Kafka for 12h, then there's no need to set max-stale to ~1h; we could wait many hours for new data to come in and complete the chunk, which is still safe since if the primary crashes it'll be able to recreate the chunk. A chunk-max-stale value of something like Kafka retention minus chunkspan seems to make more sense to me.
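
Something like this minimal Go sketch (chunkMaxStale, kafkaRetention and chunkSpan are just illustrative names, not existing settings):

package main

import (
	"fmt"
	"time"
)

// chunkMaxStale derives the staleness threshold from the Kafka retention rather
// than from a factor of the chunkspan: an open chunk can safely stay in memory
// for as long as its data can still be replayed from Kafka.
func chunkMaxStale(kafkaRetention, chunkSpan time.Duration) time.Duration {
	return kafkaRetention - chunkSpan
}

func main() {
	// 12h Kafka retention with 1h chunks -> we can wait ~11h before sealing.
	fmt.Println(chunkMaxStale(12*time.Hour, 1*time.Hour)) // 11h0m0s
}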

@woodsaj
Member

woodsaj commented Apr 25, 2017

We don't need per-retention settings.

metrics-max-stale is only needed if users have a dynamic workload (most of our users are in this category).
Index pruning is necessary to remove series that are no longer being sent, so they don't show up in templateVars or the Grafana query editor. However, once grafana/grafana#8055 is implemented, we no longer need to prune from the index.

For chunk-max-stale we are trying to protect against data loss when MT restarts. For there to be no data loss, any unsaved chunk must still exist in Kafka. So really, we need a max-chunk-age setting, since we need to consider the time the first point was seen, not the time of the last point. If the chunk creation time is < (now - kafkaRetention), then we must save the chunk or data will be lost if MT restarts.
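
A rough Go sketch of that check, assuming a hypothetical chunk type that records when its first point was seen (not MT's actual GC code):

package main

import (
	"fmt"
	"time"
)

type chunk struct {
	firstWrite time.Time // when the first point landed in this chunk
	saved      bool
}

// mustSave reports whether the chunk has to be persisted now: once its first
// point is older than the Kafka retention, it can no longer be fully replayed
// from Kafka after a restart, so keeping it unsaved risks data loss.
func mustSave(c chunk, now time.Time, kafkaRetention time.Duration) bool {
	return !c.saved && c.firstWrite.Before(now.Add(-kafkaRetention))
}

func main() {
	c := chunk{firstWrite: time.Now().Add(-13 * time.Hour)}
	fmt.Println(mustSave(c, time.Now(), 12*time.Hour)) // true: first point has fallen out of Kafka
}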

@Dieterbe
Contributor Author

Too bad sarama/Kafka doesn't have a way to query the current retention settings for a topic at runtime,
so it's on us to make sure the MT setting corresponds to the Kafka setting in use. Not a big deal though, since we stick to the same settings consistently, with rare exceptions.

Note that we'll have to replace the current per-aggmetric lastWrite with a per-chunk firstWrite,
but that will make things safer: currently old chunks can get very stale as long as the latest one is being written to.
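
Roughly, the change would look something like this sketch (type and field names are illustrative, not MT's actual code):

package main

import "time"

// current: a single lastWrite per aggmetric, so an old, still-unsaved chunk is
// never considered stale as long as any newer chunk keeps receiving points.
type aggMetricToday struct {
	lastWrite time.Time
	chunks    []struct{ /* points */ }
}

// proposed: track firstWrite per chunk, so GC can decide for each chunk
// individually whether its data is still covered by Kafka retention.
type aggMetricProposed struct {
	chunks []struct {
		firstWrite time.Time
		// points
	}
}

func main() {}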

@Dieterbe
Contributor Author

Dieterbe commented Jan 26, 2018

Thinking a bit more about this, I think there are 3 additional important time windows:

  1. It's possible that a chunk becomes not-completely-Kafka-backed in between 2 GC runs,
    so at each GC run we should check whether a chunk will still be safe if we keep it until the next GC run.

  2. We have to take into account how much time a GC run needs between starting and actually saving the chunks to the store (this could potentially be measured automatically by MT, or just configured explicitly).

  3. The amount of time we would need to restart MT so that it can safely start replaying data again.

So the check becomes: chunk creation time < (now - kafkaRetention + GC-interval + safety-window), where safety-window addresses points 2 and 3. Some write queues take more than 1h to drain, so a safe default would be something like 2h, I think.
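
A sketch of that extended condition in Go; gcInterval and safetyWindow are hypothetical names here, not actual MT options:

package main

import "time"

// mustPersistThisGCRun reports whether a chunk must be saved during the current
// GC run: if we skipped it, then by the time the next GC run starts (plus the
// time needed to drain write queues and restart MT), its first point could
// already have fallen out of Kafka retention.
func mustPersistThisGCRun(firstWrite, now time.Time, kafkaRetention, gcInterval, safetyWindow time.Duration) bool {
	deadline := now.Add(gcInterval + safetyWindow - kafkaRetention)
	return firstWrite.Before(deadline)
}

func main() {}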

We should also ensure that normally, when we close & save a chunk, no new data comes in for that chunk, which could happen if the first data came in in real time but data towards the end is delayed.
So this gives us a formula for what kafkaRetention should be:

max-age = kafkaRetention - GC-interval - safety-window    # to assure no data loss in case of an MT restart
chunkspan + max-delay < max-age                            # to get all the data into the chunk before persisting
=>
chunkspan + max-delay < kafkaRetention - GC-interval - safety-window
kafkaRetention > chunkspan + max-delay + GC-interval + safety-window

So for example, if the coarsest rollups have 6h chunkspans, GC runs hourly, the safety-window is 2h, and the max permissible data lag is 3h, our minimum Kafka retention becomes 6h + 3h + 1h + 2h = 12h.

That's much more than what we currently use, but also much safer, I think.
This way we can define our max tolerances explicitly, and tolerate each of them being at their worst simultaneously.
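
A small Go sketch of the sizing formula above (minKafkaRetention, maxDelay, safetyWindow are illustrative names, not actual settings):

package main

import (
	"fmt"
	"time"
)

// minKafkaRetention returns the minimum Kafka retention needed so that a chunk
// can fill up completely (chunkSpan + maxDelay) and still remain replayable
// across one GC interval plus a safety window for draining write queues and
// restarting MT.
func minKafkaRetention(chunkSpan, maxDelay, gcInterval, safetyWindow time.Duration) time.Duration {
	return chunkSpan + maxDelay + gcInterval + safetyWindow
}

func main() {
	// coarsest rollup chunkspan 6h, max tolerated data lag 3h, hourly GC, 2h safety window
	fmt.Println(minKafkaRetention(6*time.Hour, 3*time.Hour, 1*time.Hour, 2*time.Hour)) // 12h0m0s
}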

I have to think a bit more about what the implications would be if an issue starts (e.g. being unable to save to Cassandra, trouble consuming from Kafka, etc). I'd like to come up with a formula to determine how quickly we must increase Kafka retention, and by how much, but I wanted to post this so we can start discussing. Also, the ROB will change the picture a bit; we'll probably need an extra term for that.

Another interesting thought: if you have a retention of say 10h but chunks of 30min, and the data stops, then the last chunk will be saved pretty late (e.g. after 6h or so), so you'd have to wait quite a while before that data shows up.

@Dieterbe
Contributor Author

@woodsaj does the above make sense to you?

@woodsaj
Member

woodsaj commented Feb 19, 2018

Yes, but we should probably also add the segmentSize to the retention, so that when #850 is merged we can safely start consuming from "retention - segmentSize".

@stale

stale bot commented Apr 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 4, 2020
@stale stale bot closed this as completed Apr 11, 2020