Allow setting TSDB block duration for receive service #1496
Signed-off-by: Fawad Halim <fawad@fawad.net>
e3faabe to ede0620
Just curious: what's the use case?
This is always tricky, as it can negatively affect upload, query, and compaction characteristics. Plus, there is usually no need to change this. Any particular reason?
If there is a use case, I am fine with adding this, but maybe as a Hidden flag (:
PTAL @squat @metalmatze
cmd/thanos/receive.go
Outdated
@@ -60,6 +60,8 @@ func registerReceive(m map[string]setupFunc, app *kingpin.Application, name stri

replicationFactor := cmd.Flag("receive.replication-factor", "How many times to replicate incoming write requests.").Default("1").Uint64()

tsdbBlockDuration := modelDuration(cmd.Flag("tsdb.blockduration", "Duration for local TSDB blocks").Default("2h"))
Can we maybe make this hidden, as it's not recommended to tweak this? Plus, we use more verbose naming (- between words):
tsdbBlockDuration := modelDuration(cmd.Flag("tsdb.blockduration", "Duration for local TSDB blocks").Default("2h"))
tsdbBlockDuration := modelDuration(cmd.Flag("tsdb.block-duration", "Duration for local TSDB blocks").Hidden().Default("2h"))
The use case is that we generate a really high volume of metrics during a business day, so 2 hours of WAL adds up to a pretty huge amount of disk space. That translates to expensive persistent volumes, backups, etc. I'm planning to use the receive service as a gateway to the object store, so it's desirable to reduce the block duration until the average WAL backlog is small enough that we can restore it in a reasonable amount of time on failure.
I've marked the flag as hidden, and corrected the naming. Thanks!
TSDB supports WAL compression, so maybe enabling it would solve your use case without changing the block sizes?
In my tests this halves the size of the WAL.
I think Thanos receive should always have compression enabled.
WAL compression will definitely help reduce the disk size, reducing the amount of time restores will take. I'm updating the PR to enable it. However, I'd still like to be able to control the maximum amount of time I depend on the block storage device for.
It seems that in Prometheus the max block duration defaults to 10% of the retention period, which has no lower bound (https://github.com/prometheus/prometheus/blob/master/cmd/prometheus/main.go#L317). Is there a reason why that should not be allowed for the receive component?
Signed-off-by: Fawad Halim <fawad@fawad.net>
e215f07 to 955a9a3
Signed-off-by: Fawad Halim <fawad@fawad.net>
I think I'm ok with this as a hidden flag.
* Allow setting TSDB block duration
  Signed-off-by: Fawad Halim <fawad@fawad.net>
* PR feedback: corrected flag naming, made the flag hidden by default
  Signed-off-by: Fawad Halim <fawad@fawad.net>
* Enable WALCompression on Receive service
  Signed-off-by: Fawad Halim <fawad@fawad.net>
Allow setting frequency at which TSDB blocks are created for the receive service.
Changes
Extracted the TSDB minBlockDuration and maxBlockDuration used in the receive service into a flag that defaults to the current value of 2h.
Verification
Built and ran the service locally to ensure that the flag takes effect as intended.