-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rulers: Add support to persist tokens in rulers #5987
rulers: Add support to persist tokens in rulers #5987
Conversation
Signed-off-by: Raphael Silva <rapphil@gmail.com>
0329b52
to
c8fd360
Compare
Signed-off-by: Raphael Silva <rapphil@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one minor nit
@@ -75,6 +76,7 @@ func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) { | |||
f.DurationVar(&cfg.FinalSleep, "ruler.ring.final-sleep", 0*time.Second, "The sleep seconds when ruler is shutting down. Need to be close to or larger than KV Store information propagation delay") | |||
f.IntVar(&cfg.ReplicationFactor, "ruler.ring.replication-factor", 1, "EXPERIMENTAL: The replication factor to use when loading rule groups for API HA.") | |||
f.BoolVar(&cfg.ZoneAwarenessEnabled, "ruler.ring.zone-awareness-enabled", false, "EXPERIMENTAL: True to enable zone-awareness and load rule groups across different availability zones for API HA.") | |||
f.StringVar(&cfg.TokensFilePath, "ruler.ring.tokens-file-path", "", "EXPERIMENTAL: File path where tokens are stored. If empty, tokens are not stored at shutdown and restored at startup.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this. Please include it in https://github.com/cortexproject/cortex/blob/e6a4e495ea2430eba8db8b7291b7944f03ab6471/docs/configuration/v1-guarantees.md too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we need to mark this feature as experimental anymore. Thought the initial feature was added long time ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yeya24 are you referring to this new flag for rulers or something else? if the former, this feature does not exist in rulers and therefore should be experimental before stabilization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other components. I think it might be the same for Ruler.
But I am okay to mark this as experimental for now. We can clean it up later in #5922
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
@friedrichg done! thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a nit about the changelog.
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
LGTM as well. |
What this PR does:
Allow rulers to reuse tokens across restarts. This is important for us because when you are running a fleet with a large numbers of rulers that are updated one at a time, you get a lot of churn if the tokens are not persistent across restarts: rule groups that were owned by a ruler might change ownership several times because every time they find a new owner, this new owner is also restarted. By preserving the tokens the churn is reduced because after the restart the rules will return to the original ruler that they belong to.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]