Make LivenessCheck Timeout Configurable #6227

Merged
merged 1 commit into from
Sep 25, 2024

Conversation

rajagopalanand
Contributor

What this PR does:

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@rajagopalanand
Contributor Author

rajagopalanand commented Sep 23, 2024

[Screenshot 2024-09-22 at 8 32 01 PM: graph of the query below]

histogram_quantile(0.99, sum(rate(cortex_ruler_client_request_duration_seconds_bucket{operation="/ruler.Ruler/LivenessCheck"}[2m])) by (le,instance))

Depending on the environment, LivenessCheck can take longer than the original 100 ms timeout, as the graph above shows. Because the timeout is not configurable, I want to raise it to 1s.

@rajagopalanand rajagopalanand marked this pull request as ready for review September 23, 2024 01:37
Contributor

@yeya24 yeya24 left a comment

Let's make it configurable then?
There might be cases where 1s timeout couldn't work

@rajagopalanand
Contributor Author

Let's make it configurable then? There might be cases where 1s timeout couldn't work

I'm wondering what the best way to do this would be. There is already a remote timeout (remote_timeout) under the ruler client, which is set to 2 minutes. If I introduce a config specifically for LivenessCheck, it would naturally go under the ruler client as well, but it might be confusing to have both a generic deadline (remote_timeout) and a specific one for LivenessCheck in the same place.

@rapphil
Contributor

rapphil commented Sep 23, 2024

I think the liveness check should not be part of the client, because it is part of the business logic of the ruler, and has nothing to do with the connection which is the transport mechanism.

@rajagopalanand rajagopalanand force-pushed the update-liveness-timeout branch 2 times, most recently from f889e53 to f5a7201 Compare September 24, 2024 02:02
@rajagopalanand rajagopalanand changed the title Increase LivenessCheck timeout Make LivenessCheck Timeout Configurable Sep 24, 2024
@rajagopalanand rajagopalanand force-pushed the update-liveness-timeout branch 2 times, most recently from 3a47852 to 5605632 Compare September 24, 2024 02:30
docs/guides/ruler-high-availability.md (outdated, resolved)
pkg/ruler/ruler.go (outdated, resolved)
@@ -238,6 +237,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.DisableRuleGroupLabel, "ruler.disable-rule-group-label", false, "Disable the rule_group label on exported metrics")

f.BoolVar(&cfg.EnableHAEvaluation, "ruler.enable-ha-evaluation", false, "Enable high availability")
f.DurationVar(&cfg.LivenessCheckTimeout, "ruler.liveness-check-timeout", 1*time.Second, "Liveness check timeout")
Contributor

@yeya24 yeya24 Sep 25, 2024

Can we document the timeout itself? What is it for? That way users get some idea of the flag's purpose.

Contributor Author

Can you please take a look? I made an update.

@rajagopalanand rajagopalanand force-pushed the update-liveness-timeout branch 2 times, most recently from 625d040 to a24ca1a Compare September 25, 2024 19:49
Signed-off-by: Anand Rajagopal <anrajag@amazon.com>
Contributor

@yeya24 yeya24 left a comment

Thanks!

@yeya24 yeya24 merged commit cccfd73 into cortexproject:master Sep 25, 2024
16 checks passed