Add BlockQueryWaitTime Nomad Configuration Option #755

Merged
merged 4 commits into from
Dec 20, 2023


jeffwecan
Contributor

@jeffwecan jeffwecan commented Nov 3, 2023

This PR adds a BlockQueryWaitTime config file option and a matching -nomad-block-query-wait-time CLI flag, for when folks want to either:

  1. have the autoscaler's Nomad API client wait longer than the five-minute default for blocking query requests to return a response
  2. or have the client wait a shorter duration for such responses (for instance, if an LB with an idle timeout < 5 minutes sits between the autoscaler and the associated Nomad API)

Resolves #754


Aside from the test case updates included in this PR, a bit of practical testing was performed as well. Specifically, the autoscaler was configured with a wait time shorter than the 5m default but still longer than our 60-second LB idle timeout:

nomad {
 namespace = "<our_namespace>"
 block_query_wait_time = "2m"
}

Combined with the 60-second idle timeout on our dev deployment's Nomad LB, we still hit the precipitating error, but now with our non-default 2m / 120000ms wait time reflected in the request (yay):

failed to call the Nomad list policies API: Get "https://<our_nomad>:4646/v1/scaling/policies?index=47390046&namespace=<our_namespace>&wait=120000ms": EOF

Subsequently, we set block_query_wait_time = "56s", which prevented such errors with our particular Nomad API connectivity arrangement.


Also going to link the initial PR adding blocking queries here (for general context + so there is a breadcrumb to this PR in that preceding one): #38

@jeffwecan jeffwecan force-pushed the jeffwecan/issue-754_add_wait_time_cfg_option branch 3 times, most recently from de94069 to 0ebd040 Compare November 3, 2023 17:35

// BlockQueryWaitTime controls how long Nomad API requests supporting blocking queries
// are held open. Defaults to 5m.
BlockQueryWaitTime time.Duration
Contributor Author


As a general note regarding the option name choice: I waffled between a few options before settling on this one, primarily motivated by this similar bit of config in the Nomad codebase (though it relates to a Consul API client setting in that case 🤷🏻): https://github.com/hashicorp/nomad/blob/v1.7.0-beta.1/client/config/config.go#L380-L387

(Happy to call it whatever though 😄 )

Contributor


I think the name looks good to me 👍

@jeffwecan jeffwecan marked this pull request as ready for review November 3, 2023 18:10
@jeffwecan jeffwecan changed the title [DRAFT] Add WaitTime General Configuration Option Add BlockQueryWaitTime Nomad Configuration Option Nov 3, 2023
@jeffwecan
Contributor Author

@jrasell: Would it be possible to get this PR reviewed ahead of the next autoscaler release? 🙏🏻

@lgfa29 lgfa29 force-pushed the jeffwecan/issue-754_add_wait_time_cfg_option branch from 0ebd040 to 65e0f45 Compare December 20, 2023 22:21
Contributor

@lgfa29 lgfa29 left a comment


Thank you for the PR @jeffwecan!

And apologies for the delay in getting this merged. Nomad 1.7 and Autoscaler HA kept us busy for a while. I'm going over the open PRs and issues to prepare a new release.



Successfully merging this pull request may close these issues.

Static WaitTime Query Values + Server-side Timeouts Lead to Spurious "EOF" Errors
2 participants