Backport of Return 429 response on HTTP max connection limit into release/1.3.x #14241

hc-github-team-nomad-core
Backport

This PR is auto-generated from #13621 to be assessed for backporting due to the inclusion of the label backport/1.3.x.

The below text is copied from the body of the original PR.


The `limits.http_max_conns_per_client` configuration option limits the number of concurrent TCP connections per client IP address:

- `http_max_conns_per_client` `(int: 100)` - Configures a limit of how many
concurrent TCP connections a single client IP address is allowed to open to
the agent's HTTP server. This affects the HTTP servers in both client and
server agents. Default value is `100`. `0` disables HTTP connection limits.
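For readers unfamiliar with how such a limit behaves, here is a minimal sketch of a per-client-IP connection counter. It is not Nomad's implementation; `connLimiter`, `acquire`, `release`, and `ErrTooManyConns` are hypothetical names used only for illustration. The one behavior taken from the documentation above is that a limit of `0` disables the check.

```go
package main

import (
	"errors"
	"net"
	"sync"
)

// ErrTooManyConns is a hypothetical error returned when a client IP
// already holds the maximum number of open connections.
var ErrTooManyConns = errors.New("too many connections from client IP")

// connLimiter is a hypothetical per-client-IP connection counter.
// It is an illustration only, not the limiter Nomad uses.
type connLimiter struct {
	mu    sync.Mutex
	max   int            // 0 disables the limit, as documented above
	perIP map[string]int // open connections keyed by client IP
}

func newConnLimiter(max int) *connLimiter {
	return &connLimiter{max: max, perIP: make(map[string]int)}
}

// acquire records a new connection from remoteAddr ("ip:port") and
// returns ErrTooManyConns if that IP is already at the limit.
func (l *connLimiter) acquire(remoteAddr string) error {
	ip, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return err
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.max > 0 && l.perIP[ip] >= l.max {
		return ErrTooManyConns
	}
	l.perIP[ip]++
	return nil
}

// release frees one slot for the IP in remoteAddr when a connection closes.
func (l *connLimiter) release(remoteAddr string) {
	ip, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.perIP[ip] > 0 {
		l.perIP[ip]--
		if l.perIP[ip] == 0 {
			delete(l.perIP, ip)
		}
	}
}
```

Note that the count is keyed by IP address only, so every connection from one host shares the same budget regardless of source port.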

The current implementation silently closes the TCP connection when the limit is exceeded. Clients see only opaque 'unexpected EOF' or 'connection reset' errors, which many users fail to diagnose correctly, and this is likely the root cause of various reported Nomad issues.

Instead of silently closing the connection, this PR returns a 429 Too Many Requests HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. This should help diagnose/fix some instances of #12273 and #8718, where the fix is simply to increase the connection limit from the default of 100.

The default of 100 is easily hit when doing anything with blocking queries, since each blocking query requires a separate HTTP client connection. One example from my use case: the Nomad Autoscaler monitors each job's status using a blocking query, so any autoscaler agent monitoring more than 100 jobs will exceed the default limit in normal operation, and the limit will need to be increased.

One tradeoff in this PR is that the connection is kept open slightly longer while the agent writes out the 429 response. The HTTPConnStateFuncWithDefault429Handler function accepts a write-deadline argument, which I originally set to 0 (no deadline) and have since updated to 10 * time.Millisecond to match existing behavior in Consul. This could alternatively be made configurable through an additional config option, if desired.

@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/http_max_conn_limit_429/miserably-climbing-viper branch from 54914fb to bde1506 Compare August 23, 2022 18:30
@hc-github-team-nomad-core hc-github-team-nomad-core merged commit a746ef7 into release/1.3.x Aug 23, 2022
@hc-github-team-nomad-core hc-github-team-nomad-core deleted the backport/http_max_conn_limit_429/miserably-climbing-viper branch August 23, 2022 18:30
@github-actions
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2022