Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ui, server: add node-specific timeouts for hot ranges requests #107627

Closed
zachlite opened this issue Jul 26, 2023 · 0 comments · Fixed by #107796
Closed

ui, server: add node-specific timeouts for hot ranges requests #107627

zachlite opened this issue Jul 26, 2023 · 0 comments · Fixed by #107796
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@zachlite
Copy link
Contributor

zachlite commented Jul 26, 2023

Part of #104269

During a fan out to collect hot ranges, we do not enforce a timeout while collecting hot ranges on a specific node.

As a result, the response time's lower bound is that of the node that takes the longest to collect hot ranges.

This should be wrapped in a timeutil.RunWithTimeout:

res, err := r.nodeFn(ctx, client, nodeID)

Jira issue: CRDB-30135

@zachlite zachlite added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-cluster-observability labels Jul 26, 2023
zachlite added a commit to zachlite/cockroach that referenced this issue Jul 28, 2023
Requests for hot ranges are serviced by a cluster wide fan-out,
where non-trivial work is done on each node to provide a response.
For each store, and for each hot range, we start a transaction with KV to look
up descriptor info.

Previously, there was no upper-bound set on the time a node could take
to provide a response. This commit introduces a per-node timeout
in the pagination logic, and is configurable with the new cluster setting
server.hot_ranges.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out,
The fan-out continues as before, as if a node failed to provide a response.

Informs cockroachdb#104269
Resolves cockroachdb#107627
Epic: none
Release note (ops change): Added a new cluster setting named
server.hot_ranges.node.timeout, with a default value of 5 minutes.
The setting controls the maximum amount of time that a hot ranges request
will spend waiting for a node to provide a response.
Set to 0 to disable timeouts.
zachlite added a commit to zachlite/cockroach that referenced this issue Jul 31, 2023
Requests for hot ranges are serviced by a cluster wide fan-out,
where non-trivial work is done on each node to provide a response.
For each store, and for each hot range, we start a transaction with KV to look
up descriptor info.

Previously, there was no upper-bound set on the time a node could take
to provide a response. This commit introduces a per-node timeout
in the pagination logic, and is configurable with the new cluster setting
server.hot_ranges_request.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out,
The fan-out continues as before, as if a node failed to provide a response.

Informs cockroachdb#104269
Resolves cockroachdb#107627
Epic: none
Release note (ops change): Added a new cluster setting named
server.hot_ranges_request.node.timeout, with a default value of 5 minutes.
The setting controls the maximum amount of time that a hot ranges request
will spend waiting for a node to provide a response.
Set to 0 to disable timeouts.
craig bot pushed a commit that referenced this issue Aug 1, 2023
107796: ui, server: add a timeout per node while collecting hot ranges r=zachlite a=zachlite

Requests for hot ranges are serviced by a cluster wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info.

Previously, there was no upper-bound set on the time a node could take to provide a response. This commit introduces a per-node timeout in the pagination logic, and is configurable with the new cluster setting server.hot_ranges.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out, The fan-out continues as before, as if a node failed to provide a response.

Informs #104269
Resolves #107627 
Epic: none
Release note (ops change): Added a new cluster setting named server.hot_ranges.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.

Co-authored-by: zachlite <zachlite@gmail.com>
@craig craig bot closed this as completed in 472d414 Aug 1, 2023
zachlite added a commit to zachlite/cockroach that referenced this issue Aug 18, 2023
Requests for hot ranges are serviced by a cluster wide fan-out,
where non-trivial work is done on each node to provide a response.
For each store, and for each hot range, we start a transaction with KV to look
up descriptor info.

Previously, there was no upper-bound set on the time a node could take
to provide a response. This commit introduces a per-node timeout
in the pagination logic, and is configurable with the new cluster setting
server.hot_ranges_request.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out,
The fan-out continues as before, as if a node failed to provide a response.

Informs cockroachdb#104269
Resolves cockroachdb#107627
Epic: none
Release note (ops change): Added a new cluster setting named
server.hot_ranges_request.node.timeout, with a default value of 5 minutes.
The setting controls the maximum amount of time that a hot ranges request
will spend waiting for a node to provide a response.
Set to 0 to disable timeouts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant