-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ui, server: add node-specific timeouts for hot ranges requests #107627
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
zachlite
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-cluster-observability
labels
Jul 26, 2023
zachlite
added a commit
to zachlite/cockroach
that referenced
this issue
Jul 28, 2023
Requests for hot ranges are serviced by a cluster wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper-bound set on the time a node could take to provide a response. This commit introduces a per-node timeout in the pagination logic, and is configurable with the new cluster setting server.hot_ranges.node.timeout. A value of 0 will disable the timeout. Error behavior and semantics are preserved. If a particular node times out, The fan-out continues as before, as if a node failed to provide a response. Informs cockroachdb#104269 Resolves cockroachdb#107627 Epic: none Release note (ops change): Added a new cluster setting named server.hot_ranges.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.
zachlite
added a commit
to zachlite/cockroach
that referenced
this issue
Jul 31, 2023
Requests for hot ranges are serviced by a cluster wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper-bound set on the time a node could take to provide a response. This commit introduces a per-node timeout in the pagination logic, and is configurable with the new cluster setting server.hot_ranges_request.node.timeout. A value of 0 will disable the timeout. Error behavior and semantics are preserved. If a particular node times out, The fan-out continues as before, as if a node failed to provide a response. Informs cockroachdb#104269 Resolves cockroachdb#107627 Epic: none Release note (ops change): Added a new cluster setting named server.hot_ranges_request.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.
craig bot
pushed a commit
that referenced
this issue
Aug 1, 2023
107796: ui, server: add a timeout per node while collecting hot ranges r=zachlite a=zachlite Requests for hot ranges are serviced by a cluster wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper-bound set on the time a node could take to provide a response. This commit introduces a per-node timeout in the pagination logic, and is configurable with the new cluster setting server.hot_ranges.node.timeout. A value of 0 will disable the timeout. Error behavior and semantics are preserved. If a particular node times out, The fan-out continues as before, as if a node failed to provide a response. Informs #104269 Resolves #107627 Epic: none Release note (ops change): Added a new cluster setting named server.hot_ranges.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts. Co-authored-by: zachlite <zachlite@gmail.com>
zachlite
added a commit
to zachlite/cockroach
that referenced
this issue
Aug 18, 2023
Requests for hot ranges are serviced by a cluster wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper-bound set on the time a node could take to provide a response. This commit introduces a per-node timeout in the pagination logic, and is configurable with the new cluster setting server.hot_ranges_request.node.timeout. A value of 0 will disable the timeout. Error behavior and semantics are preserved. If a particular node times out, The fan-out continues as before, as if a node failed to provide a response. Informs cockroachdb#104269 Resolves cockroachdb#107627 Epic: none Release note (ops change): Added a new cluster setting named server.hot_ranges_request.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Part of #104269
During a fan out to collect hot ranges, we do not enforce a timeout while collecting hot ranges on a specific node.
As a result, the response time's lower bound is that of the node that takes the longest to collect hot ranges.
This should be wrapped in a
timeutil.RunWithTimeout
:cockroach/pkg/server/pagination.go
Line 382 in 9db3a0e
Jira issue: CRDB-30135
The text was updated successfully, but these errors were encountered: