kvserver: account for no node id in leaseholder stats #85157

kvoli · 2022-07-27T19:37:47Z

Problem

Currently replica_send, which is the point at where load is accounted, then used in load based rebalancing, does not record QPS when GatewayNodeID=0 (default value).

cockroach/pkg/kv/kvserver/replica_send.go

Lines 140 to 142 in 4df883b

    
           if r.leaseholderStats != nil && ba.Header.GatewayNodeID != 0 { 
        
           	r.leaseholderStats.RecordCount(r.getBatchRequestQPS(ctx, ba), ba.Header.GatewayNodeID) 
        
           }

This means that dist_sender requests that do not set a GatewayNodeID, will not be accounted for and not considered as load on the receiving store, despite still using resources.

A secondary tenant will currently never populate this value

cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go

Lines 564 to 568 in cc155a1

    
           // TODO(nvanbenschoten): open an issue about the effect of this. 
        
           g, ok := ds.nodeDescs.(*gossip.Gossip) 
        
           if !ok { 
        
           	return nil 
        
           }

Solution

The solution is to account for QPS, when GatewayNodeID=0 regardless and then modify the semantics of the user of the locality based load counts to not consider these "untagged" load counts.

cockroach/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go

Line 1980 in d7b901d

for requestLocalityStr, qps := range qpsStats {

Jira issue: CRDB-18068

The text was updated successfully, but these errors were encountered:

kvoli · 2022-07-27T19:58:05Z

related #81000

Previously, batch requests with no `GatewayNodeID` would not be accounted for on the QPS of a replica. By extension, the store QPS would also not aggregate this missing QPS over replicas it holds. This patch introduces tracking for all requests, regardless of the `GatewayNodeID`. This was done to as follow the workload lease transfers consider the per-locality counts, therefore untagged localities were not useful. This has since been updated to ignore filter out localities directly, so it is not necessary to exclude them anymore. `leaseholderStats`, which previously tracked the QPS, and `writeStats` tracking the mvcc keys written, have also been removed. They are duplicated in `batchRequest` and `writeKeys` respectively, within the `loadStats` of a replica. resolves cockroachdb#85157 Release note: None

84865: kvserver: always return NLHE on lease acquisition timeouts r=nvanbenschoten a=erikgrinaker In ab74b97 we added internal timeouts for lease acquisitions. These were wrapped in `RunWithTimeout()`, as mandated for context timeouts. However, this would mask the returned `NotLeaseHolderError` as a `TimeoutError`, preventing the DistSender from retrying it and instead propagating it out to the client. Additionally, context cancellation errors from the actual RPC call were never wrapped as a `NotLeaseHolderError` in the first place. This ended up only happening in a very specific scenario where the outer timeout added to the client context did not trigger, but the inner timeout for the coalesced request context did trigger while the lease request was in flight. Accidentally, the outer `RunWithTimeout()` call did not return the `roachpb.Error` from the closure but instead passed it via a captured variable, bypassing the error wrapping. This patch replaces the `RunWithTimeout()` calls with regular `context.WithTimeout()` calls to avoid the error wrapping, and returns a `NotLeaseHolderError` from `requestLease()` if the RPC request fails and the context was cancelled (presumably causing the error). Another option would be to extract an NLHE from the error chain, but this would require correct propagation of the structured error chain across RPC boundaries, so out of an abundance of caution and with an eye towards backports, we instead choose to return a bare `NotLeaseHolderError`. The empty lease in the returned error prevents the DistSender from updating its caches on context cancellation. Resolves #84258. Resolves #85115. Release note (bug fix): Fixed a bug where clients could sometimes receive errors due to lease acquisition timeouts of the form `operation "storage.pendingLeaseRequest: requesting lease" timed out after 6s`. 84946: distsql: make the number of DistSQL runners dynamic r=yuzefovich a=yuzefovich **distsql: make the number of DistSQL runners dynamic** This commit improves the infrastructure around a pool of "DistSQL runners" that are used for issuing SetupFlow RPCs in parallel. Previously, we had a hard-coded number of 16 goroutines which was probably insufficient in many cases. This commit makes it so that we use the default value of `4 x N(cpus)` to make it proportional to how beefy the node is (under the expectation that the larger the node is, the more distributed queries it will be handling). The choice of the four as the multiple was made so that we get the previous default on machines with 4 CPUs. Additionally, this commit introduces a mechanism to dynamically adjust the number of runners based on a cluster setting. Whenever the setting is reduced, some of the workers are stopped, if the setting is increased, then new workers are spun up accordingly. This coordinator listens on two channels: one about the server quescing, and another about the new target pool size. Whenever a new target size is received, the coordinator will spin up / shut down one worker at a time until that target size is achieved. The worker, however, doesn't access the server quescing channel and, instead, relies on the coordinator to tell it to exit (either by closing the channel when quescing or sending a single message when the target size is decreased). Fixes: #84459. Release note: None **distsql: change the flow setup code a bit** Previously, when setting up a distributed plan, we would wait for all SetupFlow RPCs to come back before setting up the flow on the gateway. Most likely (in the happy scenario) all those RPCs would be successful, so we can parallelize the happy path a bit by setting up the local flow while the RPCs are in-flight which is what this commit does. This seems especially beneficial given the change in the previous commit to increase the number of DistSQL runners for beefy machines - we are now more likely to issue SetupFlow RPCs asynchronously. Release note: None 85091: flowinfra: disable queueing mechanism of the flow scheduler by default r=yuzefovich a=yuzefovich This commit disables the queueing mechanism of the flow scheduler as part of the effort to remove that queueing altogether during 23.1 release cycle. To get there though we choose a conservative approach of introducing a cluster setting that determines whether the queueing is enabled or not, and if it is disabled, then we effectively a treating `sql.distsql.max_running_flows` limit as infinite. By default, the queueing is now disabled since recent experiments have shown that the admission control does a good job of protecting the nodes from the influx of remote flows. Addresses: #34229. Release note: None 85134: sql: allow NULL in create view definition r=mgartner a=rafiss fixes #84000 Release note (sql change): CREATE VIEW statements can now have a constant NULL column definition. The resulting column is of type TEXT. 85178: kvserver: record batch requests with no gateway r=kvoli a=kvoli Previously, batch requests with no `GatewayNodeID` would not be accounted for on the QPS of a replica. By extension, the store QPS would also not aggregate this missing QPS over replicas it holds. This patch introduces tracking for all requests, regardless of the `GatewayNodeID`. This was done to as follow the workload lease transfers consider the per-locality counts, therefore untagged localities were not useful. This has since been updated to ignore filter out localities directly, so it is not necessary to exclude them anymore. `leaseholderStats`, which previously tracked the QPS, and `writeStats` tracking the mvcc keys written, have also been removed. They are duplicated in `batchRequest` and `writeKeys` respectively, within the `loadStats` of a replica. resolves #85157 Release note: None 85355: sql: improve physical planning of window functions r=yuzefovich a=yuzefovich **sql: remove shouldNotDistribute recommendation** It doesn't seem to be used much. Release note: None **sql: improve physical planning of window functions** This commit improves the physical planning of window functions in several ways. First, the optimizer is updated so that all window functions with a PARTITION BY clause are constructed first followed by the remaining window functions without PARTITION BY. This is needed by the execution which can only evaluate functions with PARTITION BY in the distributed fashion - as a result of this change, we are now more likely to get partial distributed execution (previously things depended on the order in which window functions were mentioned in the query). Second, the physical planner now thinks that we "should distribute" the plan if it finds at least one window function with PARTITION BY clause. Previously, we didn't make any recommendation about the distribution based on the presence of the window functions (i.e. we relied on the rest of the plan to do so), but they can be quite computation-intensive, so whenever we can distribute the execution, we should do so. Additionally, this commit removes some of the code in the physical planner which tries to find window functions with the same PARTITION BY and ORDER BY clauses - that code has been redundant for long time given that the optimizer does that too. Release note: None 85366: sql,logictest,descidgen: abstract descriptor ID generation, make deterministic in logictests r=ajwerner a=ajwerner The first commit adds an interface for descriptor ID generation and propagates the interface from the ExecCfg into the EvalContext. There are some minor refactoring to avoid propagating an ExecCfg further up the stack by making the parameters more specific. The second commit adds a testing knob to use a transactional implementation in the EvalContext. Fixes #37751 Fixes #69226 85406: schemachanger: check explain diagrams during rollback test r=postamar a=postamar This commit enriches the declarative schema changer integration tests by making data-driven EXPLAIN output assertions easier to add as a complement to otherwise unrelated tests. In particular, this commit improves the rollback test to check the explained rollback plan for each post-commit revertible stage. This should make it easier to debug bad rule definitions which otherwise would manifest themselves as causing the schema change to hang during the rollback. Release note: None 85414: colflow: fix a recent flake r=yuzefovich a=yuzefovich In 0866ddc we merged a change that relied on the assumption that the allocator passed to the parallel unordered synchronizer was not used by anyone else, but this assumption was broken in a test and is now fixed. Fixes: #85360. Release note: None Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Austen McClernon <austen@cockroachlabs.com> Co-authored-by: Andrew Werner <awerner32@gmail.com> Co-authored-by: Marius Posta <marius@cockroachlabs.com>

kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-distribution Relating to rebalancing and leasing. A-kv Anything in KV that doesn't belong in a more specific category. labels Jul 27, 2022

blathers-crl bot added the T-kv KV Team label Jul 27, 2022

kvoli mentioned this issue Jul 28, 2022

kvserver: record batch requests with no gateway #85178

Merged

kvoli self-assigned this Jul 28, 2022

craig bot closed this as completed in 0fea845 Aug 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvserver: account for no node id in leaseholder stats #85157

kvserver: account for no node id in leaseholder stats #85157

kvoli commented Jul 27, 2022 •

edited

Loading

kvoli commented Jul 27, 2022

kvserver: account for no node id in leaseholder stats #85157

kvserver: account for no node id in leaseholder stats #85157

Comments

kvoli commented Jul 27, 2022 • edited Loading

Problem

Solution

kvoli commented Jul 27, 2022

kvoli commented Jul 27, 2022 •

edited

Loading