71542: backupccl: Support RESTORE SYSTEM USERS from a backup r=gh-casper a=gh-casper

Support a new variant of RESTORE that recreates system users which do not exist in the current cluster from a backup containing system.users, and also grants roles to those users.

Example invocation:

RESTORE SYSTEM USERS FROM 'nodelocal://foo/1';

As with a full cluster restore, we first restore a temporary system database containing system.users and system.role_members into the restoring cluster, and then insert the users and roles from the temporary system tables into the current system tables.

Fixes: #45358

Release note (sql change): A special flavor of RESTORE, RESTORE SYSTEM USERS FROM ..., is added to support restoring system users from a backup. When executed, the statement recreates the users that exist in a backup of system.users but do not currently exist (ignoring those that do), and re-grants roles to those users if the backup contains system.role_members.

73319: jobs: Execute scheduled jobs on a single node in the cluster. r=miretskiy a=miretskiy

Execute the scheduled jobs daemon on a single node -- namely, the leaseholder of the meta1 range. Prior to this change, the scheduling daemon ran on every node, periodically polling the scheduled jobs table with a `FOR UPDATE` clause. Unfortunately, the job planning phase (namely, backup planning) could take a significant amount of time. In such a situation, the entire scheduled jobs table would be locked, making it impossible to introspect the state of schedules (or jobs) via `SHOW SCHEDULES` or similar statements. Furthermore, dropping the `FOR UPDATE` clause by itself is not ideal, because that would lead to expensive backup planning being executed on almost every node, with only one node actually making progress.

The single-node mode is disabled by default, but can be enabled via the `jobs.scheduler.single_node_scheduler.enabled` setting.

Release note: The scheduled jobs scheduler can now run on a single node in order to reduce contention on the scheduled jobs table; this mode is disabled by default and controlled by the `jobs.scheduler.single_node_scheduler.enabled` setting.

74077: kvserver: lease transfer in JOINT configuration r=shralex a=shralex

Previously:

1. Removing a leaseholder was not allowed.
2. A VOTER_INCOMING node wasn't able to accept the lease.

Because of (1), users needed to transfer the lease before removing the leaseholder. Because of (2), when relocating a range from the leaseholder A to a new node B, there was no way to transfer the lease to B before it was fully added as a VOTER. Adding it as a voter first, however, could degrade fault tolerance. For example, if A and B are in region R1, C in region R2, and D in R3, and the range starts on (A, C, D), then adding B to replace A yields the intermediate configuration (A, B, C, D), in which a failure of R1 makes the range unavailable because no quorum can be established. Since B couldn't be added before A was removed, the system would transfer the lease out to C, remove A and add B, and then transfer the lease again to B. This resulted in a temporary migration of leases out of their preferred region, an imbalance in lease counts, and degraded performance.

This PR fixes that by (1) allowing the leaseholder to be removed, transferring the lease away right before exiting the JOINT configuration, and (2) allowing a VOTER_INCOMING replica to accept the lease.

Release note (performance improvement): Fixes a limitation which meant that, upon adding a new node to the cluster, lease counts among existing nodes could diverge until the new node was fully upreplicated.

Here are a few experiments that demonstrate the benefit of the feature.
1.

> roachprod create local -n 4 // if not already created and staged
> roachprod put local cockroach
> roachprod start local:1-3 --racks=3 // add 3 servers in 3 different racks
> cockroach workload init kv --splits=10000
> roachprod start local:4 --racks=3 // add a 4th server in one of the racks

Without the change (master):

<img width="978" alt="Screen Shot 2022-02-09 at 8 35 35 AM" src="https://user-images.githubusercontent.com/6037719/153458966-609dbb7e-ca3d-4db6-9cfb-adc228f2bdf2.png">

With the change:

<img width="986" alt="Screen Shot 2022-02-08 at 8 46 41 PM" src="https://user-images.githubusercontent.com/6037719/153459366-2d4e2def-37cf-405b-b601-8be57419ae02.png">

Without the patch, the number of leases on server 0 (black line) drops all the way to 0 before climbing back up, and the number of leases in the other racks rises; both are undesirable. With the patch, neither happens.

2. Same as 1, but with a leaseholder preference for rack 0:

ALTER RANGE default CONFIGURE ZONE USING lease_preferences='[[+rack=0]]';

Without the change (master):

<img width="966" alt="Screen Shot 2022-02-09 at 10 45 27 PM" src="https://user-images.githubusercontent.com/6037719/153460753-bce048f0-f6da-4e21-afdc-317620c035b2.png">

With the change:

<img width="983" alt="leaseholder preferences - with change" src="https://user-images.githubusercontent.com/6037719/153460780-55795866-cf47-404d-b77a-45d9e011f972.png">

Without the change, the number of leaseholders in racks 1 and 2 combined (i.e., outside the preferred region) grows from 300 to 1000 before dropping back to 40. With the fix, it doesn't grow at all.

76401: pgwire: add server.max_connections public cluster setting r=rafiss a=ecwall

This setting specifies the maximum number of connections a server can have open at any given time:

- `< 0`: connections are unlimited (existing behavior)
- `= 0`: connections are disabled
- `> 0`: connections are limited

If a new non-superuser connection would exceed this limit, the same error message is returned as in Postgres, "sorry, too many connections", with error code 53300, which corresponds to "too many connections".

Release note (ops change): An off-by-default server.max_connections cluster setting has been added to limit the maximum number of connections to a server.

76748: sql: add missing specs to plan diagrams r=rharding6373 a=rharding6373

This change allows previously missing specs (e.g., RestoreDataSpec and SplitAndScatterSpec) to be shown in plan diagrams. Before this change, a plan involving these types would result in an error when generating the diagrams. Also added a test to make sure future specs implement the `diagramCellType` interface, which is required to generate diagrams.

Release note: None

Co-authored-by: Casper <casper@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: shralex <shralex@gmail.com>
Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
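
For the RESTORE SYSTEM USERS variant in 71542, a minimal usage sketch; the backup path matches the example invocation above, and the verification queries are illustrative rather than part of the change:

```sql
-- Recreate system users (and their role grants) that exist in the backup
-- but not in the current cluster; users that already exist are ignored.
RESTORE SYSTEM USERS FROM 'nodelocal://foo/1';

-- Illustrative check of the recreated users and role memberships.
SHOW USERS;
SELECT * FROM system.role_members;
```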
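For 73319, a sketch of opting in to the single-node scheduler, using the setting named in the description:

```sql
-- Run the scheduled-jobs daemon on a single node (the meta1 range
-- leaseholder) instead of on every node; this mode is off by default.
SET CLUSTER SETTING jobs.scheduler.single_node_scheduler.enabled = true;

-- Schedule state remains introspectable while planning runs on one node.
SHOW SCHEDULES;
```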
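For the 74077 experiments, the lease counts above come from the admin UI graphs; a hypothetical SQL spot check of lease placement, assuming the kv workload's default kv.kv table and this release's SHOW RANGES output:

```sql
-- Hypothetical: count ranges per leaseholder store for the workload table.
SELECT lease_holder, count(*) AS ranges
FROM [SHOW RANGES FROM TABLE kv.kv]
GROUP BY lease_holder
ORDER BY lease_holder;
```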
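For 76401, a sketch of the setting's three regimes; the value 100 is arbitrary, and the setting name is as given in the release note above:

```sql
-- Allow at most 100 open connections; further non-superuser connections
-- fail with "sorry, too many connections" (error code 53300).
SET CLUSTER SETTING server.max_connections = 100;

-- A negative value restores the existing unlimited behavior;
-- 0 disables new non-superuser connections entirely.
SET CLUSTER SETTING server.max_connections = -1;
```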