sqlproxy: fix flakiness and other small issues #67452
Conversation
Reviewed 7 of 7 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @chrisseto, @darinpp, and @jeffswenson)
pkg/ccl/sqlproxyccl/tenant/directory.proto, line 56 at r1 (raw file):
Pod pod = 1;
Note that we should only bump the version of sqlproxy in the CC codebase once tenantdir has been updated to reflect these changes as well. Changing the ordering is fine since both tenantdir and sqlproxy will start up at the same time.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 82 at r1 (raw file):
// If Initialize has already been successfully called, nothing to do.
if e.initialized.Load() {
	return nil
}
Hm, I wonder if it's cleaner to use sync.Once here.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chrisseto, @darinpp, @jaylim-crl, and @jeffswenson)
pkg/ccl/sqlproxyccl/tenant/directory.proto, line 56 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Pod pod = 1;
Note that we should only bump the version of sqlproxy in the CC codebase once tenantdir has been updated to reflect these changes as well. Changing the ordering is fine since both tenantdir and sqlproxy will start up at the same time.
Ack.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 82 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
// If Initialize has already been successfully called, nothing to do.
if e.initialized.Load() {
	return nil
}
Hm, I wonder if it's cleaner to use sync.Once here.
Yeah, it definitely is cleaner - I was just reimplementing it here. I switched to use it.
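For illustration, a minimal sketch of what the switch looks like (tenantEntry and Initialize come from the thread; the fetch parameter is a hypothetical stand-in for the client.GetTenant call):

```go
package tenant

import "sync"

// tenantEntry is sketched from the discussion above; only the
// initialization-related fields are shown.
type tenantEntry struct {
	// Before: a hand-rolled guard using an atomic flag, i.e.
	//   if e.initialized.Load() { return nil }
	// After: sync.Once gives run-exactly-once semantics for free.
	initialized sync.Once
	initError   error
}

// Initialize runs the expensive setup at most once. Concurrent callers
// block inside Do until the first call completes, then observe its result.
// Caveat: unlike the atomic-flag version, which only set the flag on
// success, sync.Once never re-runs fetch after a failure.
func (e *tenantEntry) Initialize(fetch func() error) error {
	e.initialized.Do(func() {
		e.initError = fetch()
	})
	return e.initError
}
```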
Reviewed 3 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @chrisseto, @darinpp, and @jeffswenson)
pkg/ccl/sqlproxyccl/tenant/entry.go, line 112 at r1 (raw file):
func (e *tenantEntry) RefreshPods(ctx context.Context, client DirectoryClient) error {
	if !e.initialized.Load() {
		return errors.AssertionFailedf("entry for tenant %d is not initialized", e.TenantID)
Now that we've removed these assertions, what happens if we invoke RefreshPods / ChoosePodAddr when the entry hasn't been initialized?
pkg/ccl/sqlproxyccl/tenant/entry.go, line 80 at r2 (raw file):
// If Initialize has already been successfully called, nothing to do.
e.initialized.Do(func() {
	tenantResp, err := client.GetTenant(ctx, &GetTenantRequest{TenantID: e.TenantID.ToUint64()})
Do we also need to synchronize access to the Directory service through e.calls here?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chrisseto, @darinpp, @jaylim-crl, and @jeffswenson)
pkg/ccl/sqlproxyccl/tenant/entry.go, line 112 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Now that we've removed these assertions, what happens if we invoke RefreshPods / ChoosePodAddr when the entry hasn't been initialized?
This was just here to catch bugs. But the bugs are quite unlikely and these have never fired. I don't think it's worth adding back an initialized boolean just to catch this very unlikely bug.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 80 at r2 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Do we also need to synchronize access to the Directory service through e.calls here?
No, because this is always called first before anything else. Any other threads will wait on the sync.Once until this initial call is complete.
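A tiny standalone demo of that blocking behavior (not proxy code): every goroutine that reaches Once.Do waits until the first invocation returns.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var once sync.Once
	var wg sync.WaitGroup

	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := time.Now()
			// Every goroutine reaching Do blocks until the first
			// invocation of the function has returned.
			once.Do(func() {
				time.Sleep(100 * time.Millisecond) // stand-in for the GetTenant call
			})
			fmt.Printf("goroutine %d proceeded after %v\n", id, time.Since(start))
		}(i)
	}
	wg.Wait()
	// All three goroutines report roughly 100ms: none proceeded
	// before the initial call finished.
}
```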
- Fix flakiness of the TestDirectoryConnect test.
- Fix minor deleteEntry race condition.
- Consolidate EventType and PodState enumerations.
- Do not add DRAINING pod addresses to directory.
- Avoid locking to check tenantEntry initialization.

Fixes cockroachdb#67406

Release note: None
bors r+
Build succeeded:
Fixes cockroachdb#69220. Regression from cockroachdb#67452.

In cockroachdb#67452, we omitted DRAINING pods from the tenant directory. Whenever a pod goes into the DRAINING state, the pod watcher needs time to update the directory. Not waiting for that while calling EnsureTenantAddr produces a stale result. This commit updates TestWatchPods by polling on EnsureTenantAddr until the pod watcher has updated the directory.

Release note: None
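A sketch of that polling pattern, assuming the repo's testutils.SucceedsSoon helper; the directory interface, the EnsureTenantAddr signature, and the fixture names here are simplified stand-ins for the real test:

```go
package tenant_test

import (
	"context"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils"
	"github.com/cockroachdb/errors"
)

// tenantDirectory declares only the method exercised below; the real
// signature in the proxy code may differ.
type tenantDirectory interface {
	EnsureTenantAddr(ctx context.Context, tenantID uint64) (string, error)
}

// waitForDrain polls EnsureTenantAddr until the pod watcher has dropped
// the DRAINING pod's address, rather than asserting on a single lookup
// that can observe a stale directory.
func waitForDrain(
	t *testing.T, ctx context.Context, dir tenantDirectory, tenantID uint64, drainingAddr string,
) {
	testutils.SucceedsSoon(t, func() error {
		addr, err := dir.EnsureTenantAddr(ctx, tenantID)
		if err != nil {
			return err
		}
		if addr == drainingAddr {
			return errors.Newf("directory still returns draining pod %s", addr)
		}
		return nil
	})
}
```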
73500: kv,storage: persist gateway node id in transaction intents r=AlexTalks a=AlexTalks

This change augments the `TxnMeta` protobuf structure to include the gateway node ID (responsible for initiating the transaction) when serializing the intent. By doing so, this commit enables the Contention Event Store proposed in #71965, utilizing option 2.

Release note: None

73862: sql: add test asserting CREATE/USAGE on public schema r=otan a=rafiss

refs #70266

The public schema currently always has CREATE/USAGE privileges for the public role. Add a test that confirms this.

Release note: None

73873: scdeps: tighten dependencies, log more side effects r=postamar a=postamar

This commit reworks the dependency injection for the event logger, among other declarative schema changer dependencies. It also makes the test dependencies more chatty in the side effects log.

Release note: None

73932: ui: select grants tab on table details page r=maryliag a=maryliag

Previously, when the grants view was selected on the Database Details page, it was going to the Table Details with the Overview tab selected. With this commit, if the view mode selected is Grant, the grant tab is selected on the Table Details page.

Fixes #68829

Release note: None

73943: cli: support --locality and --max-offset flags with sql tenant pods r=nvanbenschoten a=nvanbenschoten

This commit adds support for the `--locality` and `--max-offset` flags to the `cockroach mt start-sql` command.

The first of these is important because tenant SQL pods should know where they reside. This will be important in the future for multi-region serverless and also for projects like #72593.

The second of these is important because the SQL pod's max-offset setting needs to be the same as the host cluster's. If we want to be able to configure the host cluster's maximum clock offset to some non-default value, we'll need SQL pods to be configured identically.

Validation of plumbing:
```sh
./cockroach start-single-node --insecure --max-offset=250ms
./cockroach sql --insecure -e 'select crdb_internal.create_tenant(2)'

# verify --max-offset
./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0
# CRDB crashes with error "locally configured maximum clock offset (250ms)
# does not match that of node [::]:62744 (500ms)"
./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0 --max-offset=250ms
# successful

# verify --locality
./cockroach sql --insecure --port=26258 -e 'select gateway_region()'
ERROR: gateway_region(): no region set on the locality flag on this node
./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0 --max-offset=250ms --locality=region=us-east1
./cockroach sql --insecure --port=26258 -e 'select gateway_region()'
  gateway_region
------------------
     us-east1
```

73946: ccl/sqlproxyccl: fix TestWatchPods under stressrace r=jaylim-crl a=jaylim-crl

Fixes #69220. Regression from #67452.

In #67452, we omitted DRAINING pods from the tenant directory. Whenever a pod goes into the DRAINING state, the pod watcher needs time to update the directory. Not waiting for that while calling EnsureTenantAddr produces a stale result. This commit updates TestWatchPods by polling on EnsureTenantAddr until the pod watcher updates the directory.

Release note: None

73954: sqlsmith: don't compare voids for joins r=rafiss a=otan

No comparison expr is defined on voids, so don't generate comparisons for them.

Resolves #73901
Resolves #73898
Resolves #73777

Release note: None

Co-authored-by: Alex Sarkesian <sarkesian@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Marius Posta <marius@cockroachlabs.com>
Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Previously, cockroachdb#67452 removed DRAINING pods from the directory cache. This commit adds that back. The connector will now need to filter for RUNNING pods manually before invoking the balancer. This is needed so that we can track DRAINING pods and wait until 60 seconds have elapsed before transferring connections away from them. To support that, we also update the Pod's proto definition to include a StateTimestamp field to represent the timestamp at which the state field was last updated. The plan is to have a polling mechanism every X seconds to check DRAINING pods, and use that information to start migrating connections.

Release note: None
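A minimal sketch of the connector-side filter described here, assuming a trimmed-down Pod type (the real proto message has more fields) and a hypothetical filterRunning helper:

```go
package tenant

import "time"

// PodState mirrors the directory proto's pod-state enum; only the two
// states discussed here are shown.
type PodState int

const (
	RUNNING PodState = iota
	DRAINING
)

// Pod is a trimmed-down stand-in for the proto message; StateTimestamp
// records when the State field last changed, so drain age can be computed.
type Pod struct {
	Addr           string
	State          PodState
	StateTimestamp time.Time
}

// filterRunning keeps only RUNNING pods for the balancer while leaving
// DRAINING pods in the directory cache so their drain age can be tracked.
func filterRunning(pods []*Pod) []*Pod {
	var running []*Pod
	for _, p := range pods {
		if p.State == RUNNING {
			running = append(running, p)
		}
	}
	return running
}
```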
…79409 #79427 #79428 #79433 #79444

76312: kvserver, batcheval: pin Engine state during read-only command evaluation r=aayushshah15 a=aayushshah15

This commit makes it such that we eagerly pin the engine state of the `Reader` created during the evaluation of read-only requests. Generally, reads will hold latches throughout the course of their evaluation (particularly, while they do their `MVCCScan`). Mainly, this commit paves the way for us to move to a world where we avoid holding latches during the MVCCScan. Additionally, it also lets us make MVCC garbage collection latchless as described in #55293.

There are a few notable changes in this patch:

1. Pinning the engine state eagerly runs into #70974. To resolve this, the closed timestamp of the `Replica` is now captured at the time the `EvalContext` is created, and not during the command evaluation of `QueryResolvedTimestampRequest`.
2. `EvalContext` now has an `ImmutableEvalContext` embedded into it. The `ImmutableEvalContext` is supposed to encapsulate state that must not change after the `EvalContext` is created. The closed timestamp of the replica is part of the `ImmutableEvalContext`.
3. `Replica` no longer fully implements the `EvalContext` interface. Instead, it implements everything but `GetClosedTimestamp()` (which is implemented by `ImmutableEvalContext` instead).

Relates to #55293
Resolves #55461
Resolves #70974

Release note: None

78652: sql: implement to_reg* builtins r=otan a=e-mbrown

Resolves #77838

This commit implements the `to_regclass`, `to_regnamespace`, `to_regproc`, `to_regprocedure`, `to_regrole`, and `to_regtype` builtins.

Release note (<category, see below>): The `to_regclass`, `to_regnamespace`, `to_regproc`, `to_regprocedure`, `to_regrole`, and `to_regtype` builtin functions are now supported, improving compatibility with PostgreSQL.

79022: server/status: add running non-idle jobs metric r=darinpp a=darinpp

Previously serverless was using the sql jobs running metric to determine if a tenant process is idle and can be shut down. With the introduction of continuously running jobs this isn't a good indicator anymore. A recent addition is per-job metrics that show running or idle. The auto scaler doesn't care about the individual jobs and only cares about the total number of jobs that are running but haven't reported as being idle. The pull rate is also very high, so retrieving all the individual running/idle metrics for each job type isn't optimal. So this PR adds a single metric that just aggregates and tracks the total count of jobs running and not idle.

Release justification: Bug fixes and low-risk updates to new functionality

Release note: None

Will be re-based once #79021 is merged

79157: cli: tweak slow decommission message r=knz a=cameronnunez

Release note: None

79313: opt: do not push LIMIT into the scan of a virtual table r=msirek a=msirek

Fixes #78578

Previously, a LIMIT operation could be pushed into the scan of a virtual table with an ORDER BY clause. This was inadequate because in-order scans of virtual indexes aren't supported. When an index that should provide the order requested by a query is used, a sort is actually produced under the covers:

```
EXPLAIN(vec) SELECT oid, typname FROM pg_type ORDER BY OID;
              info
----------------------------------
  │
  └ Node 1
    └ *colexec.sortOp
      └ *sql.planNodeToRowSource
```

Functions `CanLimitFilteredScan` and `GenerateLimitedScans` are modified to avoid pushing LIMIT operations into ordered scans of virtual indexes.

Release justification: Low risk fix for incorrect results in queries involving virtual system tables.

Release note (bug fix): LIMIT queries with an ORDER BY clause which scan the index of a virtual system table, such as `pg_type`, could previously return incorrect results. This is corrected by teaching the optimizer that LIMIT operations cannot be pushed into ordered scans of virtual indexes.

79346: ccl/sqlproxyccl: add rebalancer queue for connection rebalancing r=JeffSwenson a=jaylim-crl

#### ccl/sqlproxyccl: add rebalancer queue for rebalance requests

This commit adds a rebalancer queue implementation to the balancer component. The queue will be used for rebalance requests for the connection migration work. This is done to ensure a centralized location that invokes the TransferConnection method on the connection handles. Doing this also enables us to limit the number of concurrent transfers within the proxy.

Release note: None

#### ccl/sqlproxyccl: run rebalancer queue processor in the background

The previous commit added a rebalancer queue. This commit connects the queue to the balancer, and runs the queue processor in the background. By default, we limit up to 100 concurrent transfers at any point in time, and each transfer will be retried up to 3 times.

Release note: None

Jira issue: CRDB-14727

79362: kv: remove stale comment in processOneChange r=nvanbenschoten a=nvanbenschoten

The comment was added in 2fb56bd and hasn't been accurate since 5178559.

Jira issue: CRDB-14753

79368: ccl/sqlproxyccl: include DRAINING pods in the directory cache r=JeffSwenson a=jaylim-crl

Previously, #67452 removed DRAINING pods from the directory cache. This commit adds that back. The connector will now need to filter for RUNNING pods manually before invoking the balancer. This is needed so that we can track DRAINING pods and wait until 60 seconds have elapsed before transferring connections away from them. To support that, we also update the Pod's proto definition to include a StateTimestamp field to represent the timestamp at which the state field was last updated. The plan is to have a polling mechanism every X seconds to check DRAINING pods, and use that information to start migrating connections.

Release note: None

Jira issue: CRDB-14759

79386: colexec: remove redundant benchmarks r=yuzefovich a=yuzefovich

This commit finishes the transition of some of the benchmarks in the colexec package started in the 22.1 cycle.

Fixes: #75106.

Release note: None

Jira issue: CRDB-14783

79409: sql: refactor deps tests to use bazel r=yuzefovich a=yuzefovich

This commit refactors most `VerifyNoImports` dependency tests in the sql folder to use the newly introduced bazel test utilities.

Release note: None

Jira issue: CRDB-14814

79427: backupccl: allow cluster restore from different tenant r=dt a=stevendanna

This removes a prohibition for cluster restores with mismatched tenant IDs since we believe they are now correct as of #73831.

This allows users to take a cluster backup in a tenant and restore it into another tenant. The new tenant_settings table needs special care since it may exist in the source tenant but not the target tenant when the source tenant is the system tenant. In this change, we throw an error in the case of a non-empty tenant_settings table being restored into a non-system tenant. This is a bit user-unfriendly since we detect this error rather late in the restore process.

Release note: None

Jira issue: CRDB-14844

79428: backupccl: Refactor encryption utility functions into their own file. r=benbardin a=benbardin

Release note: None

Jira issue: CRDB-14845

79433: sql: use new ALTER TENANT syntax in tests r=stevendanna a=rafiss

Release note: None

79444: roachtest: warmup follower-reads for fixed duration, not fixed number of ops r=nvanbenschoten a=nvanbenschoten

Fixes #78596.

This change switches the warmup phase of the follower-read roachtest suite from running a fixed number of operations (100) to running for a fixed duration (15s). This should ensure that the single-region variant of the test is given sufficient time to warm up follower reads immediately after one of its nodes is restarted. Before this change, the single-region variant was only being given about 500ms after startup to catch up on the closed timestamp, which made the test flaky.

Release justification: testing only

Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: e-mbrown <ebsonari@gmail.com>
Co-authored-by: Darin Peshev <darinp@gmail.com>
Co-authored-by: Cameron Nunez <cameron@cockroachlabs.com>
Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Ben Bardin <bardin@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
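As a rough illustration of the rebalancer-queue idea from #79346 (a bounded number of concurrent transfers, each retried a few times), here is a standalone sketch. The "100 concurrent transfers" and "3 retries" defaults and the TransferConnection name come from the commit text; everything else is an assumption, not the proxy's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// transferFunc stands in for a connection handle's TransferConnection method.
type transferFunc func() error

// rebalancerQueue bounds concurrency with a buffered-channel semaphore
// and retries each transfer a fixed number of times.
type rebalancerQueue struct {
	sem     chan struct{}
	retries int
	wg      sync.WaitGroup
}

func newRebalancerQueue(maxConcurrent, retries int) *rebalancerQueue {
	return &rebalancerQueue{
		sem:     make(chan struct{}, maxConcurrent),
		retries: retries,
	}
}

// enqueue runs the transfer in the background, holding a semaphore slot
// for its duration so at most cap(sem) transfers run at once.
func (q *rebalancerQueue) enqueue(transfer transferFunc) {
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		q.sem <- struct{}{}        // acquire a slot
		defer func() { <-q.sem }() // release it when done
		for attempt := 1; attempt <= q.retries; attempt++ {
			if err := transfer(); err == nil {
				return // transfer succeeded; stop retrying
			}
		}
	}()
}

func main() {
	q := newRebalancerQueue(100 /* maxConcurrent */, 3 /* retries */)
	q.enqueue(func() error {
		fmt.Println("transferring connection")
		return nil
	})
	q.wg.Wait()
}
```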