-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/sqlproxyccl: add connector component and support for session revival token #76417
ccl/sqlproxyccl: add connector component and support for session revival token #76417
Conversation
cd7fc97
to
64b2df5
Compare
64b2df5
to
082e83b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jaylim-crl, @jeffswenson, and @rafiss)
pkg/ccl/sqlproxyccl/connector.go, line 53 at r1 (raw file):
// startupMsg represents the startup message associated with the client. // This will be used when establishing a pgwire connection with the SQL pod. startupMsg *pgproto3.StartupMessage
NIT: I typically will capitalize fields that I expect callers to set, vs. fields that are meant to be internal to the connector
(i.e. only read/written by its own methods). This makes it easier to understand which fields callers might need to initialize when construction connector
.
Regardless, I think we need comments or some mechanism to know which of these fields are required and which are optional. Maybe just a comment like "This field is required."
pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):
idleMonitorWrapperFn func(crdbConn net.Conn) net.Conn // Optional event callback functions. onLookupEvent and onDialEvent will be
NIT: Are we going too far here with this "ultra-extensible" set of callback functions? This design dates back, but I question whether we're getting much value for the extra complications it brings. Should we be inlining much/most of what these callback functions are doing?
082e83b
to
6e82679
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jeffswenson, and @rafiss)
pkg/ccl/sqlproxyccl/connector.go, line 53 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: I typically will capitalize fields that I expect callers to set, vs. fields that are meant to be internal to the
connector
(i.e. only read/written by its own methods). This makes it easier to understand which fields callers might need to initialize when constructionconnector
.Regardless, I think we need comments or some mechanism to know which of these fields are required and which are optional. Maybe just a comment like "This field is required."
Done. All of these fields are expected to be set by the caller, so I've capitalized all of them. Also added a comment for those.
pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Are we going too far here with this "ultra-extensible" set of callback functions? This design dates back, but I question whether we're getting much value for the extra complications it brings. Should we be inlining much/most of what these callback functions are doing?
I did this originally to avoid those logging calls within the connector component. Maybe that's unavoidable if we wanted to push everything into the connector. That said, I think removing the event callback functions are fine. We could inline everything, but I'd do this in a future PR. I'd like to push outgoingAddress
into the connector, and unify this with the lookup function. The fallback onto routing rule isn't needed anymore, but we still need to support the static routing rule since a lot of tests rely on that. I added a TODO here.
One of the goals with this PR is to ensure that the behavior of SQL proxy remains unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jaylim-crl, and @rafiss)
pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
I did this originally to avoid those logging calls within the connector component. Maybe that's unavoidable if we wanted to push everything into the connector. That said, I think removing the event callback functions are fine. We could inline everything, but I'd do this in a future PR. I'd like to push
outgoingAddress
into the connector, and unify this with the lookup function. The fallback onto routing rule isn't needed anymore, but we still need to support the static routing rule since a lot of tests rely on that. I added a TODO here.One of the goals with this PR is to ensure that the behavior of SQL proxy remains unchanged.
+1 to replacing the indirection with direct calls and pulling the logging and metrics into the connector. I think it is fine to accept the things that are already callbacks, e.g. the testingKnobs struct. The new callbacks are the ones I would prefer to see removed from the PR. E.g AddrLookupFn, IdleMonitorWrapperFn, AuthenticateFn, OnLookupEvent, and OnDialEvent.
6e82679
to
9d50ba0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Addressed all the reviews. This PR is ready for another round of review.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @rafiss)
pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
+1 to replacing the indirection with direct calls and pulling the logging and metrics into the connector. I think it is fine to accept the things that are already callbacks, e.g. the testingKnobs struct. The new callbacks are the ones I would prefer to see removed from the PR. E.g AddrLookupFn, IdleMonitorWrapperFn, AuthenticateFn, OnLookupEvent, and OnDialEvent.
Done. I backed out most of the changes except IdleMonitorWrapperFn
, which was also discussed offline. We can't move that in because the existing monitor takes a callback which contains a reference to errConnection
(and I didn't want to move that channel into the connector):
cockroach/pkg/ccl/sqlproxyccl/proxy_handler.go
Lines 396 to 402 in 07a3683
crdbConn = handler.idleMonitor.DetectIdle(crdbConn, func() { | |
err := newErrorf(codeIdleDisconnect, "idle connection closed") | |
select { | |
case errConnection <- err: /* error reported */ | |
default: /* the channel already contains an error */ | |
} | |
}) |
As for metrics, I don't think we should do that right now. I'll go through the proxy once the connection migration stuff is in to start collecting metrics for items that we care about. I suspect there are some unused/not useful metrics at the moment as well. I moved the logging stuff into the connector though.
9d50ba0
to
87935f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I like the direction you are taking the sqlproxy. proxy_handler was a bit much. Pulling logic out into the connector and forwarder is a big improvement.
1d12cee
to
61308d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jeffswenson, and @rafiss)
pkg/ccl/sqlproxyccl/connector.go, line 197 at r3 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
nit: consider naming openClusterConnInternal "dialTenantCluster".
Done.
pkg/ccl/sqlproxyccl/connector.go, line 278 at r3 (raw file):
The latter would be tricky to test I think. I went with the first approach. Did not log here because we're already logging in the caller.
avoid panics in the data plane code
I have mixed feelings about this. Back then when I returned errors, people told me they preferred panics especially when a case is impossible; it's better to crash right away than letting it go as an error to the caller. I personally prefer the panic approach, but I've changed this anyway. If you grep for panic("unreachable")
, you'll find some in the kv or sql layer.
pkg/ccl/sqlproxyccl/connector.go, line 334 at r3 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
nit: consider naming dialOutgoingAddr "dialSqlServer(serverAddr string)"
Done.
861cab7
to
a7b89ee
Compare
…val token Informs cockroachdb#76000. Previously, all the connection establishment logic is coupled with the handler function within proxy_handler.go. This makes connecting to a new SQL pod during connection migration difficult. This commit refactors all of those connection logic out of the proxy handler into a connector component, as described in the connection migration RFC. At the same time, we also add support for the session revival token within this connector component. Note that the overall behavior of the SQL proxy should be unchanged with this commit. Release note: None
a7b89ee
to
21ee3bd
Compare
bors r=JeffSwenson |
73586: rfc: optimize the draining process with connection_wait and relevant reporting systems r=ZhouXing19 a=ZhouXing19 This is a proposal for optimizing the draining process to be more legible for customers, and introduce a new step, connection_wait, to the draining process, which allows the server to early exit when all connections are closed. Release note: None 76410: kv: disallow GC requests that bump GC threshold and GC expired versions r=nvanbenschoten a=nvanbenschoten Related to #55293. This commit adds a safeguard to GC requests that prevents them from bumping the GC threshold at the same time that they GC individual MVCC versions. This was found to be unsafe in #55293 because performing both of these actions at the same time could lead to a race where a read request is allowed to evaluate without error while also failing to see MVCC versions that are concurrently GCed. This race is possible because foreground traffic consults the in-memory version of the GC threshold (`r.mu.state.GCThreshold`), which is updated after (in `handleGCThresholdResult`), not atomically with, the application of the GC request's WriteBatch to the LSM (in `ApplyToStateMachine`). This allows a read request to see the effect of a GC on MVCC state without seeing its effect on the in-memory GC threshold. The latches acquired by GC quests look like it will help with this race, but in practice they do not for two reasons: 1. the latches do not protect timestamps below the GC request's batch timestamp. This means that they only conflict with concurrent writes, but not all concurrent reads. 2. the read could be served off a follower, which could be applying the GC request's effect from the raft log. Latches held on the leaseholder would have no impact on a follower read. Thankfully, the GC queue has split these two steps for the past few releases, at least since 87e85eb, so we do not have a bug today. The commit also adds a test that reliably exercises the bug with a few well-placed calls to `time.Sleep`. The test contains a variant where the read is performed on the leaseholder and a variant where it is performed on a follower. Both fail by default. If we switch the GC request to acquire non-MVCC latches then the leaseholder variant passes, but the follower read variant still fails. 76417: ccl/sqlproxyccl: add connector component and support for session revival token r=JeffSwenson a=jaylim-crl Informs #76000. Previously, all the connection establishment logic is coupled with the handler function within proxy_handler.go. This makes connecting to a new SQL pod during connection migration difficult. This commit refactors all of those connection logic out of the proxy handler into a connector component, as described in the connection migration RFC. At the same time, we also add support for the session revival token within this connector component. Note that the overall behavior of the SQL proxy should be unchanged with this commit. Release note: None 76545: cmd/reduce: add -tlp option r=yuzefovich a=yuzefovich **cmd/reduce: remove stdin option and require -file argument** We tend to not use the option of passing input SQL via stdin, so this commit removes it. An additional argument in favor of doing that is that the follow-up commit will introduce another mode of behavior that requires `-file` argument to be specified, so it's just cleaner to always require it now. Release note: None **cmd/reduce: add -tlp option** This commit adds `-tlp` boolean flag that changes the behavior of `reduce`. It is required that `-file` is specified whenever the `-tlp` flag is used. The behavior is such that the last two queries (delimited by empty lines) in the file contain unpartitioned and partitioned queries that return different results although they are equivalent. If TLP check is requested, then we remove the last two queries from the input which we use then to construct a special TLP check query that results in an error if two removed queries return different results. We do not just include the TLP check query into the input string because the reducer would then reduce the check query itself, making the reduction meaningless. Release note: None 76598: server: use channel for DisableAutomaticVersionUpgrade r=RaduBerinde a=RaduBerinde DisableAutomaticVersionUpgrade is an atomic integer which is rechecked in a retry loop. This is not a very clean mechanism, and can lead to issues where you're unknowingly dealing with a copy of the knobs and setting the wrong atomic. The retry loop can also add unnecessary delays in tests. This commit changes DisableAutomaticVersionUpgrade from an atomic integer to a channel. If the channel is set, auto-upgrade waits until the channel is closed. Release note: None Co-authored-by: Jane Xing <zhouxing@uchicago.edu> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Jay <jay@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Build failed (retrying...): |
Build succeeded: |
Informs cockroachdb#76000. Extracted from cockroachdb#76805. Previously, we assumed that with the token-based authentication, the server is ready to accept queries the moment we connect to it. This assumption has been proved wrong during the integration tests with the forwarder, and there's an implicit AuthenticateOK step that happens after connecting to the server. During that time, initial connection data such as ParameterStatus and BackendKeyData messages will be sent to the client as well. For now, we will ignore those messages. Once we start implementing query cancellation within the proxy, that has to be updated to cache the new BackendKeyData. Release note: None Release justification: This fixes a bug that was introduced when we added token-based authentication support to the proxy in cockroachdb#76417. This is low risk, as the code is guarded behind the connection migration feature, which is currently not being used in production. To add on, CockroachCloud is the only user of sqlproxy.
Informs cockroachdb#76000. Extracted from cockroachdb#76805. Previously, we assumed that with the token-based authentication, the server is ready to accept queries the moment we connect to it. This assumption has been proved wrong during the integration tests with the forwarder, and there's an implicit AuthenticateOK step that happens after connecting to the server. During that time, initial connection data such as ParameterStatus and BackendKeyData messages will be sent to the client as well. For now, we will ignore those messages. Once we start implementing query cancellation within the proxy, that has to be updated to cache the new BackendKeyData. This commit also fixes a buglet to handle pgwire messages with no body. pgproto3's Receive methods will still call Next if the body size is 0, and previously, we were returning an io.EOF error. This commit changes that behavior to return an empty slice. Release note: None Release justification: This fixes two buglets: one that was introduced when we added token-based authentication support to the proxy in cockroachdb#76417, and another when we added the interceptors. This is low risk as part of the code is guarded behind the connection migration feature, which is currently not being used in production. To add on, CockroachCloud is the only user of sqlproxy.
Informs cockroachdb#76000. Extracted from cockroachdb#76805. Previously, we assumed that with the token-based authentication, the server is ready to accept queries the moment we connect to it. This assumption has been proved wrong during the integration tests with the forwarder, and there's an implicit AuthenticateOK step that happens after connecting to the server. During that time, initial connection data such as ParameterStatus and BackendKeyData messages will be sent to the client as well. For now, we will ignore those messages. Once we start implementing query cancellation within the proxy, that has to be updated to cache the new BackendKeyData. This commit also fixes a buglet to handle pgwire messages with no body. pgproto3's Receive methods will still call Next if the body size is 0, and previously, we were returning an io.EOF error. This commit changes that behavior to return an empty slice. Release note: None Release justification: This fixes two buglets: one that was introduced when we added token-based authentication support to the proxy in cockroachdb#76417, and another when we added the interceptors. This is low risk as part of the code is guarded behind the connection migration feature, which is currently not being used in production. To add on, CockroachCloud is the only user of sqlproxy.
77111: ccl/sqlproxyccl: handle implicit auth in OpenTenantConnWithToken r=JeffSwenson a=jaylim-crl Informs #76000. Extracted from #76805. Previously, we assumed that with the token-based authentication, the server is ready to accept queries the moment we connect to it. This assumption has been proved wrong during the integration tests with the forwarder, and there's an implicit AuthenticateOK step that happens after connecting to the server. During that time, initial connection data such as ParameterStatus and BackendKeyData messages will be sent to the client as well. For now, we will ignore those messages. Once we start implementing query cancellation within the proxy, that has to be updated to cache the new BackendKeyData. This commit also fixes a buglet to handle pgwire messages with no body. pgproto3's Receive methods will still call Next if the body size is 0, and previously, we were returning an io.EOF error. This commit changes that behavior to return an empty slice. Release note: None Release justification: This fixes two buglets: one that was introduced when we added token-based authentication support to the proxy in #76417, and another when we added the interceptors. This is low risk as part of the code is guarded behind the connection migration feature, which is currently not being used in production. To add on, CockroachCloud is the only user of sqlproxy. 77176: logictest: fix flakey alter_table r=ajwerner a=ajwerner The GC job for the temp index runs immediately so the fraction might be 1. Fixes #76911. Release justification: non-production code change Release note: None Co-authored-by: Jay <jay@cockroachlabs.com> Co-authored-by: Andrew Werner <awerner32@gmail.com>
Informs cockroachdb#76000. Extracted from cockroachdb#76805. Previously, we assumed that with the token-based authentication, the server is ready to accept queries the moment we connect to it. This assumption has been proved wrong during the integration tests with the forwarder, and there's an implicit AuthenticateOK step that happens after connecting to the server. During that time, initial connection data such as ParameterStatus and BackendKeyData messages will be sent to the client as well. For now, we will ignore those messages. Once we start implementing query cancellation within the proxy, that has to be updated to cache the new BackendKeyData. This commit also fixes a buglet to handle pgwire messages with no body. pgproto3's Receive methods will still call Next if the body size is 0, and previously, we were returning an io.EOF error. This commit changes that behavior to return an empty slice. Release note: None Release justification: This fixes two buglets: one that was introduced when we added token-based authentication support to the proxy in cockroachdb#76417, and another when we added the interceptors. This is low risk as part of the code is guarded behind the connection migration feature, which is currently not being used in production. To add on, CockroachCloud is the only user of sqlproxy.
Informs #76000.
Previously, all the connection establishment logic is coupled with the handler
function within proxy_handler.go. This makes connecting to a new SQL pod during
connection migration difficult. This commit refactors all of those connection
logic out of the proxy handler into a connector component, as described in the
connection migration RFC. At the same time, we also add support for the session
revival token within this connector component.
Note that the overall behavior of the SQL proxy should be unchanged with this
commit.
Release note: None