Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/sqlproxyccl: add connector component and support for session revival token #76417

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

jaylim-crl
Copy link
Collaborator

Informs #76000.

Previously, all the connection establishment logic is coupled with the handler
function within proxy_handler.go. This makes connecting to a new SQL pod during
connection migration difficult. This commit refactors all of those connection
logic out of the proxy handler into a connector component, as described in the
connection migration RFC. At the same time, we also add support for the session
revival token within this connector component.

Note that the overall behavior of the SQL proxy should be unchanged with this
commit.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jaylim-crl, @jeffswenson, and @rafiss)


pkg/ccl/sqlproxyccl/connector.go, line 53 at r1 (raw file):

	// startupMsg represents the startup message associated with the client.
	// This will be used when establishing a pgwire connection with the SQL pod.
	startupMsg *pgproto3.StartupMessage

NIT: I typically will capitalize fields that I expect callers to set, vs. fields that are meant to be internal to the connector (i.e. only read/written by its own methods). This makes it easier to understand which fields callers might need to initialize when construction connector.

Regardless, I think we need comments or some mechanism to know which of these fields are required and which are optional. Maybe just a comment like "This field is required."


pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):

	idleMonitorWrapperFn func(crdbConn net.Conn) net.Conn

	// Optional event callback functions. onLookupEvent and onDialEvent will be

NIT: Are we going too far here with this "ultra-extensible" set of callback functions? This design dates back, but I question whether we're getting much value for the extra complications it brings. Should we be inlining much/most of what these callback functions are doing?

Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jeffswenson, and @rafiss)


pkg/ccl/sqlproxyccl/connector.go, line 53 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: I typically will capitalize fields that I expect callers to set, vs. fields that are meant to be internal to the connector (i.e. only read/written by its own methods). This makes it easier to understand which fields callers might need to initialize when construction connector.

Regardless, I think we need comments or some mechanism to know which of these fields are required and which are optional. Maybe just a comment like "This field is required."

Done. All of these fields are expected to be set by the caller, so I've capitalized all of them. Also added a comment for those.


pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: Are we going too far here with this "ultra-extensible" set of callback functions? This design dates back, but I question whether we're getting much value for the extra complications it brings. Should we be inlining much/most of what these callback functions are doing?

I did this originally to avoid those logging calls within the connector component. Maybe that's unavoidable if we wanted to push everything into the connector. That said, I think removing the event callback functions are fine. We could inline everything, but I'd do this in a future PR. I'd like to push outgoingAddress into the connector, and unify this with the lookup function. The fallback onto routing rule isn't needed anymore, but we still need to support the static routing rule since a lot of tests rely on that. I added a TODO here.

One of the goals with this PR is to ensure that the behavior of SQL proxy remains unchanged.

Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jaylim-crl, and @rafiss)


pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

I did this originally to avoid those logging calls within the connector component. Maybe that's unavoidable if we wanted to push everything into the connector. That said, I think removing the event callback functions are fine. We could inline everything, but I'd do this in a future PR. I'd like to push outgoingAddress into the connector, and unify this with the lookup function. The fallback onto routing rule isn't needed anymore, but we still need to support the static routing rule since a lot of tests rely on that. I added a TODO here.

One of the goals with this PR is to ensure that the behavior of SQL proxy remains unchanged.

+1 to replacing the indirection with direct calls and pulling the logging and metrics into the connector. I think it is fine to accept the things that are already callbacks, e.g. the testingKnobs struct. The new callbacks are the ones I would prefer to see removed from the PR. E.g AddrLookupFn, IdleMonitorWrapperFn, AuthenticateFn, OnLookupEvent, and OnDialEvent.

Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! Addressed all the reviews. This PR is ready for another round of review.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @rafiss)


pkg/ccl/sqlproxyccl/connector.go, line 87 at r1 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

+1 to replacing the indirection with direct calls and pulling the logging and metrics into the connector. I think it is fine to accept the things that are already callbacks, e.g. the testingKnobs struct. The new callbacks are the ones I would prefer to see removed from the PR. E.g AddrLookupFn, IdleMonitorWrapperFn, AuthenticateFn, OnLookupEvent, and OnDialEvent.

Done. I backed out most of the changes except IdleMonitorWrapperFn, which was also discussed offline. We can't move that in because the existing monitor takes a callback which contains a reference to errConnection (and I didn't want to move that channel into the connector):

crdbConn = handler.idleMonitor.DetectIdle(crdbConn, func() {
err := newErrorf(codeIdleDisconnect, "idle connection closed")
select {
case errConnection <- err: /* error reported */
default: /* the channel already contains an error */
}
})

As for metrics, I don't think we should do that right now. I'll go through the proxy once the connection migration stuff is in to start collecting metrics for items that we care about. I suspect there are some unused/not useful metrics at the moment as well. I moved the logging stuff into the connector though.

Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I like the direction you are taking the sqlproxy. proxy_handler was a bit much. Pulling logic out into the connector and forwarder is a big improvement.

pkg/ccl/sqlproxyccl/connector.go Outdated Show resolved Hide resolved
pkg/ccl/sqlproxyccl/connector.go Outdated Show resolved Hide resolved
pkg/ccl/sqlproxyccl/connector.go Outdated Show resolved Hide resolved
@jaylim-crl jaylim-crl force-pushed the jay/add-sqlproxy-connector branch 2 times, most recently from 1d12cee to 61308d1 Compare February 14, 2022 23:26
Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jeffswenson, and @rafiss)


pkg/ccl/sqlproxyccl/connector.go, line 197 at r3 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: consider naming openClusterConnInternal "dialTenantCluster".

Done.


pkg/ccl/sqlproxyccl/connector.go, line 278 at r3 (raw file):
The latter would be tricky to test I think. I went with the first approach. Did not log here because we're already logging in the caller.

avoid panics in the data plane code

I have mixed feelings about this. Back then when I returned errors, people told me they preferred panics especially when a case is impossible; it's better to crash right away than letting it go as an error to the caller. I personally prefer the panic approach, but I've changed this anyway. If you grep for panic("unreachable"), you'll find some in the kv or sql layer.


pkg/ccl/sqlproxyccl/connector.go, line 334 at r3 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: consider naming dialOutgoingAddr "dialSqlServer(serverAddr string)"

Done.

@jaylim-crl jaylim-crl force-pushed the jay/add-sqlproxy-connector branch 3 times, most recently from 861cab7 to a7b89ee Compare February 15, 2022 01:58
…val token

Informs cockroachdb#76000.

Previously, all the connection establishment logic is coupled with the handler
function within proxy_handler.go. This makes connecting to a new SQL pod during
connection migration difficult. This commit refactors all of those connection
logic out of the proxy handler into a connector component, as described in the
connection migration RFC. At the same time, we also add support for the session
revival token within this connector component.

Note that the overall behavior of the SQL proxy should be unchanged with this
commit.

Release note: None
@jaylim-crl
Copy link
Collaborator Author

bors r=JeffSwenson

craig bot pushed a commit that referenced this pull request Feb 15, 2022
73586: rfc: optimize the draining process with connection_wait and relevant reporting systems r=ZhouXing19 a=ZhouXing19

This is a proposal for optimizing the draining process to be more legible for
customers, and introduce a new step, connection_wait, to the draining process,
which allows the server to early exit when all connections are closed.

Release note: None

76410: kv: disallow GC requests that bump GC threshold and GC expired versions r=nvanbenschoten a=nvanbenschoten

Related to #55293.

This commit adds a safeguard to GC requests that prevents them from
bumping the GC threshold at the same time that they GC individual MVCC
versions. This was found to be unsafe in #55293 because performing both
of these actions at the same time could lead to a race where a read
request is allowed to evaluate without error while also failing to see
MVCC versions that are concurrently GCed.

This race is possible because foreground traffic consults the in-memory
version of the GC threshold (`r.mu.state.GCThreshold`), which is updated
after (in `handleGCThresholdResult`), not atomically with, the application
of the GC request's WriteBatch to the LSM (in `ApplyToStateMachine`). This
allows a read request to see the effect of a GC on MVCC state without seeing
its effect on the in-memory GC threshold.

The latches acquired by GC quests look like it will help with this race,
but in practice they do not for two reasons:
1. the latches do not protect timestamps below the GC request's batch
   timestamp. This means that they only conflict with concurrent writes,
   but not all concurrent reads.
2. the read could be served off a follower, which could be applying the
   GC request's effect from the raft log. Latches held on the leaseholder
   would have no impact on a follower read.

Thankfully, the GC queue has split these two steps for the past few
releases, at least since 87e85eb, so we do not have a bug today.

The commit also adds a test that reliably exercises the bug with a few
well-placed calls to `time.Sleep`. The test contains a variant where the
read is performed on the leaseholder and a variant where it is performed
on a follower. Both fail by default. If we switch the GC request to
acquire non-MVCC latches then the leaseholder variant passes, but the
follower read variant still fails.

76417: ccl/sqlproxyccl: add connector component and support for session revival token r=JeffSwenson a=jaylim-crl

Informs #76000.

Previously, all the connection establishment logic is coupled with the handler
function within proxy_handler.go. This makes connecting to a new SQL pod during
connection migration difficult. This commit refactors all of those connection
logic out of the proxy handler into a connector component, as described in the
connection migration RFC. At the same time, we also add support for the session
revival token within this connector component.

Note that the overall behavior of the SQL proxy should be unchanged with this
commit.

Release note: None

76545: cmd/reduce: add -tlp option r=yuzefovich a=yuzefovich

**cmd/reduce: remove stdin option and require -file argument**

We tend to not use the option of passing input SQL via stdin, so this
commit removes it. An additional argument in favor of doing that is that
the follow-up commit will introduce another mode of behavior that
requires `-file` argument to be specified, so it's just cleaner to
always require it now.

Release note: None

**cmd/reduce: add -tlp option**

This commit adds `-tlp` boolean flag that changes the behavior of
`reduce`. It is required that `-file` is specified whenever the `-tlp`
flag is used. The behavior is such that the last two queries (delimited
by empty lines) in the file contain unpartitioned and partitioned queries
that return different results although they are equivalent.

If TLP check is requested, then we remove the last two queries from the
input which we use then to construct a special TLP check query that
results in an error if two removed queries return different results.

We do not just include the TLP check query into the input string because
the reducer would then reduce the check query itself, making the
reduction meaningless.

Release note: None

76598: server: use channel for DisableAutomaticVersionUpgrade r=RaduBerinde a=RaduBerinde

DisableAutomaticVersionUpgrade is an atomic integer which is rechecked
in a retry loop. This is not a very clean mechanism, and can lead to
issues where you're unknowingly dealing with a copy of the knobs and
setting the wrong atomic. The retry loop can also add unnecessary
delays in tests.

This commit changes DisableAutomaticVersionUpgrade from an atomic
integer to a channel. If the channel is set, auto-upgrade waits until
the channel is closed.

Release note: None

Co-authored-by: Jane Xing <zhouxing@uchicago.edu>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build succeeded:

@craig craig bot merged commit 1452123 into cockroachdb:master Feb 15, 2022
@jaylim-crl jaylim-crl deleted the jay/add-sqlproxy-connector branch February 16, 2022 00:51
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Feb 28, 2022
Informs cockroachdb#76000. Extracted from cockroachdb#76805.

Previously, we assumed that with the token-based authentication, the server is
ready to accept queries the moment we connect to it. This assumption has been
proved wrong during the integration tests with the forwarder, and there's
an implicit AuthenticateOK step that happens after connecting to the server.
During that time, initial connection data such as ParameterStatus and
BackendKeyData messages will be sent to the client as well. For now, we will
ignore those messages. Once we start implementing query cancellation within
the proxy, that has to be updated to cache the new BackendKeyData.

Release note: None

Release justification: This fixes a bug that was introduced when we added
token-based authentication support to the proxy in cockroachdb#76417. This is low risk,
as the code is guarded behind the connection migration feature, which is
currently not being used in production. To add on, CockroachCloud is the only
user of sqlproxy.
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Feb 28, 2022
Informs cockroachdb#76000. Extracted from cockroachdb#76805.

Previously, we assumed that with the token-based authentication, the server is
ready to accept queries the moment we connect to it. This assumption has been
proved wrong during the integration tests with the forwarder, and there's
an implicit AuthenticateOK step that happens after connecting to the server.
During that time, initial connection data such as ParameterStatus and
BackendKeyData messages will be sent to the client as well. For now, we will
ignore those messages. Once we start implementing query cancellation within
the proxy, that has to be updated to cache the new BackendKeyData.

This commit also fixes a buglet to handle pgwire messages with no body.
pgproto3's Receive methods will still call Next if the body size is 0, and
previously, we were returning an io.EOF error. This commit changes that
behavior to return an empty slice.

Release note: None

Release justification: This fixes two buglets: one that was introduced when we
added token-based authentication support to the proxy in cockroachdb#76417, and another
when we added the interceptors. This is low risk as part of the code is
guarded behind the connection migration feature, which is currently not being
used in production. To add on, CockroachCloud is the only user of sqlproxy.
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Mar 1, 2022
Informs cockroachdb#76000. Extracted from cockroachdb#76805.

Previously, we assumed that with the token-based authentication, the server is
ready to accept queries the moment we connect to it. This assumption has been
proved wrong during the integration tests with the forwarder, and there's
an implicit AuthenticateOK step that happens after connecting to the server.
During that time, initial connection data such as ParameterStatus and
BackendKeyData messages will be sent to the client as well. For now, we will
ignore those messages. Once we start implementing query cancellation within
the proxy, that has to be updated to cache the new BackendKeyData.

This commit also fixes a buglet to handle pgwire messages with no body.
pgproto3's Receive methods will still call Next if the body size is 0, and
previously, we were returning an io.EOF error. This commit changes that
behavior to return an empty slice.

Release note: None

Release justification: This fixes two buglets: one that was introduced when we
added token-based authentication support to the proxy in cockroachdb#76417, and another
when we added the interceptors. This is low risk as part of the code is
guarded behind the connection migration feature, which is currently not being
used in production. To add on, CockroachCloud is the only user of sqlproxy.
craig bot pushed a commit that referenced this pull request Mar 1, 2022
77111: ccl/sqlproxyccl: handle implicit auth in OpenTenantConnWithToken r=JeffSwenson a=jaylim-crl

Informs #76000. Extracted from #76805.

Previously, we assumed that with the token-based authentication, the server is
ready to accept queries the moment we connect to it. This assumption has been
proved wrong during the integration tests with the forwarder, and there's
an implicit AuthenticateOK step that happens after connecting to the server.
During that time, initial connection data such as ParameterStatus and
BackendKeyData messages will be sent to the client as well. For now, we will
ignore those messages. Once we start implementing query cancellation within
the proxy, that has to be updated to cache the new BackendKeyData.

This commit also fixes a buglet to handle pgwire messages with no body.
pgproto3's Receive methods will still call Next if the body size is 0, and
previously, we were returning an io.EOF error. This commit changes that
behavior to return an empty slice.

Release note: None

Release justification: This fixes two buglets: one that was introduced when we
added token-based authentication support to the proxy in #76417, and another
when we added the interceptors. This is low risk as part of the code is
guarded behind the connection migration feature, which is currently not being
used in production. To add on, CockroachCloud is the only user of sqlproxy.

77176: logictest: fix flakey alter_table r=ajwerner a=ajwerner

The GC job for the temp index runs immediately so the fraction might be 1.

Fixes #76911.

Release justification: non-production code change

Release note: None

Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
RajivTS pushed a commit to RajivTS/cockroach that referenced this pull request Mar 6, 2022
Informs cockroachdb#76000. Extracted from cockroachdb#76805.

Previously, we assumed that with the token-based authentication, the server is
ready to accept queries the moment we connect to it. This assumption has been
proved wrong during the integration tests with the forwarder, and there's
an implicit AuthenticateOK step that happens after connecting to the server.
During that time, initial connection data such as ParameterStatus and
BackendKeyData messages will be sent to the client as well. For now, we will
ignore those messages. Once we start implementing query cancellation within
the proxy, that has to be updated to cache the new BackendKeyData.

This commit also fixes a buglet to handle pgwire messages with no body.
pgproto3's Receive methods will still call Next if the body size is 0, and
previously, we were returning an io.EOF error. This commit changes that
behavior to return an empty slice.

Release note: None

Release justification: This fixes two buglets: one that was introduced when we
added token-based authentication support to the proxy in cockroachdb#76417, and another
when we added the interceptors. This is low risk as part of the code is
guarded behind the connection migration feature, which is currently not being
used in production. To add on, CockroachCloud is the only user of sqlproxy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants