rfcs: sqlproxy connection migration #75707
Conversation
Force-pushed from 5a20549 to c02ceb3.
Force-pushed from 5c1a226 to 77db251.
Nicely written. I only did a quick skim in order to leave a few comments about relevant concerns with related work.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jaylim-crl, @jeffswenson, and @rafiss)
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 29 at r1 (raw file):
draining phase lasts for at most 10 minutes, and if connections are still active at the end of the phase, they will be terminated abruptly, which can lead to poor user experiences.
In addition to migration due to scaling events, we may want to migrate connections when performing a rolling upgrade of the CRDB version the pods are running. This use case also applies to Dedicated clusters and I can imagine a day in the future where we deploy SQL proxy in front of Dedicated clusters in order to get hit-less rolling upgrades. I'm not arguing for supporting Dedicated clusters in this work, but it would be nice if anything done here doesn't preclude that eventuality.
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 54 at r1 (raw file):
For every client connection, the proxy selects a backend based on some routing algorithm, performs authentication with it, and does a [transparent forwarding](https://github.com/cockroachdb/cockroach/blob/ced473fa0ed00660812a6f6678155fb30e6a871f/pkg/ccl/sqlproxyccl/proxy.go#L84-L86) of all packets from the client to the server without intercepting messages after
Nit: the messages are still intercepted, right? That is, we're still reading from one socket and writing to another socket. It is simply that we're not decoding or altering the messages after the authentication phase.
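To make the distinction concrete, here is a minimal sketch of what byte-level forwarding looks like: the proxy reads from one socket and writes to the other without decoding pgwire messages. The function name and structure are illustrative, not the actual sqlproxyccl code.

```go
package sketch

import (
	"io"
	"net"
)

// forward copies raw bytes between client and server until either
// side closes. The proxy never parses pgwire message boundaries
// here; it is a plain byte pipe in both directions.
func forward(client, server net.Conn) error {
	errCh := make(chan error, 2)
	go func() {
		_, err := io.Copy(server, client) // client -> server
		errCh <- err
	}()
	go func() {
		_, err := io.Copy(client, server) // server -> client
		errCh <- err
	}()
	// io.Copy returns a nil error on EOF, so a clean close on either
	// side also unblocks this.
	return <-errCh
}
```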
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 105 at r1 (raw file):
is aware of Postgres message boundaries. We propose to make the proxy aware of pgwire message boundaries. This implies
Coincidentally, we likely need to make the proxy aware of pgwire message contents in order to support the pgwire cancel protocol.
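For reference, a sketch of what pgwire message-boundary awareness could look like. Every regular (post-startup) pgwire message is a 1-byte type followed by a 4-byte big-endian length that includes itself; the helper name below is hypothetical.

```go
package sketch

import (
	"encoding/binary"
	"fmt"
	"io"
)

// copyOneMessage forwards exactly one pgwire message from src to dst
// and returns its type byte, so a caller can act on message
// boundaries (e.g. pause a transfer between messages) without
// decoding message bodies.
func copyOneMessage(src io.Reader, dst io.Writer) (msgType byte, err error) {
	var header [5]byte
	if _, err := io.ReadFull(src, header[:]); err != nil {
		return 0, err
	}
	// The length field counts its own 4 bytes but not the type byte.
	bodyLen := int64(binary.BigEndian.Uint32(header[1:])) - 4
	if bodyLen < 0 {
		return 0, fmt.Errorf("malformed pgwire message length")
	}
	if _, err := dst.Write(header[:]); err != nil {
		return 0, err
	}
	if _, err := io.CopyN(dst, src, bodyLen); err != nil {
		return 0, err
	}
	return header[0], nil
}
```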
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 378 at r1 (raw file):
instances. These instances come with an additional decoding overhead for every message. For the proxy, there is no need to decode the messages in a steady state.
This might need to change to support the pgwire cancel protocol. In particular, I think we'd need to decode BackendKeyData
messages sent by the server in order to alter them before sending them to the client. It isn't clear yet exactly how this is going to work, but worth staying abreast of the cancel protocol requirements. Cc @rafiss who is leading the charge on that work.
TFTRs! For some reason I'm not seeing the comments on Reviewable, so I quoted again.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jeffswenson, and @rafiss)
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 29 at r1 (raw file):
From Peter:
In addition to migration due to scaling events, we may want to migrate connections when performing a rolling upgrade of the CRDB version the pods are running. This use case also applies to Dedicated clusters and I can imagine a day in the future where we deploy SQL proxy in front of Dedicated clusters in order to get hit-less rolling upgrades. I'm not arguing for supporting Dedicated clusters in this work, but it would be nice if anything done here doesn't preclude that eventuality.
This is an interesting use-case, and I hadn't thought about that. Thanks for bringing this up. Thinking about it more, we could even use this for tenant upgrades within Serverless. Today, we just terminate old pods during upgrades, and spin up new ones, resulting in rude disconnects. There's an issue filed to make the process less disruptive: https://cockroachlabs.atlassian.net/browse/CC-4796. cc: @alyshanjahani-crl
This RFC proposes a foundation to how we will transfer connections from one pod to another, and will support the use-case that was described. The interesting bit that needs to be updated when that happens would be the pod selection algorithm. When transferring connections during upgrades, a caller could invoke the relevant APIs on the forwarder (e.g. RequestTransfer) to transfer a connection, and the pod selection algorithm would need to return a pod that is ready to serve connections (e.g. newer pods which are ready, or older pods which haven't been drained yet).
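To illustrate, a rough sketch of how a caller-driven transfer during an upgrade could be wired up. `RequestTransfer` is the forwarder API named above; the selection policy and all other names are hypothetical.

```go
package sketch

// forwarder stands in for the per-connection forwarder in the RFC;
// RequestTransfer asks it to move its session to another SQL pod.
// The exact signature is illustrative.
type forwarder interface {
	RequestTransfer()
}

// selectUpgradeTarget is a hypothetical pod-selection policy for
// rolling upgrades: prefer new-version pods that are ready, and fall
// back to older pods that have not started draining. A real
// implementation would be consulted by the forwarder when it picks
// the destination pod.
func selectUpgradeTarget(readyNew, notDrainingOld []string) (addr string, ok bool) {
	if len(readyNew) > 0 {
		return readyNew[0], true
	}
	if len(notDrainingOld) > 0 {
		return notDrainingOld[0], true
	}
	return "", false
}

// drainForUpgrade requests a transfer for every active connection on
// a pod that is about to be upgraded.
func drainForUpgrade(conns []forwarder) {
	for _, f := range conns {
		f.RequestTransfer()
	}
}
```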
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 55 at r1 (raw file):
From Peter:
Nit: the messages are still intercepted, right? That is, we're still reading from one socket and writing to another socket. It is simply that we're not decoding or altering the messages after the authentication phase.
Right, we're still reading from one, and writing to the other. My original intention was to emphasize that bytes are just forwarded in a transparent manner, without any notion of pgwire message boundaries. I'll rework this.
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 105 at r1 (raw file):
From Peter:
Coincidentally, we likely need to make the proxy aware of pgwire message contents in order to support the pgwire cancel protocol.
Ack - that's good to know.
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 378 at r1 (raw file):
From Peter:
This might need to change to support the pgwire cancel protocol. In particular, I think we'd need to decode BackendKeyData messages sent by the server in order to alter them before sending them to the client. It isn't clear yet exactly how this is going to work, but worth staying abreast of the cancel protocol requirements. Cc @rafiss who is leading the charge on that work.
That's fine, and the proposed design in this RFC can do that. We could peek on the interceptor to look for BackendKeyData messages during the steady state, and decode accordingly in the response processor. I'm not entirely sure how the cancel operation will be requested (e.g. user-initiated, server-initiated), but if we didn't want to decode all BackendKeyData messages during the steady state, we could have an additional signal (e.g. check on SimpleQuery in the case of user-initiated cancel operations) to start the decoding window. Regardless, with decoding, there will be some performance hit as there might be allocations and copying, depending on how large these messages are.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jaylim-crl, @jeffswenson, and @rafiss)
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 378 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
From Peter:
This might need to change to support the pgwire cancel protocol. In particular, I think we'd need to decode BackendKeyData messages sent by the server in order to alter them before sending them to the client. It isn't clear yet exactly how this is going to work, but worth staying abreast of the cancel protocol requirements. Cc @rafiss who is leading the charge on that work.
That's fine, and the proposed design in this RFC can do that. We could peek on the interceptor to look for BackendKeyData messages during the steady state, and decode accordingly in the response processor. I'm not entirely sure how the cancel operation will be requested (e.g. user-initiated, server-initiated), but if we didn't want to decode all BackendKeyData messages during the steady state, we could have an additional signal (e.g. check on SimpleQuery in the case of user-initiated cancel operations) to start the decoding window. Regardless, with decoding, there will be some performance hit as there might be allocations and copying, depending on how large these messages are.
The BackendKeyData messages are very small: 12 or 16 bytes. I think we'd need to decode all of them. While cancellation is user-initiated, the CancelMessage from the user contains info in the BackendKeyData messages.
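For concreteness, a sketch of decoding such a message, assuming the standard PostgreSQL BackendKeyData layout (type byte 'K', int32 length of 12, int32 process ID, int32 secret key). CockroachDB's actual key may be larger, per the 16-byte figure above.

```go
package sketch

import (
	"encoding/binary"
	"fmt"
)

type backendKeyData struct {
	ProcessID uint32
	SecretKey uint32
}

// decodeBackendKeyData parses a full wire message: 'K' + length + body.
// Standard PostgreSQL encodes a length of 12, so the whole message is
// 13 bytes including the type byte.
func decodeBackendKeyData(msg []byte) (backendKeyData, error) {
	if len(msg) != 13 || msg[0] != 'K' {
		return backendKeyData{}, fmt.Errorf("not a standard BackendKeyData message")
	}
	return backendKeyData{
		ProcessID: binary.BigEndian.Uint32(msg[5:9]),
		SecretKey: binary.BigEndian.Uint32(msg[9:13]),
	}, nil
}
```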
TFTR! Interesting how the comments appeared (see timestamps).
I marked some of them as done to avoid confusion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jeffswenson, @petermattis, and @rafiss)
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 29 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
In addition to migration due to scaling events, we may want to migrate connections when performing a rolling upgrade of the CRDB version the pods are running. This use case also applies to Dedicated clusters and I can imagine a day in the future where we deploy SQL proxy in front of Dedicated clusters in order to get hit-less rolling upgrades. I'm not arguing for supporting Dedicated clusters in this work, but it would be nice if anything done here doesn't preclude that eventuality.
Done.
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 105 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Coincidentally, we likely need to make the proxy aware of pgwire message contents in order to support the pgwire cancel protocol.
Done.
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 378 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
The BackendKeyData messages are very small: 12 or 16 bytes. I think we'd need to decode all of them. While cancellation is user-initiated, the CancelMessage from the user contains info in the BackendKeyData messages.
Ack. Since the messages are small, I wouldn't be too worried about this: it's likely that the memory will come from the internal buffer without any additional allocations (at least most of the time).
TFTR! Interesting how the comments appeared (see timestamps).
Looks like some sort of reviewable bug.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jeffswenson, and @rafiss)
LGTM. I do have a couple production concerns. You can add them to this RFC or open a Jira ticket.
- There should be metrics tracking the number of failed transfers and the number of connections closed mid-transfer. The metrics should be visible on the sqlproxy Grafana page (see the sketch after this list).
- We need a way to disable this for a specific tenant. That way, if there is a workload that really tickles the known edge cases in the protocol, we can opt the tenant out of session transfers. I'm imagining something like a flag on the sqlproxy or an option returned by the tenant directory.
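A sketch of what such metrics could look like, written against CockroachDB's pkg/util/metric package. The metric names here are hypothetical, not the ones that were ultimately merged.

```go
package sqlproxyccl

import "github.com/cockroachdb/cockroach/pkg/util/metric"

// Hypothetical metric metadata for connection-migration outcomes.
var (
	metaConnMigrationErrorCount = metric.Metadata{
		Name:        "proxy.conn_migration.error",
		Help:        "Number of failed connection transfers",
		Measurement: "Connection Migrations",
		Unit:        metric.Unit_COUNT,
	}
	metaConnMigrationClosedCount = metric.Metadata{
		Name:        "proxy.conn_migration.closed",
		Help:        "Number of connections closed mid-transfer",
		Measurement: "Connection Migrations",
		Unit:        metric.Unit_COUNT,
	}
)

// Counters would then be registered on the proxy's metrics struct and
// surfaced on the sqlproxy Grafana dashboard.
var (
	connMigrationErrorCount  = metric.NewCounter(metaConnMigrationErrorCount)
	connMigrationClosedCount = metric.NewCounter(metaConnMigrationClosedCount)
)
```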
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @andy-kimball, @jaylim-crl, and @rafiss)
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 29 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
From Peter:
In addition to migration due to scaling events, we may want to migrate connections when performing a rolling upgrade of the CRDB version the pods are running. This use case also applies to Dedicated clusters and I can imagine a day in the future where we deploy SQL proxy in front of Dedicated clusters in order to get hit-less rolling upgrades. I'm not arguing for supporting Dedicated clusters in this work, but it would be nice if anything done here doesn't preclude that eventuality.
This is an interesting use-case, and I hadn't thought about that. Thanks for bringing this up. Thinking about it more, we could even use this for tenant upgrades within Serverless. Today, we just terminate old pods during upgrades, and spin up new ones, resulting in rude disconnects. There's an issue filed to make the process less disruptive: https://cockroachlabs.atlassian.net/browse/CC-4796. cc: @alyshanjahani-crl
This RFC proposes a foundation to how we will transfer connections from one pod to another, and will support the use-case that was described. The interesting bit that needs to be updated when that happens would be the pod selection algorithm. When transferring connections during upgrades, a caller could invoke the relevant APIs on the forwarder (e.g. RequestTransfer) to transfer a connection, and the pod selection algorithm would need to return a pod that is ready to serve connections (e.g. newer pods which are ready, or older pods which haven't been drained yet).
The big asterisk for migrating sessions during the upgrade process is version skew. If we want to transfer between versions, we need the DB to guarantee the serialized session is compatible.
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 227 at r1 (raw file):
coupled with the [proxy's handler](https://github.com/cockroachdb/cockroach/blob/ad27839166526a92444bf0f719cf9ec5f62c2f7f/pkg/ccl/sqlproxyccl/proxy_handler.go#L209). At the same time, the [authenticate](https://github.com/cockroachdb/cockroach/blob/ad27839166526a92444bf0f719cf9ec5f62c2f7f/pkg/ccl/sqlproxyccl/authentication.go#L21) function that is used for authentication is coupled with the throttler. We
The sqlproxy does not need to check the throttler when it is transferring a connection between SQL pods. The throttler is intended to prevent credential guessing, and the proxy only transfers authenticated connections. The design you discuss handles this case well. The sqlproxy can inject no-op throttle hooks when transferring sessions.
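A sketch of the no-op throttle hook idea, assuming authenticate accepts a hook of roughly this shape; all names are illustrative.

```go
package sketch

import "errors"

// throttleHook mirrors the shape of the hook consulted on
// authentication attempts; it can reject an attempt by returning an
// error.
type throttleHook func() error

// realThrottle stands in for the hook used on brand-new client
// connections, where the throttler slows down credential guessing.
func realThrottle() error {
	return errors.New("too many attempts; throttled")
}

// noopThrottle is used when transferring an already-authenticated
// session between SQL pods: there is no credential guessing to
// defend against, so the hook always allows the attempt.
func noopThrottle() error { return nil }
```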
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @andy-kimball, @jaylim-crl, and @rafiss)
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 29 at r1 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
The big asterisk for migrating sessions during the upgrade process is version skew. If we want to transfer between versions, we need the DB to guarantee the serialized session is compatible.
Currently CockroachDB changes query planning rules, adds new SQL session variables, and makes other user-visible changes automatically during upgrades.
This is not compatible with seamless session migration across versions: a SQL client could auto-detect a version upon connecting, make choices/assumptions about SQL semantics over that connection, and if we migrate to a new version these assumptions would break with inscrutable errors client-side.
At the very least, we should never transfer a session across an upgrade to a different major version of cockroachdb.
Then I am tempted to have us make the choice to transfer sessions across versions configurable for the end-user, so that they can disable this if it's incompatible with their client app. Something like a choice between the following three:
- enable seamless migrations, warning: this can cause your apps to experience different versions of CockroachDB within the same SQL connection.
- disable seamless migrations, block upgrades with open connections, warning: this can cause your apps to block cockroachdb bug fixes.
- disable seamless migrations, close connections upon upgrades, warning: this can cause your apps to experience a server-side disconnect upon upgrades.
Force-pushed from 77db251 to 76b8559.
TFTR! Updated this slightly to reflect the implementation that was merged.
There should be metrics tracking the number of failed transfers and the number of connections closed mid-transfer. The metrics should be visible on the sqlproxy grafana page.
I added this to the meta tracking issue here: #76000
We need a way to disable this for a specific tenant. That way if there is a workload that really tickles the known edge cases in the protocol, we can opt the tenant out of session transfers. I'm imaging something like a flag on the sqlproxy or an option returned by the tenant directory.
That's going to be part of the next RFC. This RFC is only scoped to how to transfer. The mechanism of when to transfer will be handled by the rebalancer, and for that, we could have it check on the tenant metadata to know whether connections can be transferred.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @andy-kimball, @jeffswenson, and @rafiss)
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 29 at r1 (raw file):
Previously, knz (kena) wrote…
Currently CockroachDB changes query planning rules, adds new SQL session variables, and makes other user-visible changes automatically during upgrades.
This is not compatible with seamless session migration across versions: a SQL client could auto-detect a version upon connecting, make choices/assumptions about SQL semantics over that connection, and if we migrate to a new version these assumptions would break with inscrutable errors client-side. At the very least, we should never transfer a session across an upgrade to a different major version of cockroachdb.
Then I am tempted to have us make the choice to transfer sessions across versions configurable for the end-user, so that they can disable this if it's incompatible with their client app. Something like a choice between the following three:
- enable seamless migrations, warning: this can cause your apps to experience different versions of CockroachDB within the same SQL connection.
- disable seamless migrations, block upgrades with open connections, warning: this can cause your apps to block cockroachdb bug fixes.
- disable seamless migrations, close connections upon upgrades, warning: this can cause your apps to experience a server-side disconnect upon upgrades.
Those are all valid points. We will consider that in the upcoming RFC about when to transfer a connection. In CC today, tenant upgrades destroy the old SQL pods right away without putting them into the DRAINING state, causing connections to be terminated right away, so migrating across version upgrades is not possible. That said, if we wanted to be defensive to start, we could ensure that we never transfer connections whenever versions do not match.
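A sketch of the defensive check described here; the pod type and version field are hypothetical.

```go
package sketch

import "fmt"

type pod struct {
	Addr    string
	Version string // e.g. "v22.1.0"
}

// checkTransferAllowed refuses to migrate a session between SQL pods
// running different CockroachDB versions, per the defensive stance
// suggested above.
func checkTransferAllowed(src, dst pod) error {
	if src.Version != dst.Version {
		return fmt.Errorf(
			"refusing to transfer session from %s (%s) to %s (%s): version mismatch",
			src.Addr, src.Version, dst.Addr, dst.Version,
		)
	}
	return nil
}
```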
docs/RFCS/20220129_sqlproxy_connection_migration.md, line 227 at r1 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
The sqlproxy does not need to check the throttler when it is transferring a connection between sql pods. The throttle is intended to prevent credential guesses and the proxy only transfers authenticated connections. The design you discuss handles this case well. The sqlproxy can inject no-op throttle hooks when transferring sessions.
The current design only throttles whenever we connect using the regular auth method. If there's a need in the future, we could also throttle the token approach (assuming that it becomes an end-user feature). I removed the part of decoupling throttler and authenticator since that's already implicit with the code today (i.e. authenticate takes in a throttleHook, though we still import the throttler).
Force-pushed from 76b8559 to edb29eb.
Everything has been merged, and this feature is now completed. The RFC has been updated to reflect what we have merged. I'll merge this next week if no one else has concerns.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @andy-kimball, @jeffswenson, and @rafiss)
Force-pushed from edb29eb to 8ce1d8a.
bors r=JeffSwenson

If there are more concerns, I'd be happy to address them in another PR.

Build succeeded.
RFC Text
This proposal describes the connection migration mechanism within the SQL
Proxy that allows us to migrate SQL connections from one SQL pod to another
by building on top of the session migration work by the SQL Experience team.
This RFC is only scoped to how to perform the migration from one SQL pod to
another within the SQL proxy. The details of when to perform the migration
will be discussed in a follow-up RFC. Note that the main use-case here is for
CockroachDB Serverless, and we should optimize for that.
What this will eventually solve:
- rude disconnects during Serverless scale-down events
- load imbalance during steady state and Serverless scale-up events
From the user's perspective, the connection should remain usable without
interruption, and there will be better resource utilization across all
provisioned SQL pods.
This RFC has already received some reviews in an earlier form, an internal
Google Doc titled Proxy Connection Migration.
Release justification: RFC/docs only change.
Release note: None