ccl/sqlproxyccl: add basic forwarder component #76167

Merged 1 commit into cockroachdb:master on Feb 16, 2022

Conversation

jaylim-crl (Collaborator)

Informs #76000.

This commit refactors the ConnectionCopy call in proxy_handler.go into a new
forwarder component, which was described in the connection migration RFC. At
the moment, this forwarder component does basic forwarding through
ConnectionCopy, just like before, so there should be no behavioral changes to
the proxy. This will serve as a building block for subsequent PRs.

Release note: None

@jaylim-crl jaylim-crl requested a review from a team as a code owner February 7, 2022 16:24
@cockroach-teamcity (Member)

This change is Reviewable

@jaylim-crl jaylim-crl force-pushed the jay/add-basic-forwarder branch 2 times, most recently from f37df46 to 1a04851 on February 7, 2022 19:20
@jaylim-crl (Collaborator, Author) left a comment

TFTRs!


pkg/ccl/sqlproxyccl/forwarder.go, line 40 at r1 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: rename conn->clientConn and crdbConn->sqlServerConn.

I can do that. Will update once addressed.


pkg/ccl/sqlproxyccl/forwarder.go, line 61 at r1 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

I like to avoid states that indicate partial construction like "started". Typically they can be avoided by redefining the constructor, e.g. instead of exposing newForwarder() and Run(), this could expose a Forward(...) (Forwarder, err) method that constructs the forwarder and contains Run's logic.

I think this would even allow us to get rid of the err and panic in proxy_handler, since Run only returns precondition errors and we can avoid them by making it part of the constructor.

I started with that initially, but I don't remember why I took this approach anymore. Regardless, I can do that now, and if we decide we need that back, we could move it again. Doing this also allows us to eliminate the started state since we can be sure that the forwarder is only started once. To sum it up, we'll end up with something like:

// Pass ownerships of conn and crdbConn to the forwarder.
f := Forward(ctx, conn, crdbConn)
defer f.Close()

// TODO here: When transferring connections, we'll keep track of all the
// forwarders associated with a given pod. We'll need to add f to it.

// Block until an error is received or the stopper starts quiescing,
// whichever happens first.
select {
    ...
}
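
For concreteness, here is a minimal sketch of what the merged Forward constructor could look like. The struct fields, the plain io.Copy loops standing in for ConnectionCopy, and all names below are assumptions for illustration, not the actual sqlproxyccl implementation:

package sqlproxy

import (
	"context"
	"io"
	"net"
)

// forwarder owns both connections once Forward returns.
type forwarder struct {
	ctx        context.Context
	ctxCancel  context.CancelFunc
	clientConn net.Conn // client <-> proxy
	serverConn net.Conn // proxy <-> server
	errChan    chan error
}

// Forward constructs the forwarder and starts forwarding immediately, so
// there is no separate "started" state to track and no way to run it twice.
func Forward(ctx context.Context, clientConn, serverConn net.Conn) *forwarder {
	ctx, cancel := context.WithCancel(ctx)
	f := &forwarder{
		ctx:        ctx,
		ctxCancel:  cancel,
		clientConn: clientConn,
		serverConn: serverConn,
		errChan:    make(chan error, 2),
	}
	// Stand-in for ConnectionCopy: copy bytes in both directions and report
	// errors on errChan (buffered so the goroutines never block on send).
	go func() {
		_, err := io.Copy(f.serverConn, f.clientConn) // client -> server
		f.errChan <- err
	}()
	go func() {
		_, err := io.Copy(f.clientConn, f.serverConn) // server -> client
		f.errChan <- err
	}()
	return f
}

// Close cancels the forwarder's context and closes both connections, which
// also unblocks the io.Copy calls above.
func (f *forwarder) Close() {
	f.ctxCancel()
	_ = f.clientConn.Close()
	_ = f.serverConn.Close()
}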

pkg/ccl/sqlproxyccl/forwarder.go, line 135 at r1 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

What do you think of replacing Close() with context cancellation? Everywhere we check for .closed, it could instead check ctx.Err(). That allows us to drop the mutex and reduces the amount of code related to clean up.

We will need some sort of mutex eventually with the RequestTransfer API on the forwarder since that can be called within multiple goroutines. (I'll defer that decision to a follow up PR.)

I do agree that there's no need for this right now, but I'm uncertain if we'll need it later on. Let me think about this further; maybe I'll just remove this for now.


pkg/ccl/sqlproxyccl/proxy_handler.go, line 447 at r1 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

What do you think about pushing all of the logic below authenticate into Forwarder? That allows us to merge the goroutine waiting on the errChan and the goroutine waiting on the context cancellation.

In my mind Forwarder is a representation of the connection's state machine. So it feels reasonable to make it responsible for handling the idle and error states.

I had this in mind earlier, but the issue here is that there's also errConnection. I don't want to push errConnection into the forwarder. This channel is currently used to receive errors from the idle monitor or deny list watcher. Do you have thoughts on merging all of the channels?

@jaylim-crl (Collaborator, Author) left a comment

This PR is ready for another round of review, and I've addressed most concerns.


pkg/ccl/sqlproxyccl/forwarder.go, line 40 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

I can do that. Will update once addressed.

Done.


pkg/ccl/sqlproxyccl/forwarder.go, line 61 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

I started with that initially, but I don't remember why I took this approach anymore. Regardless, I can do that now, and if we decide we need that back, we could move it again. Doing this also allows us to eliminate the started state since we can be sure that the forwarder is only started once. To sum it up, we'll end up with something like:

// Pass ownerships of conn and crdbConn to the forwarder.
f := Forward(ctx, conn, crdbConn)
defer f.Close()

// TODO here: When transferring connections, we'll keep track of all the
// forwarders associated with a given pod. We'll need to add f to it.

// Block until an error is received or the stopper starts quiescing,
// whichever happens first.
select {
    ...
}

Done. I got rid of the started state, and merged newForwarder and Run.


pkg/ccl/sqlproxyccl/forwarder.go, line 135 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

We will need some sort of mutex eventually with the RequestTransfer API on the forwarder since that can be called within multiple goroutines. (I'll defer that decision to a follow up PR.)

I do agree that there's no need for this right now, but I'm uncertain if we'll need it later on. Let me think about this further; maybe I'll just remove this for now.

Done. I left Close, but replaced closed with ctx.Err() as suggested. We may need to reintroduce the mutex in the future.

@andy-kimball (Contributor) left a comment


pkg/ccl/sqlproxyccl/forwarder.go, line 40 at r2 (raw file):

	// old connection will be closed.
	clientConn net.Conn // client <-> proxy
	serverConn net.Conn // proxy <-> client

NIT: Did you mean // proxy <-> server?


pkg/ccl/sqlproxyccl/forwarder.go, line 50 at r2 (raw file):

// from clientConn to serverConn. When this is called, it is expected that the
// caller passes ownerships of clientConn and serverConn to the forwarder, which
// implies that these connections may be closed by the forwarder if necessary,

How does this work? Typically it causes an error if Close is called multiple times on net.Conn implementations. If the forwarder might close and callers might close as well, it could trigger spurious errors. In cases like this, I always try to make it crystal clear who's responsible: either the caller must always close, or the forwarder must always close.


pkg/ccl/sqlproxyccl/proxy_handler.go, line 447 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

I had this in mind earlier, but the issue here is that there's also errConnection. I don't want to push errConnection into the forwarder. This channel is currently used to receive errors from the idle monitor or deny list watcher. Do you have thoughts on merging all of the channels?

I think it's an interesting idea to unify things, but I think we should do it in a separate PR if we do. It'd be nice to have more consistency in how we manage background goroutines, communicate errors, etc. One idea is to pass errgroup.Group into the various places (idle monitor, deny list watcher, forwarder, etc.), so they all create background goroutines and communicate errors using that rather than forwarder.errChan, denylist.denied, etc. Then we could call Wait here.

@jaylim-crl (Collaborator, Author) left a comment

TFTRs!


pkg/ccl/sqlproxyccl/forwarder.go, line 135 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

Done. I left Close, but replaced closed with ctx.Err() as suggested. We may need to reintroduce the mutex in the future.

I figured out why I used closed originally. One of the biggest drawbacks of ctx.Err() is that it will return an error if the top-level context gets cancelled. This also means that if we start adding more cleanup work to Close(), IsClosed() will return true, but we'd still have to call Close() to do the cleanup. Regardless, since I already updated this to check ctx.Err(), I'll leave it as-is, but I've renamed IsClosed() to IsStopped(). I also added this comment here:

// Note that callers MUST call Close in all cases (regardless of IsStopped)
// since we only check on context cancellation there. There could be a
// possibility where the top-level context was cancelled, but the forwarder
// has not cleaned up.
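
Reusing the hypothetical forwarder struct from the sketch earlier in this thread (with ctx derived from the caller's context via context.WithCancel), the IsStopped check built on ctx.Err() is roughly:

// IsStopped reports whether the forwarder's context has been cancelled.
// Because f.ctx is derived from the top-level context, this also returns
// true when that parent context is cancelled, in which case Close has not
// necessarily run yet, so callers must still call Close.
func (f *forwarder) IsStopped() bool {
	return f.ctx.Err() != nil
}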

pkg/ccl/sqlproxyccl/forwarder.go, line 40 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: Did you mean // proxy <-> server?

Done.


pkg/ccl/sqlproxyccl/forwarder.go, line 50 at r2 (raw file):

Typically it causes an error if Close is called multiple times on net.Conn implementations.

I was originally thinking for the caller to just attempt the Close, and ignore the error if it's already closed, e.g. defer func() { _ = conn.Close() }(). I've updated this to ensure that the forwarder always closes the serverConn since connection migration requires us to retrieve a new connection. As for clientConn, that would be the caller's responsibility to avoid unnecessary logic in the caller.

Note that we're already calling Close on clientConn in multiple places in the proxy today, and ignoring the returned error from it.
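
To make the ownership split concrete, here is a minimal sketch of the caller side, in the same hypothetical package as the earlier Forward sketch; handleConn and its shape are assumptions for illustration, not the actual proxy_handler code:

func handleConn(ctx context.Context, clientConn, serverConn net.Conn) {
	// The caller is responsible for clientConn: attempt the Close on the way
	// out and ignore the error, since the forwarder may already have closed it.
	defer func() { _ = clientConn.Close() }()

	// The forwarder takes over serverConn; its Close takes care of it, which
	// matters for connection migration, where a new server connection is
	// retrieved.
	f := Forward(ctx, clientConn, serverConn)
	defer f.Close()

	// Block until the context is cancelled (the real handler also waits on
	// the forwarder's error channel and the stopper).
	<-ctx.Done()
}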


pkg/ccl/sqlproxyccl/proxy_handler.go, line 447 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think it's an interesting idea to unify things, but think we should do it in a separate PR if we do. It'd be nice to have more consistency in how we manage background goroutines, communicate errors, etc. One idea is to pass errgroup.Group into the various places (idle monitor, deny list watcher, forwarder, etc), so they all create background goroutines and communicate errors using that rather than forwarder.errChan, denylist.denied, etc. Then we could call Wait here.

Yes, the suggestion would work. Agreed that we should do it in a separate PR as an enhancement if we were to take that path.
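
If we did go that route, a minimal sketch of the errgroup wiring might look like the following; the component interface and handle function are hypothetical, not existing sqlproxyccl APIs:

package sqlproxy

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// component is anything that runs background work for a connection, e.g. an
// idle monitor, denylist watcher, or forwarder.
type component interface {
	// run blocks until the component fails or ctx is cancelled.
	run(ctx context.Context) error
}

// handle runs every component on a single errgroup and waits. The first
// error cancels ctx, which signals the remaining components to stop, and
// Wait returns that first error.
func handle(ctx context.Context, components ...component) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, c := range components {
		c := c // capture the loop variable for the closure (pre-Go 1.22)
		g.Go(func() error { return c.run(ctx) })
	}
	return g.Wait()
}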

@jeffswenson (Collaborator) left a comment


pkg/ccl/sqlproxyccl/forwarder.go, line 135 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

I figured out why I used closed originally. One of the biggest drawbacks of ctx.Err() is that it will return an error if the top-level context gets cancelled. This also means that if we start adding more cleanup work to Close(), IsClosed() will return true, but we'd still have to call Close() to do the cleanup. Regardless, since I already updated this to check ctx.Err(), I'll leave it as-is, but I've renamed IsClosed() to IsStopped(). I also added this comment here:

// Note that callers MUST call Close in all cases (regardless of IsStopped)
// since we only check on context cancellation there. There could be a
// possibility where the top-level context was cancelled, but the forwarder
// has not cleaned up.

Can we remove the IsStopped function? Currently we are only using it in tests. Instead of having something check IsStopped(), the proxyHandler goroutine will call Close() if the ctx is cancelled or the forwarder raises an error.


pkg/ccl/sqlproxyccl/forwarder.go, line 74 at r3 (raw file):

	go func() {
		// Block until context is done.

nit: I think we should move the ctx.Done() wait and f.Close() call back into proxyHandler.handle. That allows us to remove this goroutine. It also allows us to ensure Close() is only called once. Right now we may call it multiple times if the ctx is canceled and an err is reported to proxyHandler.


pkg/ccl/sqlproxyccl/proxy_handler.go, line 447 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

Yes, the suggestion would work. Agreed that we should do it in a separate PR as an enhancement if we were to take that path.

I'm okay with deferring this to a follow up PR.

Ideally we would have at most three goroutines per connection:

  1. Manages the connection's lifecycle and state machine.
  2. Manages client -> proxy forwarding.
  3. Manages proxy -> client forwarding.

After thinking about it more, the goroutine managing the lifecycle of the connection should be the goroutine running proxyHandler.handle.

@jaylim-crl (Collaborator, Author) left a comment

TFTR!


pkg/ccl/sqlproxyccl/forwarder.go, line 135 at r1 (raw file):

Instead of having something check IsStopped(), the proxyHandler goroutine will call Close() if the ctx is cancelled or the forwarder raises an error.

See other comment. I'd like to make sure that once the forwarder raises an error, everything within the forwarder is stopped. The forwarder may return an error from the goroutine that handles server to client forwarding, but we should make sure that the other goroutine that handles client to server forwarding is stopped as well. At some point, I should replace this with an errgroup, but I'd like to do all of these incrementally.

Also, relying on just the channel to know whether the forwarder has an error is not ideal at all. If the main goroutine reads the error, there is no other way to tell whether the forwarder is in a bad state.

That said, this is not used in this PR, so I'll remove this. However, I'll add it back in an upcoming PR. I need that for the request and response processors.


pkg/ccl/sqlproxyccl/forwarder.go, line 74 at r3 (raw file):
This is expected. There's a comment block right below this:

		// Close the forwarder to clean up. This goroutine is temporarily here
		// because the only way to unblock io.Copy is to close one of the ends,
		// which will be done through closing the forwarder. Once we replace
		// io.Copy with the interceptors, we could use f.ctx directly, and no
		// longer need this goroutine.
		//
		// Note that if f.Close was called externally, this will result
		// in two f.Close calls in total, i.e. one externally, and one here
		// once the context gets cancelled. This is fine for now since we'll
		// be removing this soon anyway.

We'll remove this entirely once we convert ConnectionCopy to use the interceptors.

I think we should move the ctx.Done() wait and f.Close() call back into proxyHandler.handle.

Moving f.Close() back into the proxy handler does not seem right to me. I'd like to ensure that everything is handled within the forwarder. If the handler does not close the forwarder, we end up in a corrupted state. Consider the case where copying bytes back from the server to the client fails for some reason, causing the response processor to be terminated. If the handler does not close the forwarder, bytes will still be copied within the request processor (from client to server). The server could then finish processing a request whose response is never propagated back to the client.

The original intention here is to have some mechanism for the forwarder to stop all of its activities once there's an error in errChan. I left this here. Happy to discuss other suggestions if you do have them.
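
For reference, a minimal sketch of that temporary pattern (with hypothetical names, since the real code forwards via ConnectionCopy) looks roughly like this:

package sqlproxy

import (
	"context"
	"io"
	"net"
)

// runForwarding copies bytes in both directions. The first goroutine watches
// the context and closes both ends on cancellation, because closing a
// connection is the only way to unblock io.Copy. errChan should be buffered
// with capacity 2 so neither copy goroutine blocks on send.
func runForwarding(ctx context.Context, clientConn, serverConn net.Conn, errChan chan<- error) {
	go func() {
		<-ctx.Done()
		// If Close was already called externally, these are effectively
		// no-ops whose errors we deliberately ignore.
		_ = clientConn.Close()
		_ = serverConn.Close()
	}()
	go func() {
		_, err := io.Copy(serverConn, clientConn) // client -> server
		errChan <- err
	}()
	go func() {
		_, err := io.Copy(clientConn, serverConn) // server -> client
		errChan <- err
	}()
}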


pkg/ccl/sqlproxyccl/proxy_handler.go, line 447 at r1 (raw file):
Added a TODO here.

The two goroutines per connection as described in the RFC would be the request processor (for client to server), and response processor (for server to client). Transfer state will be handled within those goroutines.

After thinking about it more, the goroutine managing the lifecycle of the connection should be the goroutine running proxyHandler.handle.

Not sure what you meant by lifecycle of the connection here. The main goroutine handling the connection will be blocked waiting for other goroutines to complete.

Informs cockroachdb#76000.

This commit refactors the ConnectionCopy call in proxy_handler.go into a new
forwarder component, which was described in the connection migration RFC. At
the moment, this forwarder component does basic forwarding through
ConnectionCopy, just like before, so there should be no behavioral changes to
the proxy. This will serve as a building block for subsequent PRs.

Release note: None
@jaylim-crl (Collaborator, Author)

bors r=JeffSwenson

craig bot commented Feb 16, 2022

Build succeeded:

@craig craig bot merged commit 512320f into cockroachdb:master Feb 16, 2022
@jaylim-crl jaylim-crl deleted the jay/add-basic-forwarder branch February 16, 2022 19:11