Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add partial support for pgwire row count limits #39085

Merged
merged 1 commit into from
Jul 31, 2019
Merged

sql: add partial support for pgwire row count limits #39085

merged 1 commit into from
Jul 31, 2019

Conversation

maddyblue
Copy link
Contributor

@maddyblue maddyblue commented Jul 24, 2019

Previously we supported row count limits as long as the limit was higher
than the number of returned rows. Here we add a feature that supports
this part of the spec in a partial way. This should unblock simple JDBC
usage of this feature.

Supported use cases:

  • implicit transactions (which auto close the portal after suspension)
  • explicit transactions executed to completion

Unsupported use cases (with explicit transactions):

  • interleaved execution
  • explicitly closing a portal after partial execution

Many options were evaluated during implementation. The one here is based
on work where the pgwire package itself takes over the state machine
processing during AddRow and looks for further ExecPortal messages. This
has a number of problems: there are now two state machines, we can only
support a part of the spec. However it also has a number of benefits
like it is a simple implementation that is easy to understand.

Two other solutions were evaluated.

First, teaching distsql how to pause and resume execution (a
proof-of-concept branch for this was produced). I did not pursue this
route because of my own unfamiliarity with distsql, and I thought that
attempting to reach a high level of confidence that all of the new pause,
resume, and error logic flows were correct would be very difficult
(I may be wrong about this). Also, this approach needed to address how
to handle post-distsql execution cleanup and stats gathering. That is,
after the call into distsql was paused and returned control back to
the sql executor, there's a lot of code that gets run to cleanup stuff
and report stats in various places, including some defers. These would
need to be audited and only execute once per statement, not once per
portal execution.

Second, start distsql execution in a new go routine and send exec portal
requests to it over a channel. This would avoid teaching distsql how
to pause and resume itself, but instead move that complexity into the
channel and go routine handling, another area ripe for race conditions
and deadlocks. This also needed to deal with the post-distsql execution
cleanup and defers handling discussed above.

For now, we decided that this limited implementation is good enough for
what we need today. In order to one day either support the full spec
or pay down some of our technical debt, we can probably do a number of
preliminary refactors that will make invoking much of the distsql path
multpile times easier.

pgx got a version bump to get support for PortalSuspended. The current
pgx version had a few new dependencies too.

See #4035

Release note (sql change): add partial support for row limits during
portal execution in pgwire.

@maddyblue maddyblue requested review from jordanlewis, andreimatei and a team July 24, 2019 20:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo. Can you add a note on what vendored deps had to get upgraded?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @jordanlewis, and @mjibson)


pkg/sql/distsql_running.go, line 715 at r1 (raw file):

// LimitedResultErr is an error produced by pgwire indicating an unsupported
// feature of row count limits was attempted.
var LimitedResultErr = unimplemented.NewWithIssue(4035, "execute row count limits not supported")

I think we should make this error more specific now that we have partial support for the feature.


pkg/sql/pgwire/command_result.go, line 345 at r1 (raw file):

// forward. The work included is things like auditing all of the defers and
// post-execution stuff (like stats collection) to have it only execute once
// per statement instead of once per portal.

This is great. It would be nice to make sure we don't lose some other parts of the context we gathered during this exercise as well. Did you have more context in that Google doc you were working on? If so, could you move the Google doc that you wrote with the different pros and cons into Confluence and link that from here? You can move it into the public Confluence space (CRDB) so that this link is followable by non-employees.


pkg/sql/pgwire/command_result.go, line 421 at r1 (raw file):

		default:
			// We got some other message, but we only support executing to completion.
			return errors.WithDetail(sql.LimitedResultErr, "portals must be executed to completion")

What message is sent when a client explicitly closes their result set early? I would like to make sure we support that use case too, if it's possible, because I think it is likely commonly used.


pkg/sql/pgwire/testdata/pgtest/portals_crbugs, line 110 at r1 (raw file):

ReadyForQuery
ErrorResponse
ReadyForQuery

Is there any way we can assert the error message here as well?

Copy link
Contributor Author

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @jordanlewis)


pkg/sql/distsql_running.go, line 715 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think we should make this error more specific now that we have partial support for the feature.

Done.


pkg/sql/pgwire/command_result.go, line 345 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

This is great. It would be nice to make sure we don't lose some other parts of the context we gathered during this exercise as well. Did you have more context in that Google doc you were working on? If so, could you move the Google doc that you wrote with the different pros and cons into Confluence and link that from here? You can move it into the public Confluence space (CRDB) so that this link is followable by non-employees.

I intended the commit message here to be that. Do you think the doc I wrote has some more useful color?


pkg/sql/pgwire/command_result.go, line 421 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

What message is sent when a client explicitly closes their result set early? I would like to make sure we support that use case too, if it's possible, because I think it is likely commonly used.

Portal suspension followed by an explicit close is what happens when a result set is closed early (as verified by wireshark with postgres). We don't support this. That's because the close message needs to do some other stuff that the sql state machine handles. We could support this maybe, but it'd make the current hack even worse. I'd prefer to only do this if we really really have to.


pkg/sql/pgwire/testdata/pgtest/portals_crbugs, line 110 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Is there any way we can assert the error message here as well?

Done.

@maddyblue
Copy link
Contributor Author

RFAL. Added support for implicit transactions and added some tests for suspension then close, which we don't support.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: let's get this in and start testing it more!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @mjibson)


pkg/sql/pgwire/command_result.go, line 345 at r1 (raw file):

Previously, mjibson (Matt Jibson) wrote…

I intended the commit message here to be that. Do you think the doc I wrote has some more useful color?

Ah, I didn't notice the commit message. Looks good.


pkg/sql/pgwire/command_result.go, line 421 at r1 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Portal suspension followed by an explicit close is what happens when a result set is closed early (as verified by wireshark with postgres). We don't support this. That's because the close message needs to do some other stuff that the sql state machine handles. We could support this maybe, but it'd make the current hack even worse. I'd prefer to only do this if we really really have to.

Gotcha. I think we probably need to do this, but we can double check with some acceptance testing after this PR merges.

@maddyblue
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 30, 2019

Merge conflict (retrying...)

@craig
Copy link
Contributor

craig bot commented Jul 30, 2019

Merge conflict

Previously we supported row count limits as long as the limit was higher
than the number of returned rows. Here we add a feature that supports
this part of the spec in a partial way. This should unblock simple JDBC
usage of this feature.

Supported use cases:
- implicit transactions (which auto close the portal after suspension)
- explicit transactions executed to completion

Unsupported use cases (with explicit transactions):
- interleaved execution
- explicitly closing a portal after partial execution

Many options were evaluated during implementation. The one here is based
on work where the pgwire package itself takes over the state machine
processing during AddRow and looks for further ExecPortal messages. This
has a number of problems: there are now two state machines, we can only
support a part of the spec. However it also has a number of benefits
like it is a simple implementation that is easy to understand.

Two other solutions were evaluated.

First, teaching distsql how to pause and resume execution (a
proof-of-concept branch for this was produced). I did not pursue this
route because of my own unfamiliarity with distsql, and I thought that
attempting to reach a high level of confidence that all of the new pause,
resume, and error logic flows were correct would be very difficult
(I may be wrong about this). Also, this approach needed to address how
to handle post-distsql execution cleanup and stats gathering. That is,
after the call into distsql was paused and returned control back to
the sql executor, there's a lot of code that gets run to cleanup stuff
and report stats in various places, including some defers. These would
need to be audited and only execute once per statement, not once per
portal execution.

Second, start distsql execution in a new go routine and send exec portal
requests to it over a channel. This would avoid teaching distsql how
to pause and resume itself, but instead move that complexity into the
channel and go routine handling, another area ripe for race conditions
and deadlocks. This also needed to deal with the post-distsql execution
cleanup and defers handling discussed above.

For now, we decided that this limited implementation is good enough for
what we need today. In order to one day either support the full spec
or pay down some of our technical debt, we can probably do a number of
preliminary refactors that will make invoking much of the distsql path
multpile times easier.

pgx got a version bump to get support for PortalSuspended. The current
pgx version had a few new dependencies too.

See #4035

Release note (sql change): add partial support for row limits during
portal execution in pgwire.
@maddyblue
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Jul 31, 2019
39085: sql: add partial support for pgwire row count limits r=mjibson a=mjibson

Previously we supported row count limits as long as the limit was higher
than the number of returned rows. Here we add a feature that supports
this part of the spec in a partial way. This should unblock simple JDBC
usage of this feature.

Supported use cases:
- implicit transactions (which auto close the portal after suspension)
- explicit transactions executed to completion

Unsupported use cases (with explicit transactions):
- interleaved execution
- explicitly closing a portal after partial execution

Many options were evaluated during implementation. The one here is based
on work where the pgwire package itself takes over the state machine
processing during AddRow and looks for further ExecPortal messages. This
has a number of problems: there are now two state machines, we can only
support a part of the spec. However it also has a number of benefits
like it is a simple implementation that is easy to understand.

Two other solutions were evaluated.

First, teaching distsql how to pause and resume execution (a
proof-of-concept branch for this was produced). I did not pursue this
route because of my own unfamiliarity with distsql, and I thought that
attempting to reach a high level of confidence that all of the new pause,
resume, and error logic flows were correct would be very difficult
(I may be wrong about this). Also, this approach needed to address how
to handle post-distsql execution cleanup and stats gathering. That is,
after the call into distsql was paused and returned control back to
the sql executor, there's a lot of code that gets run to cleanup stuff
and report stats in various places, including some defers. These would
need to be audited and only execute once per statement, not once per
portal execution.

Second, start distsql execution in a new go routine and send exec portal
requests to it over a channel. This would avoid teaching distsql how
to pause and resume itself, but instead move that complexity into the
channel and go routine handling, another area ripe for race conditions
and deadlocks. This also needed to deal with the post-distsql execution
cleanup and defers handling discussed above.

For now, we decided that this limited implementation is good enough for
what we need today. In order to one day either support the full spec
or pay down some of our technical debt, we can probably do a number of
preliminary refactors that will make invoking much of the distsql path
multpile times easier.

pgx got a version bump to get support for PortalSuspended. The current
pgx version had a few new dependencies too.

See #4035

Release note (sql change): add partial support for row limits during
portal execution in pgwire.

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jul 31, 2019

Build succeeded

@qinzl1
Copy link

qinzl1 commented Oct 29, 2019

Is the commercial version only supported? The community version is not

@maddyblue
Copy link
Contributor Author

maddyblue commented Oct 29, 2019