
sql: make internal executor streaming #59330

Merged
1 commit merged from internal-executor into cockroachdb:master on Feb 10, 2021

Conversation

@yuzefovich (Member) commented on Jan 23, 2021

This commit updates the internal executor to operate in a streaming
fashion by refactoring its internal logic to implement an iterator
pattern. A new method QueryInternalEx (and its counterpart
QueryInternal) is introduced (neither is currently used), while all
existing methods of the InternalExecutor interface are implemented
using the new iterator logic.

The communication between the iterator goroutine (the receiver) and the
connExecutor goroutine (the sender) is done via a buffered channel (of
size 32 in the non-test setting). The channel is closed when the
connExecutor goroutine exits its run() loop.
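
As a standalone sketch of that hand-off (rowResult, the goroutine bodies, and all names here are illustrative, not the actual pkg/sql types):

```go
package main

import "fmt"

// rowResult stands in for whatever the connExecutor pushes on the channel
// (a row, an error, or other metadata); the real type has a different shape.
type rowResult struct {
	row []string
	err error
}

func main() {
	// Buffered channel of size 32, mirroring the non-test setting: the sender
	// can run up to 32 results ahead of the receiver before blocking.
	results := make(chan rowResult, 32)

	// Sender: plays the role of the connExecutor goroutine, closing the
	// channel when it exits its run() loop.
	go func() {
		defer close(results)
		for i := 0; i < 100; i++ {
			results <- rowResult{row: []string{fmt.Sprintf("row-%d", i)}}
		}
	}()

	// Receiver: plays the role of the iterator; each Next() call corresponds
	// to receiving one result from the channel.
	for res := range results {
		_ = res.row
	}
}
```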

Care needs to be taken when closing the iterator: we need to make sure
to close the stmtBuf (so that there are no more commands for the
connExecutor goroutine to execute), and then we need to drain the
channel (since the connExecutor goroutine might be blocked on adding a
row to it). After that we have to wait for the connExecutor goroutine
to exit so that we can finish the tracing span. For convenience, if the
iterator is fully exhausted, it gets closed automatically.
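
Sketched in Go with hypothetical field names (stmtBuf, resCh, wg), the close sequence looks roughly like this:

```go
package sketch

import "sync"

// rowsIterator is a hypothetical stand-in for the iterator; the real type in
// pkg/sql has different fields and names.
type rowsIterator struct {
	stmtBuf interface{ Close() } // command buffer feeding the connExecutor
	resCh   chan rowResult       // results produced by the connExecutor goroutine
	wg      sync.WaitGroup       // tracks the connExecutor goroutine
}

type rowResult struct {
	row []string
	err error
}

// Close tears the iterator down in the order described above.
func (it *rowsIterator) Close() {
	// 1. Close the stmtBuf so that there are no more commands for the
	//    connExecutor goroutine to execute.
	it.stmtBuf.Close()
	// 2. Drain the channel: the connExecutor goroutine might be blocked on
	//    sending a row; it closes the channel when it exits its run() loop,
	//    which is what ends this loop.
	for range it.resCh {
	}
	// 3. Wait for the connExecutor goroutine to exit so that the tracing
	//    span can be finished safely.
	it.wg.Wait()
}
```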

Addresses: #48595.

Release note: None

@yuzefovich added the do-not-merge label (bors won't merge a PR with this label) on Jan 23, 2021
@cockroach-teamcity (Member) commented

This change is Reviewable

@yuzefovich force-pushed the internal-executor branch 7 times, most recently from 2c1ec41 to 1f6cf97, on January 23, 2021 05:30
@yuzefovich requested a review from a team as a code owner on January 23, 2021 05:30
@yuzefovich removed the request for review from a team on January 23, 2021 05:31
@yuzefovich force-pushed the internal-executor branch 6 times, most recently from 9220e4c to c65dd5d, on January 23, 2021 07:12
@yuzefovich requested review from a team and miretskiy and removed the request for review from a team on January 23, 2021 07:15
@yuzefovich removed the request for review from miretskiy on January 23, 2021 07:20
@yuzefovich force-pushed the internal-executor branch 8 times, most recently from d541992 to 2d5b4e6, on January 23, 2021 19:58
@yuzefovich (Member, Author) commented

bors r-

I think we can improve the semantics around closing the channel.

@craig (craig bot, Contributor) commented on Feb 8, 2021

Canceled.

@yuzefovich force-pushed the internal-executor branch 2 times, most recently from 47d6628 to 2a1124a, on February 8, 2021 01:28
@yuzefovich (Member, Author) commented

@jordanlewis could you please take another quick look at the change to when the channel is closed?

@yuzefovich force-pushed the internal-executor branch 2 times, most recently from d3c2f48 to edb89cf, on February 8, 2021 02:37
@yuzefovich (Member, Author) commented

Added another commit that improves the short-circuiting behavior. PTAL.

@yuzefovich (Member, Author) left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @jordanlewis)


pkg/sql/internal.go, line 343 at r3 (raw file):

	// We also need to exhaust the channel since the connExecutor goroutine
	// might be blocked on sending the row in AddRow().
	// TODO(yuzefovich): at the moment, the connExecutor goroutine will not stop

Had to remove the second commit that tried to address it by deriving a child context before instantiating the connExecutor and canceling it here, in Close(). The problem is that the context cancellation error might be sent on the channel, and I don't think it is possible to distinguish an "external" cancellation (the user of the iterator API cancels the context), which should be returned, from an "internal" one (caused by us canceling the context in Close()), which should be swallowed. Any ideas on how to address this TODO?
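
To illustrate the ambiguity with a standalone snippet (not code from the removed commit): both the caller canceling its own context and Close() canceling a derived child context surface as the same context.Canceled error, so an error received from the results channel cannot be classified.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

func main() {
	// "External" cancellation: the user of the iterator API cancels their
	// own context.
	external, cancelExternal := context.WithCancel(context.Background())
	cancelExternal()

	// "Internal" cancellation: Close() cancels a child context that was
	// derived before instantiating the connExecutor.
	child, cancelInternal := context.WithCancel(context.Background())
	cancelInternal()

	// Both cases surface as the same error value, so the receiver cannot
	// tell them apart.
	fmt.Println(errors.Is(external.Err(), context.Canceled)) // true
	fmt.Println(errors.Is(child.Err(), context.Canceled))    // true
}
```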

@andreimatei (Contributor) left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @jordanlewis, and @yuzefovich)


pkg/sql/internal.go, line 766 at r1 (raw file):

But I think it's strictly worse than what we have here in terms of potential for true streaming usage, isn't it?

It's about the latency-to-first-result vs automatic retries tradeoff. Letting the caller specify the buffer size would put it in control of this tradeoff.

Unless we want to expose some kind of contract that forces the users of the internal executor to choose their portal limit, which feels wrong (too complex).

The interface we should be shooting for, in my opinion, should be either the go sql package driver interface, or pgwire. Implementing the streaming in the form of portals would keep the door open to using this internal executor through libpq or go sql. I think this channel-based results communication moves us away from that, but I'm not sure.
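
As a hypothetical sketch of that suggestion (IteratorOptions and its field are invented names, not an existing internal executor API):

```go
package sketch

// IteratorOptions sketches what putting the caller in control of the
// latency-to-first-result vs. automatic-retries tradeoff could look like.
type IteratorOptions struct {
	// RowBufferSize bounds how far the connExecutor goroutine can run ahead
	// of the consumer. A smaller buffer returns the first row sooner; a
	// larger buffer holds more of the result set back, leaving more room
	// for transparent automatic retries.
	RowBufferSize int
}

func rowBufferSize(o IteratorOptions) int {
	if o.RowBufferSize <= 0 {
		return 32 // the default used in non-test settings
	}
	return o.RowBufferSize
}
```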

@jordanlewis (Member) left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @yuzefovich)


pkg/sql/internal.go, line 766 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

But I think it's strictly worse than what we have here in terms of potential for true streaming usage, isn't it?

It's about the latency-to-first-result vs automatic retries tradeoff. Letting the caller specify the buffer size would put it in control of this tradeoff.

Unless we want to expose some kind of contract that forces the users of the internal executor to choose their portal limit, which feels wrong (too complex).

The interface we should be shooting for, in my opinion, should be either the go sql package driver interface, or pgwire. Implementing the streaming in the form of portals would keep the door open to using this internal executor through libpq or go sql. I think this channel-based results communication moves us away from that, but I'm not sure.

I think your opinion is reasonable, but the burden to do this again is quite high. The proximal goal, now solved, was to correct the unbounded memory usage, which was by far the worst problem we've had in the internal executor domain, IMO. I think from a strict priority POV, these other reasonable suggestions can come later.


pkg/sql/internal.go, line 343 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Had to remove the second commit that tried to address it by deriving a child context before instantiating the connExecutor and canceling it here, in Close(). The problem is that the context cancellation error might be sent on the channel, and I don't think it is possible to distinguish an "external" cancellation (the user of the iterator API cancels the context), which should be returned, from an "internal" one (caused by us canceling the context in Close()), which should be swallowed. Any ideas on how to address this TODO?

Hmm... I guess you could try to wrap the internal version and detect it specially, or the other way around? I don't know. What are the consequences of this TODO? Leaked goroutines are quite bad, will we leak goroutines? Or will execution just take a while to finish in some cases?

I think in general it's not great to have operations that don't cancel when top level contexts are canceled, because users expect that their cancellations trickle all the way down.

I'm okay merging this for now to get this work done. But you should make an issue and plan to do it soon, I think.

@yuzefovich (Member, Author) left a comment


TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @jordanlewis)


pkg/sql/internal.go, line 343 at r3 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Hmm... I guess you could try to wrap the internal version and detect it specially, or the other way around? I don't know. What are the consequences of this TODO? Leaked goroutines are quite bad, will we leak goroutines? Or will execution just take a while to finish in some cases?

I think in general it's not great to have operations that don't cancel when top level contexts are canceled, because users expect that their cancellations trickle all the way down.

I'm okay merging this for now to get this work done. But you should make an issue and plan to do it soon, I think.

The consequences are only performance-related.

Imagine that we have a long-running query like SELECT * FROM t executed via the iterator API, but we are only interested in the first few rows. Once the caller satisfies its limit, it'll call iterator.Close() to finish early. However, the execution of the query (i.e. of the ExecStmt command currently being executed by the connExecutor) will not stop right away: all of the remaining rows will still be pushed onto the channel, and the iterator will drop them on the floor here, in Close(). This TODO is about improving the situation so that the connExecutor short-circuits the execution of the current command.

To be clear, this is not about leaked goroutines; and if the caller cancels the context, that will stop the connExecutor goroutine too, so we are listening for top level context cancelation.

I'll file an issue to track addressing this, but I currently don't see an easy way to address this TODO.
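
To make the scenario concrete, here is a sketch of such a caller; the rowIterator interface and its methods are made-up stand-ins for the iterator API introduced by this PR, not its actual signatures:

```go
package sketch

import "context"

// rowIterator is a hypothetical, minimal view of the iterator API described
// in this PR; the real interface and method names may differ.
type rowIterator interface {
	Next(ctx context.Context) (bool, error) // advances; false means exhausted
	Cur() []string                          // current row
	Close() error
}

// firstN reads at most n rows and then closes the iterator early. Close stops
// feeding new commands to the connExecutor, but the statement currently being
// executed keeps pushing its remaining rows onto the channel; Close drains and
// discards them, which is the performance cost described above.
func firstN(ctx context.Context, it rowIterator, n int) (rows [][]string, err error) {
	defer func() {
		if closeErr := it.Close(); err == nil {
			err = closeErr
		}
	}()
	for len(rows) < n {
		ok, nextErr := it.Next(ctx)
		if nextErr != nil || !ok {
			return rows, nextErr
		}
		rows = append(rows, it.Cur())
	}
	return rows, nil
}
```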

@craig (craig bot, Contributor) commented on Feb 10, 2021

This PR was included in a batch that was canceled; it will be automatically retried.

@craig (craig bot, Contributor) commented on Feb 10, 2021

Build succeeded.

@craig (bot) merged commit aaa8c54 into cockroachdb:master on Feb 10, 2021
@yuzefovich deleted the internal-executor branch on February 10, 2021 14:59
craig bot pushed a commit that referenced this pull request Apr 19, 2023
101477: sql: fix internal executor when it encounters a retry error r=yuzefovich a=yuzefovich

This PR contains several commits that fix a long-standing bug in the internal executor that could make it double count some things (rows or metadata) if an internal retry error occurs. In particular, since at least 21.1 (when we introduced the "streaming" internal executor in #59330), if the IE encounters a retry error _after_ it has communicated some results to the client, it would proceed to retry the corresponding command as if the incomplete execution and the retry error never happened. In other words, it was possible for some rows to be double "counted" (either included multiple times directly in the result set, or indirectly in the "rows affected" number). This PR fixes the problem by returning the retry error to the client when some actual rows have already been communicated, and by resetting the number of "rows affected" when "rewinding the stmt buffer" in order to retry transparently.

Fixes: #98558.



Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>