sql: add buffer and scanBuffer nodes #37137

yuzefovich · 2019-04-25T18:57:27Z

Adds bufferNode, which consumes its input, stores all the rows in a
buffer, and then passes the rows through. The buffer can be iterated
over multiple times using scanBuffer nodes that reference a single
bufferNode.

Fixes: #37050.

Release note: None
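A minimal Go sketch of the behavior described above, assuming a trimmed-down iteration interface (`rowSource` with `Next`/`Values`) and an integer `row` type in place of the real `planNode` and `tree.Datums`. All names here are hypothetical simplifications, not the code in `pkg/sql/buffer.go`. The bufferNode drains its input on the first `Next` call and then returns rows from its buffer; each scanBufferNode keeps its own cursor into the shared buffer.

```go
package main

import "fmt"

// row is a hypothetical stand-in for tree.Datums.
type row []int

// rowSource is a hypothetical, trimmed-down planNode iteration interface.
type rowSource interface {
	Next() (bool, error)
	Values() row
}

// valuesSource emits a fixed list of rows (stand-in for valuesNode).
type valuesSource struct {
	rows []row
	pos  int
}

func (v *valuesSource) Next() (bool, error) {
	if v.pos >= len(v.rows) {
		return false, nil
	}
	v.pos++
	return true, nil
}

func (v *valuesSource) Values() row { return v.rows[v.pos-1] }

// bufferNode consumes all of its input on the first Next call, stores the
// rows, and then passes them through one at a time.
type bufferNode struct {
	input     rowSource
	inputDone bool
	rows      []row
	pos       int
}

func (b *bufferNode) Next() (bool, error) {
	if !b.inputDone {
		for {
			ok, err := b.input.Next()
			if err != nil {
				return false, err
			}
			if !ok {
				break
			}
			b.rows = append(b.rows, b.input.Values())
		}
		b.inputDone = true
	}
	if b.pos >= len(b.rows) {
		return false, nil
	}
	b.pos++
	return true, nil
}

func (b *bufferNode) Values() row { return b.rows[b.pos-1] }

// scanBufferNode re-reads the rows stored in a bufferNode. Each scan has its
// own cursor, so a new scanBufferNode is needed for each pass.
type scanBufferNode struct {
	buffer *bufferNode
	pos    int
}

func (s *scanBufferNode) Next() (bool, error) {
	if s.pos >= len(s.buffer.rows) {
		return false, nil
	}
	s.pos++
	return true, nil
}

func (s *scanBufferNode) Values() row { return s.buffer.rows[s.pos-1] }

// drain reads every row from src.
func drain(src rowSource) ([]row, error) {
	var out []row
	for {
		ok, err := src.Next()
		if err != nil || !ok {
			return out, err
		}
		out = append(out, src.Values())
	}
}

func main() {
	buf := &bufferNode{input: &valuesSource{rows: []row{{1}, {2}, {3}}}}
	passed, _ := drain(buf)
	first, _ := drain(&scanBufferNode{buffer: buf})
	second, _ := drain(&scanBufferNode{buffer: buf})
	fmt.Println(len(passed), len(first), len(second)) // 3 3 3
}
```

Keeping the cursor in the scan rather than in the buffer is what allows several independent passes over the same rows.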

yuzefovich · 2019-04-25T18:58:58Z

I left several comments on things I'm not sure about.

Also, I can't find an example of how we unit-test local plan nodes; any pointers?

RaduBerinde

Sorry for the late review. We don't typically write unit tests for plan nodes; they would be very tedious to set up. We usually test them through logic tests. It's fine to hold off on any tests until we make use of the nodes.

pkg/sql/buffer.go, line 28 at r1 (raw file):

// After the input has been fully consumed, it proceeds on passing the rows
// through. The buffer can be iterated over multiple times.
// TODO(yuzefovich): is this buffering all rows at once desirable?

Yes, it's kind of required in general. Part of the point is that we need to consume all input rows (because the input can be a mutation) even if the upper node doesn't need them.

pkg/sql/buffer.go, line 29 at r1 (raw file):

// through. The buffer can be iterated over multiple times.
// TODO(yuzefovich): is this buffering all rows at once desirable?
// TODO(yuzefovich): current version supports only a single scanBufferNode at a

We will need to be able to have multiple scanBufferNodes that act like independent scans.

pkg/sql/buffer.go, line 52 at r1 (raw file):

func (n *bufferNode) Next(params runParams) (bool, error) {
	if !n.inputDone {

It feels like this should happen in startExec. If the parent node ends up not needing any rows, we still need to execute the encapsulated plan fully.
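To illustrate the point, here is a hedged Go sketch in which the draining loop moves from `Next` into `startExec`. The types and the parameterless `startExec()` signature are simplifications assumed for illustration; `mutationSource` is a hypothetical stand-in for a mutation whose side effects must happen even when the parent reads no rows.

```go
package main

import "fmt"

type row []int

type rowSource interface {
	Next() (bool, error)
	Values() row
}

// mutationSource stands in for a mutation (e.g. an insert): each Next call
// "writes" one row, which is counted in written.
type mutationSource struct {
	toWrite []row
	written int
}

func (m *mutationSource) Next() (bool, error) {
	if m.written >= len(m.toWrite) {
		return false, nil
	}
	m.written++
	return true, nil
}

func (m *mutationSource) Values() row { return m.toWrite[m.written-1] }

// bufferNode drains its input in startExec rather than lazily in Next, so
// the input is executed in full even if the parent never asks for a row.
type bufferNode struct {
	input rowSource
	rows  []row
	pos   int
}

func (b *bufferNode) startExec() error {
	for {
		ok, err := b.input.Next()
		if err != nil {
			return err
		}
		if !ok {
			return nil
		}
		b.rows = append(b.rows, b.input.Values())
	}
}

func (b *bufferNode) Next() (bool, error) {
	if b.pos >= len(b.rows) {
		return false, nil
	}
	b.pos++
	return true, nil
}

func (b *bufferNode) Values() row { return b.rows[b.pos-1] }

func main() {
	mut := &mutationSource{toWrite: []row{{1}, {2}, {3}}}
	buf := &bufferNode{input: mut}
	if err := buf.startExec(); err != nil {
		panic(err)
	}
	// The parent never calls buf.Next, yet every row was written.
	fmt.Println(mut.written) // 3
}
```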

pkg/sql/buffer.go, line 96 at r1 (raw file):

// referencing. The bufferNode can be iterated over multiple times, however, a
// new scanBufferNode is needed.
type scanBufferNode struct {

Add a comment that scanBufferNode can only start execution after the bufferNode finishes its execution completely.
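Besides a comment, one hedged way to make this ordering constraint explicit is a guard in the scan's `startExec`, sketched in Go below. The `done` flag, the `finish` helper, and `errBufferNotDone` are hypothetical names, not part of the PR. The sketch also shows two scans over the same buffer advancing independently.

```go
package main

import (
	"errors"
	"fmt"
)

type row []int

// bufferNode records whether it has finished executing.
type bufferNode struct {
	rows []row
	done bool
}

// finish is a hypothetical hook marking the buffer as fully populated.
func (b *bufferNode) finish(rows []row) {
	b.rows = rows
	b.done = true
}

// scanBufferNode reads a bufferNode's rows with its own cursor. It can only
// start execution after the referenced bufferNode has finished.
type scanBufferNode struct {
	buffer *bufferNode
	pos    int
}

var errBufferNotDone = errors.New("scanBufferNode started before its bufferNode finished")

func (s *scanBufferNode) startExec() error {
	if !s.buffer.done {
		return errBufferNotDone
	}
	s.pos = 0
	return nil
}

func (s *scanBufferNode) Next() bool {
	if s.pos >= len(s.buffer.rows) {
		return false
	}
	s.pos++
	return true
}

func (s *scanBufferNode) Values() row { return s.buffer.rows[s.pos-1] }

func main() {
	buf := &bufferNode{}
	early := &scanBufferNode{buffer: buf}
	fmt.Println(early.startExec() != nil) // true: buffer not finished yet

	buf.finish([]row{{10}, {20}})
	a := &scanBufferNode{buffer: buf}
	b := &scanBufferNode{buffer: buf}
	_ = a.startExec()
	_ = b.startExec()
	a.Next()
	a.Next()
	b.Next()
	// a and b advance independently over the same buffer.
	fmt.Println(a.Values()[0], b.Values()[0]) // 20 10
}
```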

yuzefovich

Ok, makes sense, thanks.

pkg/sql/buffer.go, line 28 at r1 (raw file):

Previously, RaduBerinde wrote…

Yes, it's kind of required in general. Part of the point is that we need to consume all input rows (because the input can be a mutation) even if the upper node doesn't need them.

Ok, I see, thanks.

pkg/sql/buffer.go, line 29 at r1 (raw file):

Previously, RaduBerinde wrote…

We will need to be able to have multiple scanBufferNodes that act like independent scans.

Done.

pkg/sql/buffer.go, line 52 at r1 (raw file):

Previously, RaduBerinde wrote…

It feels like this should happen in startExec. If the parent node ends up not needing any rows, we still need to execute the encapsulated plan fully.

But is it safe to call Next on the input in startExec? My understanding is as follows: given the tree valuesNode -> bufferNode -> insertNode, startExec will be called on insertNode first and then on bufferNode, before it is called on valuesNode, due to the depth-first traversal in plan.go/startExec. So if we buffer all the rows in startExec, we'll be calling Next on valuesNode before it has been started. Possibly I'm missing something, or other changes are needed to accommodate buffering within startExec.

pkg/sql/buffer.go, line 96 at r1 (raw file):

Previously, RaduBerinde wrote…

Add a comment that scanBufferNode can only start execution after the bufferNode finishes its execution completely.

Done.

RaduBerinde

pkg/sql/buffer.go, line 52 at r1 (raw file):

Previously, yuzefovich wrote…

But is it safe to call Next on the input in startExec? My understanding is as follows: given the tree valuesNode -> bufferNode -> insertNode, startExec will be called on insertNode first and then on bufferNode, before it is called on valuesNode, due to the depth-first traversal in plan.go/startExec. So if we buffer all the rows in startExec, we'll be calling Next on valuesNode before it has been started. Possibly I'm missing something, or other changes are needed to accommodate buffering within startExec.

I think it's a post-order traversal:

cockroach/pkg/sql/plan.go, line 480 in 147489f:

return n.startExec(params)
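A small Go sketch of why draining in `startExec` is safe under a post-order walk: children are started before their parent, so `valuesNode` (the leaf) has started by the time `bufferNode.startExec` runs. The `node` type and `startExecTree` function are hypothetical simplifications, not the actual traversal code in `plan.go`.

```go
package main

import "fmt"

// node is a hypothetical plan node that only tracks its name and children.
type node struct {
	name     string
	children []*node
}

// startExecTree starts every child before its parent (post-order) and
// records the order in which nodes are started.
func startExecTree(n *node, started *[]string) {
	for _, c := range n.children {
		startExecTree(c, started)
	}
	// The parent's startExec runs only after all of its children have started.
	*started = append(*started, n.name)
}

func main() {
	values := &node{name: "valuesNode"}
	buffer := &node{name: "bufferNode", children: []*node{values}}
	insert := &node{name: "insertNode", children: []*node{buffer}}
	var order []string
	startExecTree(insert, &order)
	fmt.Println(order) // [valuesNode bufferNode insertNode]
}
```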

yuzefovich

pkg/sql/buffer.go, line 52 at r1 (raw file):

Previously, RaduBerinde wrote…

I think it's a post-order traversal:

cockroach/pkg/sql/plan.go, line 480 in 147489f:

return n.startExec(params)

Done. You're right; I was confused. (I was thinking that in the example valuesNode was the root of the tree, while it is actually a leaf and insertNode is the root, right?)

jordanlewis

pkg/sql/buffer.go, line 33 at r2 (raw file):

	// TODO(yuzefovich): should the buffer be backed by the disk? If so, the
	// comments about TempStorage suggest that it should be used by DistSQL
	// processors, but this node is local.

I think eventually.

pkg/sql/buffer.go, line 73 at r2 (raw file):

}

// TODO(yuzefovich): does this need to have some special behavior?

This looks fine to me.

pkg/sql/buffer.go, line 103 at r2 (raw file):

}

// Note that scanBufferNode does not close the corresponding to it bufferNode.

nit: this sentence doesn't make too much sense.

yuzefovich

@RaduBerinde please let me know if this is good to be merged now (with likely modifications later).

pkg/sql/buffer.go, line 33 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think eventually.

Ok, I'll leave this TODO for now then.

pkg/sql/buffer.go, line 73 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

This looks fine to me.

Ok, thanks.

pkg/sql/buffer.go, line 103 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: this sentence doesn't make too much sense.

Removed.

RaduBerinde · 2019-05-14T16:02:56Z

Now that I think about it more, we will need both kinds of semantics:

- a buffer node that passes all rows through immediately. It can assume that the parent will read all the rows. This will be used as input to mutations.
- a buffer node that first buffers all rows upfront. This will be used for CTEs. In this case the parent node will not need the rows.

I am not sure if we want them to be separate nodes or not. It might be cleaner if they're separate, especially since in the second case we don't need it to return the rows.
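A hedged Go sketch of the first kind: a pass-through buffer that hands each row to its parent immediately and records it along the way. The type names are hypothetical. The sketch makes the stated assumption visible: the buffer is complete only after the parent has read every row.

```go
package main

import "fmt"

type row []int

type rowSource interface {
	Next() (bool, error)
	Values() row
}

type valuesSource struct {
	rows []row
	pos  int
}

func (v *valuesSource) Next() (bool, error) {
	if v.pos >= len(v.rows) {
		return false, nil
	}
	v.pos++
	return true, nil
}

func (v *valuesSource) Values() row { return v.rows[v.pos-1] }

// passThroughBuffer returns each input row as soon as it is read, while also
// appending it to the buffer. It relies on the parent reading all rows (as a
// mutation would); otherwise the buffer is incomplete.
type passThroughBuffer struct {
	input rowSource
	rows  []row
	cur   row
}

func (b *passThroughBuffer) Next() (bool, error) {
	ok, err := b.input.Next()
	if err != nil || !ok {
		return ok, err
	}
	b.cur = b.input.Values()
	b.rows = append(b.rows, b.cur)
	return true, nil
}

func (b *passThroughBuffer) Values() row { return b.cur }

func main() {
	b := &passThroughBuffer{input: &valuesSource{rows: []row{{1}, {2}, {3}}}}
	b.Next()
	// After one Next call, only one row has been read and buffered.
	fmt.Println(len(b.rows), b.Values()[0]) // 1 1
}
```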

RaduBerinde · 2019-05-14T16:05:20Z

The second case above would be used by a withNode which buffers the left side and then runs the right side. From that standpoint, maybe we don't need a separate kind of buffer node: the withNode could just read all rows from a bufferNode on the left side (and throw them away) before executing the right side.

yuzefovich

I made adjustments to support the first case, and your point about discarding the rows in the second case makes sense (actually, I think withNode could just call Next on the bufferNode without explicitly reading the rows with Values).

RaduBerinde

Thanks!

yuzefovich · 2019-05-15T02:44:09Z

TFTRs!

bors r+

37137: sql: add buffer and scanBuffer nodes r=yuzefovich a=yuzefovich

Adds bufferNode that consumes its input, stores all the rows in a buffer, and then proceeds on passing the rows through. The buffer can be iterated over multiple times using scanBuffer node that is referencing a single bufferNode.

Fixes: #37050.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

craig · 2019-05-15T03:08:15Z

Build succeeded

GitHub CI (Cockroach)

yuzefovich requested review from jordanlewis, justinj, RaduBerinde and a team April 25, 2019 18:57

yuzefovich changed the title from "sql: add buffer and scanBufer nodes" to "sql: add buffer and scanBuffer nodes" Apr 25, 2019

yuzefovich force-pushed the buffer-node branch from 53b352b to 17b958d April 25, 2019 18:57

RaduBerinde reviewed Apr 30, 2019

yuzefovich force-pushed the buffer-node branch from 17b958d to c0f0ae0 May 1, 2019 18:19

yuzefovich commented May 1, 2019

RaduBerinde reviewed May 1, 2019

yuzefovich force-pushed the buffer-node branch from c0f0ae0 to c89548c May 1, 2019 22:46

yuzefovich commented May 1, 2019

jordanlewis approved these changes May 7, 2019

yuzefovich force-pushed the buffer-node branch from c89548c to 41c3928 May 8, 2019 16:47

yuzefovich commented May 8, 2019

yuzefovich force-pushed the buffer-node branch from 41c3928 to 5c1d0f9 May 15, 2019 00:09

yuzefovich commented May 15, 2019

RaduBerinde approved these changes May 15, 2019

craig bot merged commit 5c1d0f9 into cockroachdb:master May 15, 2019

yuzefovich deleted the buffer-node branch May 15, 2019 03:22
