Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add buffer and scanBuffer nodes #37137

Merged
merged 1 commit into from
May 15, 2019
Merged

Conversation

yuzefovich
Copy link
Member

Adds bufferNode that consumes its input, stores all the rows in a
buffer, and then proceeds on passing the rows through. The buffer
can be iterated over multiple times using scanBuffer node that is
referencing a single bufferNode.

Fixes: #37050.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich changed the title sql: add buffer and scanBufer nodes sql: add buffer and scanBuffer nodes Apr 25, 2019
@yuzefovich
Copy link
Member Author

I left several comments on things I'm not sure about.

Also, I can't seem to find an example of how we unit-test local plan nodes, any pointers?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. We don't typically do unit tests for plan nodes, it would be very tedious to set up. We usually test them through logic tests. It's fine to hold off on any tests until we make use of the nodes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, and @yuzefovich)


pkg/sql/buffer.go, line 28 at r1 (raw file):

// After the input has been fully consumed, it proceeds on passing the rows
// through. The buffer can be iterated over multiple times.
// TODO(yuzefovich): is this buffering all rows at once desirable?

Yes, it's kind of required in general. Part of the point is that we need to consume all input rows (because the input can be a mutation) even if the upper node doesn't need it.


pkg/sql/buffer.go, line 29 at r1 (raw file):

// through. The buffer can be iterated over multiple times.
// TODO(yuzefovich): is this buffering all rows at once desirable?
// TODO(yuzefovich): current version supports only a single scanBufferNode at a

We will need to be able to have multiple scanBufferNode that acts like independent scans.


pkg/sql/buffer.go, line 52 at r1 (raw file):

func (n *bufferNode) Next(params runParams) (bool, error) {
	if !n.inputDone {

It feels like this should happen in startExec. If the parent node ends up not needing any rows, we still need to execute the encapsulated plan fully.


pkg/sql/buffer.go, line 96 at r1 (raw file):

// referencing. The bufferNode can be iterated over multiple times, however, a
// new scanBufferNode is needed.
type scanBufferNode struct {

Add a comment that scanBufferNode can only start execution after the bufferNode finishes its execution completely.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense, thanks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, and @RaduBerinde)


pkg/sql/buffer.go, line 28 at r1 (raw file):

Previously, RaduBerinde wrote…

Yes, it's kind of required in general. Part of the point is that we need to consume all input rows (because the input can be a mutation) even if the upper node doesn't need it.

Ok, I see, thanks.


pkg/sql/buffer.go, line 29 at r1 (raw file):

Previously, RaduBerinde wrote…

We will need to be able to have multiple scanBufferNode that acts like independent scans.

Done.


pkg/sql/buffer.go, line 52 at r1 (raw file):

Previously, RaduBerinde wrote…

It feels like this should happen in startExec. If the parent node ends up not needing any rows, we still need to execute the encapsulated plan fully.

But is it safe to be Nexting the input in startExec? My understanding is as follows: say we have the following tree valuesNode -> bufferNode -> insertNode, startExec will be called on insertNode first, then on bufferNode before the call on valuesNode due to depth-first traversal in plan.go/startExec. So if we try buffering all the rows in startExec, we'll be calling Next on valuesNode before it has been started. Possibly, I'm missing something, or there needs to be other changes to accommodate buffering within startExec.


pkg/sql/buffer.go, line 96 at r1 (raw file):

Previously, RaduBerinde wrote…

Add a comment that scanBufferNode can only start execution after the bufferNode finishes its execution completely.

Done.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, and @yuzefovich)


pkg/sql/buffer.go, line 52 at r1 (raw file):

Previously, yuzefovich wrote…

But is it safe to be Nexting the input in startExec? My understanding is as follows: say we have the following tree valuesNode -> bufferNode -> insertNode, startExec will be called on insertNode first, then on bufferNode before the call on valuesNode due to depth-first traversal in plan.go/startExec. So if we try buffering all the rows in startExec, we'll be calling Next on valuesNode before it has been started. Possibly, I'm missing something, or there needs to be other changes to accommodate buffering within startExec.

I think it's post-order traversal

return n.startExec(params)

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @justinj)


pkg/sql/buffer.go, line 52 at r1 (raw file):

Previously, RaduBerinde wrote…

I think it's post-order traversal

return n.startExec(params)

Done. You're right - I was confused (I was thinking that in the example valuesNode was the root of the tree while it is a leaf and insertNode is the root, right?).

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, and @yuzefovich)


pkg/sql/buffer.go, line 33 at r2 (raw file):

	// TODO(yuzefovich): should the buffer be backed by the disk? If so, the
	// comments about TempStorage suggest that it should be used by DistSQL
	// processors, but this node is local.

I think eventually.


pkg/sql/buffer.go, line 73 at r2 (raw file):

}

// TODO(yuzefovich): does this need to have some special behavior?

This looks fine to me.


pkg/sql/buffer.go, line 103 at r2 (raw file):

}

// Note that scanBufferNode does not close the corresponding to it bufferNode.

nit: this sentence doesn't make too much sense.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaduBerinde please let me know if this is good to be merged now (with likely modifications later).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @justinj)


pkg/sql/buffer.go, line 33 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think eventually.

Ok, I'll leave this TODO for now then.


pkg/sql/buffer.go, line 73 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

This looks fine to me.

Ok, thanks.


pkg/sql/buffer.go, line 103 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: this sentence doesn't make too much sense.

Removed.

@RaduBerinde
Copy link
Member

Now that I think of it more, I think we will need both kinds of semantics:

  • a buffer node which passes through all rows immediately. It can assume that the parent will read through all the rows. This will be used as input to mutations.
  • a buffer node which first buffers all rows upfront. This will be used for CTEs. In this case the parent node will not need the rows.TestImportCSVStmt

I am not sure if we want them to be separate nodes or not. It might be cleaner if they're separate, especially since in the second case we don't need it to return the rows.

@RaduBerinde
Copy link
Member

The second case above would be used by a withNode which buffers the left side and then runs the right side. From that standpoint, maybe we don't need a separate kind of buffer node - the withNode could just read all rows from a bufferNode on the left side (and throw them away) before executing the right side.

Adds bufferNode that consumes its input, stores all the rows in a
buffer, and then proceeds on passing the rows through. The buffer
can be iterated over multiple times using scanBuffer node that is
referencing a single bufferNode.

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made adjustments to support the first case, and your point about discarding the rows in the second case makes sense (I think, actually, withNode could be just Nexting the bufferNode without explicitly reading the rows with Values).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @justinj)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @justinj)

@yuzefovich
Copy link
Member Author

TFTRs!

bors r+

craig bot pushed a commit that referenced this pull request May 15, 2019
37137: sql: add buffer and scanBuffer nodes r=yuzefovich a=yuzefovich

Adds bufferNode that consumes its input, stores all the rows in a
buffer, and then proceeds on passing the rows through. The buffer
can be iterated over multiple times using scanBuffer node that is
referencing a single bufferNode.

Fixes: #37050.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented May 15, 2019

Build succeeded

@craig craig bot merged commit 5c1d0f9 into cockroachdb:master May 15, 2019
@yuzefovich yuzefovich deleted the buffer-node branch May 15, 2019 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: implement buffer and scanBuffer planNodes
4 participants