
coldata: fix Bytes invariant in some cases #59028

Merged (2 commits) on Jan 19, 2021

@yuzefovich (Member) commented Jan 15, 2021

execgen: remove SLICE method

The execgen.SLICE directive is used in only one place, and we can use
execgen.WINDOW there instead (which has the same effect).

Release note: None

coldata: fix updating offsets of bytes in Batch.SetLength

In the SetLength method we maintain the invariant of Bytes vectors that
the offsets form a non-decreasing sequence. Previously, this was done
incorrectly when a selection vector was present on the batch: the offsets
were updated only up to the batch length, ignoring the selection vector,
which could lead to out of bounds errors (caught by our panic-catcher)
some time later. This is now fixed by taking the selection vector into
account.
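To make the invariant concrete, here is a minimal Go sketch built on a hypothetical, heavily simplified flat-bytes vector. `bytesVec`, `set`, `setLength`, and the `fixed` flag are illustrative names, not CockroachDB's `coldata.Bytes` API. Element `i` occupies `data[offsets[i]:offsets[i+1]]`, and unset (e.g. NULL) elements must be backfilled as empty values so the offsets never decrease. With a selection vector of `[1, 4]` and a batch length of 2, bounding the fix-up by the length leaves the offsets around index 4 decreasing, while bounding it by the last selected index restores the invariant.

```go
package main

import "fmt"

// bytesVec is a hypothetical, heavily simplified model of a flat bytes
// vector (NOT coldata.Bytes): element i is data[offsets[i]:offsets[i+1]].
type bytesVec struct {
	data        []byte
	offsets     []int32 // len(offsets) == capacity+1
	maxSetIndex int     // largest index written so far, -1 if none
}

func newBytesVec(capacity int) *bytesVec {
	return &bytesVec{offsets: make([]int32, capacity+1), maxSetIndex: -1}
}

// updateOffsetsToBeNonDecreasing makes offsets[:n+1] non-decreasing by
// turning every element after maxSetIndex (up to n-1) into an empty value.
func (b *bytesVec) updateOffsetsToBeNonDecreasing(n int) {
	end := b.offsets[b.maxSetIndex+1] // equals len(b.data)
	for i := b.maxSetIndex + 2; i <= n; i++ {
		b.offsets[i] = end
	}
}

// set writes v at index i; indices must be written in increasing order,
// and skipped indices (e.g. NULLs) are backfilled as empty values.
func (b *bytesVec) set(i int, v []byte) {
	b.updateOffsetsToBeNonDecreasing(i)
	b.data = append(b.data[:b.offsets[i]], v...)
	b.offsets[i+1] = int32(len(b.data))
	b.maxSetIndex = i
}

// setLength is a hypothetical stand-in for the offsets fix-up in SetLength.
// With fixed == false it bounds the fix-up by the batch length (old
// behavior); with fixed == true it uses the last selected index instead.
func setLength(b *bytesVec, length int, sel []int, fixed bool) {
	if length == 0 {
		return
	}
	maxIdx := length - 1
	if fixed && sel != nil {
		// Relies on selection vectors being increasing sequences.
		maxIdx = sel[length-1]
	}
	b.updateOffsetsToBeNonDecreasing(maxIdx + 1)
}

func nonDecreasing(offsets []int32) bool {
	for i := 1; i < len(offsets); i++ {
		if offsets[i] < offsets[i-1] {
			return false
		}
	}
	return true
}

// demo writes index 1 of a 6-element vector, leaves index 4 NULL (unset),
// and reports whether the offsets covering elements 0..4 are non-decreasing.
func demo(fixed bool) bool {
	b := newBytesVec(6)
	sel := []int{1, 4}
	b.set(1, []byte("ab")) // index 4 is NULL, so set is never called for it
	setLength(b, len(sel), sel, fixed)
	return nonDecreasing(b.offsets[:sel[len(sel)-1]+2])
}

func main() {
	fmt.Println("old SetLength keeps invariant:", demo(false))
	fmt.Println("fixed SetLength keeps invariant:", demo(true))
}
```

In this model the violation does not fail immediately either; it is later code that slices `data` across a decreasing pair of offsets that hits an out of bounds error, which matches the delayed failure described above.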

I can neither easily come up with an example query that would trigger
this condition nor prove that it won't occur, but I think we have seen
a single Sentry report that could be explained by this bug, so it seems
worth backporting.

Additionally, this commit relies on the assumption that selection
vectors are increasing sequences in order to calculate the largest index
accessed by the batch (it is simply the last selected index).
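A short Go sketch of why the increasing-order assumption helps, using hypothetical helper names: if the selected indices are increasing, the largest index the batch touches is just the last selected one, so no scan is needed. As a labeled assumption, the sketch also treats the selection vector's backing slice as possibly longer than the batch length, which is why only `sel[:length]` is considered.

```go
package main

import "fmt"

// maxSelIdxLinear (hypothetical) scans sel[:length] for its largest
// element; it works for any selection vector but costs O(length).
func maxSelIdxLinear(sel []int, length int) int {
	maxIdx := -1
	for _, idx := range sel[:length] {
		if idx > maxIdx {
			maxIdx = idx
		}
	}
	return maxIdx
}

// maxSelIdx (hypothetical) relies on selection vectors being increasing
// sequences, so the largest accessed index is the last selected one: O(1).
func maxSelIdx(sel []int, length int) int {
	return sel[length-1]
}

// isIncreasing checks the assumption maxSelIdx depends on.
func isIncreasing(sel []int) bool {
	for i := 1; i < len(sel); i++ {
		if sel[i] <= sel[i-1] {
			return false
		}
	}
	return true
}

func main() {
	// The backing slice holds more entries than the batch length; the
	// trailing 42 is not part of the batch.
	sel := []int{0, 3, 7, 9, 42}
	length := 4
	fmt.Println(isIncreasing(sel[:length]), maxSelIdx(sel, length), maxSelIdxLinear(sel, length))
}
```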

Fixes: #57297.

Release note (bug fix): Previously, CockroachDB could, in rare
circumstances, encounter an internal error when executing queries with
BYTES or STRING types via the vectorized engine. This is now fixed.

@yuzefovich requested review from asubiotto and a team on January 15, 2021 02:53

@cockroach-teamcity (Member): This change is Reviewable

@yuzefovich added the do-not-merge label (bors won't merge a PR with this label) on Jan 15, 2021
@yuzefovich removed the do-not-merge label on Jan 15, 2021
@yuzefovich changed the title from "colexec: fix recent bug in the spilling queue and Bytes invariant in some cases" to "colexec: fix Bytes invariant in some cases" on Jan 15, 2021

@asubiotto (Contributor) left a comment

You were not able to reproduce this? I assume the unit test is a very specific reproduction, though, right?

Reviewed 4 of 4 files at r1, 4 of 4 files at r2, 1 of 2 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/col/coldata/vec_tmpl.go, line 81 at r2 (raw file):

				// Note that here we rely on the fact that selection vectors are
				// increasing sequences.
				fromCol.UpdateOffsetsToBeNonDecreasing(sel[len(sel)-1] + 1)

Can we have a non-nil zero-length selection vector? Maybe it makes sense to update the earlier if case to len(args.Sel) == 0


pkg/sql/colexec/spilling_queue.go, line 242 at r3 (raw file):

	if q.numInMemoryItems > 0 {
		// If we have already enqueued at least one batch, let's try to copy
		// as many tuples into it as it has the capacity for.

Could you add a unit test that would tickle this bug for regression purposes?
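For readers unfamiliar with the quoted spilling-queue comment, the idea of topping up the most recently enqueued batch before starting a new one can be sketched as follows. This is a hypothetical in-memory sketch of that idea only: `queue`, `enqueue`, and the use of plain `[]int` batches are illustrative assumptions, not the spilling queue's actual types, and the sketch does not reproduce the bug discussed in this thread.

```go
package main

import "fmt"

// queue is a hypothetical in-memory queue of fixed-capacity batches; it
// only illustrates the "top up the last batch" idea, not spilling_queue.go.
type queue struct {
	capacity int
	batches  [][]int
}

func (q *queue) enqueue(tuples []int) {
	if n := len(q.batches); n > 0 {
		// If we have already enqueued at least one batch, copy as many
		// tuples into it as it has spare capacity for.
		last := q.batches[n-1]
		toCopy := q.capacity - len(last)
		if toCopy > len(tuples) {
			toCopy = len(tuples)
		}
		q.batches[n-1] = append(last, tuples[:toCopy]...)
		tuples = tuples[toCopy:]
	}
	// Whatever is left goes into new batches of at most capacity tuples.
	for len(tuples) > 0 {
		k := q.capacity
		if k > len(tuples) {
			k = len(tuples)
		}
		b := make([]int, k, q.capacity)
		copy(b, tuples[:k])
		q.batches = append(q.batches, b)
		tuples = tuples[k:]
	}
}

func main() {
	q := &queue{capacity: 3}
	q.enqueue([]int{1, 2})
	q.enqueue([]int{3, 4, 5, 6})
	fmt.Println(q.batches)
}
```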

@yuzefovich changed the title from "colexec: fix Bytes invariant in some cases" to "coldata: fix Bytes invariant in some cases" on Jan 16, 2021

@yuzefovich (Member Author) left a comment

Yes, I'm not able to find a repro for the Bytes invariant problem. I think it could happen during Vec.Copy with SelOnDest set to true, which is used only by the CASE operator, but we don't support CASE with a Bytes output type.
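A small Go sketch of why a SelOnDest copy is the suspicious case, under a labeled assumption: that with SelOnDest set, the selection vector indexes the destination as well as the source. `copyWithSel` is a hypothetical helper, not `Vec.Copy`. A dense copy of two selected tuples writes only destination indices 0 and 1, but a SelOnDest copy writes indices 1 and 4, so an offsets fix-up bounded by the length (2) would stop short of index 4.

```go
package main

import "fmt"

// copyWithSel is a hypothetical model of a selective copy (not Vec.Copy).
// With selOnDest false, src[sel[i]] is written densely to dest[i]; with
// selOnDest true, it is written to dest[sel[i]], leaving gaps. It returns
// the largest destination index written.
func copyWithSel(dest, src []string, sel []int, selOnDest bool) int {
	maxIdx := -1
	for i, idx := range sel {
		destIdx := i
		if selOnDest {
			destIdx = idx
		}
		dest[destIdx] = src[idx]
		if destIdx > maxIdx {
			maxIdx = destIdx
		}
	}
	return maxIdx
}

func main() {
	src := []string{"a", "b", "c", "d", "e"}
	sel := []int{1, 4}
	dense := make([]string, len(src))
	sparse := make([]string, len(src))
	// Dense copy: the largest written index is len(sel)-1.
	fmt.Println(copyWithSel(dense, src, sel, false))
	// SelOnDest copy: the largest written index is sel[len(sel)-1].
	fmt.Println(copyWithSel(sparse, src, sel, true))
}
```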

Still, I cannot persuade myself that the scenario in the unit test will never occur in production, and updating the offsets up to the largest index mentioned in the selection vector has always been the intention in SetLength (as evidenced by the similar code in Append when Sel is non-nil), so I believe it is worth merging the fix and possibly backporting it to previous releases.

Update: actually, this issue might be the root cause of #57297, so I'm even more convinced that we should merge and backport the fix.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/col/coldata/vec_tmpl.go, line 81 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Can we have a non-nil zero-length selection vector? Maybe it makes sense to update the earlier if case to len(args.Sel) == 0

Good point. This currently can never occur because we never attempt to append 0 values. I've updated the contract of Append and left some comments in other places to highlight that.

I slightly prefer keeping it this way (rather than checking whether we are trying to append 0 values and adding custom behavior there), since a violation would then show up as an internal error, indicating that the assumptions are violated elsewhere. Let me know if you would prefer to be on the safe side.
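The two guard choices can be compared in a small hypothetical Go sketch. `boundNilCheck`, `boundLenCheck`, and the `denseBound` parameter are illustrative names, and the nil-check form is assumed from the review comment rather than copied from `vec_tmpl.go`. With a nil check, a non-nil empty selection vector evaluates `sel[len(sel)-1]` and panics, which is the internal error the reply relies on; with a length check, it silently takes the no-selection path.

```go
package main

import "fmt"

// boundNilCheck (hypothetical) only treats a nil selection vector as "no
// selection", so a non-nil empty one indexes sel[-1] and panics, surfacing
// a violated "never append 0 values" contract.
func boundNilCheck(sel []int, denseBound int) int {
	if sel == nil {
		return denseBound
	}
	// Relies on selection vectors being increasing sequences.
	return sel[len(sel)-1] + 1
}

// boundLenCheck (hypothetical) is the alternative suggested in review: an
// empty selection vector is handled like a nil one.
func boundLenCheck(sel []int, denseBound int) int {
	if len(sel) == 0 {
		return denseBound
	}
	return sel[len(sel)-1] + 1
}

// panics reports whether f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	empty := []int{}
	fmt.Println(panics(func() { boundNilCheck(empty, 3) }))
	fmt.Println(boundLenCheck(empty, 3))
}
```

The panic-based variant trades tolerance for visibility: a caller that breaks the contract is caught rather than quietly accommodated.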


pkg/sql/colexec/spilling_queue.go, line 242 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Could you add a unit test that would tickle this bug for regression purposes?

Opened #59077.

@asubiotto (Contributor) left a comment

LGTM

Reviewed 4 of 4 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

@yuzefovich (Member Author) commented

TFTR!

bors r+

@craig bot (Contributor) commented Jan 19, 2021

Build succeeded:

Development

Successfully merging this pull request may close these issues.

colexec: v20.2.2: index out of bounds when Getting from Bytes vector
3 participants