exec: overflow handling for vectorized arithmetic #38967

rafiss · 2019-07-18T20:03:31Z

The overflow checks are done as part of the code generation in
overloads.go. The checks are done inline, rather than by calling the
functions in the arith package, for performance reasons.
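
As a rough illustration of what "inline" means here, the sketch below writes the addition check directly in the loop body of a projection. It is not the generated code from overloads.go; `projPlusInt64` and `errIntOutOfRange` are hypothetical names used only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errIntOutOfRange is a hypothetical stand-in for tree.ErrIntOutOfRange.
var errIntOutOfRange = errors.New("integer out of range")

// projPlusInt64 adds two int64 columns element-wise. The overflow check is
// written inline in the loop body instead of behind a function call.
func projPlusInt64(left, right, out []int64) {
	for i := range left {
		l, r := left[i], right[i]
		result := l + r // wraps around silently on overflow
		// Adding a negative r must make the result smaller than l, and
		// adding a non-negative r must not. Anything else means it wrapped.
		if (result < l) != (r < 0) {
			panic(errIntOutOfRange)
		}
		out[i] = result
	}
}

func main() {
	out := make([]int64, 2)
	projPlusInt64([]int64{1, -5}, []int64{2, 3}, out)
	fmt.Println(out) // [3 -2]
}
```

The check costs one extra comparison per element, which is in line with the slowdown reported for the plus and minus projections in the benchmarks.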

The checks are only done for integer math. Float math is already
well-defined, since overflow results in +Inf or -Inf as appropriate.

The operations that these checks are relevant for are the SUM_INT
aggregator and projection. In the future, AVG will also benefit from
these overflow checks.

This changes the error message produced by overflows in the
non-vectorized SUM_INT aggregator so that the vectorized and
non-vectorized messages are consistent. This should be fine in terms of
Postgres compatibility since SUM_INT is unique to CRDB and eventually we
will get rid of it anyway.

resolves #38775

Release note: None

rafiss · 2019-07-18T20:05:34Z

SUM aggregator results are not too bad overall. Most affected is multiplication projection.

name                                                                              old time/op    new time/op    delta
Aggregator/SUM/ordered/Int64/groupSize=1/hasNulls=false/numInputBatches=64-24        383µs ± 0%     387µs ± 0%   +0.99%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=1/hasNulls=true/numInputBatches=64-24         707µs ± 1%     724µs ± 1%   +2.42%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=2/hasNulls=false/numInputBatches=64-24        309µs ± 2%     309µs ± 1%     ~     (p=0.684 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=2/hasNulls=true/numInputBatches=64-24         471µs ± 1%     491µs ± 1%   +4.16%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=512/hasNulls=false/numInputBatches=64-24      240µs ± 0%     272µs ± 0%  +13.58%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=512/hasNulls=true/numInputBatches=64-24       346µs ± 0%     382µs ± 0%  +10.49%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=1024/hasNulls=false/numInputBatches=64-24     239µs ± 0%     271µs ± 0%  +13.24%  (p=0.000 n=10+9)
Aggregator/SUM/ordered/Int64/groupSize=1024/hasNulls=true/numInputBatches=64-24      348µs ± 1%     383µs ± 0%  +10.07%  (p=0.000 n=10+10)

name                                                                              old speed      new speed      delta
Aggregator/SUM/ordered/Int64/groupSize=1/hasNulls=false/numInputBatches=64-24     1.37GB/s ± 0%  1.35GB/s ± 0%   -0.98%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=1/hasNulls=true/numInputBatches=64-24       741MB/s ± 1%   724MB/s ± 1%   -2.36%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=2/hasNulls=false/numInputBatches=64-24     1.70GB/s ± 2%  1.70GB/s ± 1%     ~     (p=0.684 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=2/hasNulls=true/numInputBatches=64-24      1.11GB/s ± 1%  1.07GB/s ± 1%   -3.99%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=512/hasNulls=false/numInputBatches=64-24   2.19GB/s ± 0%  1.93GB/s ± 0%  -11.96%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=512/hasNulls=true/numInputBatches=64-24    1.52GB/s ± 0%  1.37GB/s ± 0%   -9.50%  (p=0.000 n=10+10)
Aggregator/SUM/ordered/Int64/groupSize=1024/hasNulls=false/numInputBatches=64-24  2.19GB/s ± 0%  1.94GB/s ± 0%  -11.69%  (p=0.000 n=10+9)
Aggregator/SUM/ordered/Int64/groupSize=1024/hasNulls=true/numInputBatches=64-24   1.51GB/s ± 1%  1.37GB/s ± 0%   -9.15%  (p=0.000 n=10+10)

name                                                            old time/op    new time/op    delta
ProjOp/op=projMinusInt64Int64Op/useSel=true/hasNulls=true-24      1.34µs ± 0%    1.89µs ± 1%    +40.64%  (p=0.000 n=10+9)
ProjOp/op=projMinusInt64Int64Op/useSel=true/hasNulls=false-24      957ns ± 0%    1468ns ± 2%    +53.48%  (p=0.000 n=9+10)
ProjOp/op=projMinusInt64Int64Op/useSel=false/hasNulls=true-24     1.01µs ± 0%    1.58µs ± 1%    +56.61%  (p=0.000 n=8+10)
ProjOp/op=projMinusInt64Int64Op/useSel=false/hasNulls=false-24     646ns ± 0%    1212ns ± 3%    +87.69%  (p=0.000 n=8+10)
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=true-24       1.35µs ± 0%   13.31µs ± 0%   +884.40%  (p=0.000 n=8+10)
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=false-24       958ns ± 0%   12831ns ± 0%  +1239.87%  (p=0.000 n=10+10)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=true-24      1.02µs ± 1%   14.44µs ± 1%  +1309.27%  (p=0.000 n=10+10)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=false-24      646ns ± 1%   13970ns ± 1%  +2061.84%  (p=0.000 n=10+10)
ProjOp/op=projDivInt64Int64Op/useSel=true/hasNulls=true-24        12.0µs ± 0%    12.5µs ± 2%     +3.88%  (p=0.000 n=9+10)
ProjOp/op=projDivInt64Int64Op/useSel=true/hasNulls=false-24       11.5µs ± 0%    11.9µs ± 1%     +3.16%  (p=0.000 n=9+10)
ProjOp/op=projDivInt64Int64Op/useSel=false/hasNulls=true-24       12.1µs ± 0%    12.7µs ± 0%     +5.01%  (p=0.000 n=10+10)
ProjOp/op=projDivInt64Int64Op/useSel=false/hasNulls=false-24      11.7µs ± 0%    12.3µs ± 1%     +4.71%  (p=0.000 n=8+9)
ProjOp/op=projPlusInt64Int64Op/useSel=true/hasNulls=true-24       1.35µs ± 1%    1.83µs ± 0%    +35.43%  (p=0.000 n=10+10)
ProjOp/op=projPlusInt64Int64Op/useSel=true/hasNulls=false-24       966ns ± 1%    1413ns ± 0%    +46.24%  (p=0.000 n=10+9)
ProjOp/op=projPlusInt64Int64Op/useSel=false/hasNulls=true-24      1.17µs ± 1%    1.57µs ± 2%    +33.99%  (p=0.000 n=10+10)
ProjOp/op=projPlusInt64Int64Op/useSel=false/hasNulls=false-24      819ns ± 2%    1220ns ± 0%    +48.93%  (p=0.000 n=10+8)

name                                                            old speed      new speed      delta
ProjOp/op=projMinusInt64Int64Op/useSel=true/hasNulls=true-24    12.2GB/s ± 0%   8.7GB/s ± 1%    -28.88%  (p=0.000 n=10+9)
ProjOp/op=projMinusInt64Int64Op/useSel=true/hasNulls=false-24   17.1GB/s ± 0%  11.2GB/s ± 2%    -34.83%  (p=0.000 n=9+10)
ProjOp/op=projMinusInt64Int64Op/useSel=false/hasNulls=true-24   16.2GB/s ± 0%  10.3GB/s ± 1%    -36.08%  (p=0.000 n=10+10)
ProjOp/op=projMinusInt64Int64Op/useSel=false/hasNulls=false-24  25.3GB/s ± 0%  13.5GB/s ± 3%    -46.70%  (p=0.000 n=10+10)
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=true-24     12.1GB/s ± 0%   1.2GB/s ± 0%    -89.83%  (p=0.000 n=9+10)
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=false-24    17.1GB/s ± 0%   1.3GB/s ± 0%    -92.53%  (p=0.000 n=10+10)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=true-24    16.0GB/s ± 1%   1.1GB/s ± 1%    -92.90%  (p=0.000 n=10+10)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=false-24   25.3GB/s ± 1%   1.2GB/s ± 1%    -95.37%  (p=0.000 n=10+10)
ProjOp/op=projDivInt64Int64Op/useSel=true/hasNulls=true-24      1.37GB/s ± 0%  1.31GB/s ± 2%     -3.71%  (p=0.000 n=9+10)
ProjOp/op=projDivInt64Int64Op/useSel=true/hasNulls=false-24     1.42GB/s ± 0%  1.38GB/s ± 1%     -3.06%  (p=0.000 n=9+10)
ProjOp/op=projDivInt64Int64Op/useSel=false/hasNulls=true-24     1.35GB/s ± 0%  1.29GB/s ± 0%     -4.77%  (p=0.000 n=10+10)
ProjOp/op=projDivInt64Int64Op/useSel=false/hasNulls=false-24    1.40GB/s ± 0%  1.34GB/s ± 1%     -4.49%  (p=0.000 n=8+9)
ProjOp/op=projPlusInt64Int64Op/useSel=true/hasNulls=true-24     12.1GB/s ± 1%   9.0GB/s ± 0%    -26.15%  (p=0.000 n=10+10)
ProjOp/op=projPlusInt64Int64Op/useSel=true/hasNulls=false-24    16.9GB/s ± 1%  11.6GB/s ± 0%    -31.59%  (p=0.000 n=10+9)
ProjOp/op=projPlusInt64Int64Op/useSel=false/hasNulls=true-24    13.9GB/s ± 1%  10.4GB/s ± 2%    -25.36%  (p=0.000 n=10+10)
ProjOp/op=projPlusInt64Int64Op/useSel=false/hasNulls=false-24   20.0GB/s ± 2%  13.4GB/s ± 0%    -32.85%  (p=0.000 n=10+8)

solongordon · 2019-07-18T21:15:04Z

Similar to my suggestion from yesterday, I bet we can make multiplication much faster in the typical case by skipping the overflow check when the ints are sufficiently small. Maybe something clever like checking if int64(int32(x)) == x (in the Int64 case). I don't know if that's exactly right but you get the idea.

petermattis · 2019-07-19T00:38:09Z

I wonder if math/bits.Mul64 and then checking whether the high-bits are non-zero would be faster for multiplication.

rafiss · 2019-07-19T03:05:58Z

Thanks for the pointers, both the casting idea and math/bits.Mul64 sound promising! Mul64 only works with unsigned ints, and there are only 64- and 32-bit variants, but it certainly is worth looking into more. I'll do tests to see if the performance is better if we special case positive ints of that size, and check if the algorithm in that function can be adapted for our use case more generally.

jordanlewis

assuming that test failure is expected.

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @solongordon)

rafiss · 2019-07-19T20:40:42Z

I ended up changing multiplication to do something similar to Solon's idea, but I'm explicitly comparing to the upper and lower bounds so that things are a little more legible. There's a much smaller hit on latency now (although it's still substantial).

name                                                           old time/op    new time/op    delta
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=true-24      1.37µs ± 1%    3.58µs ± 0%  +160.55%  (p=0.000 n=10+10)
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=false-24      976ns ± 1%    3205ns ± 0%  +228.34%  (p=0.000 n=9+8)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=true-24     1.04µs ± 1%    2.72µs ± 0%  +162.45%  (p=0.000 n=10+9)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=false-24     662ns ± 1%    2366ns ± 0%  +257.21%  (p=0.000 n=10+10)

name                                                           old speed      new speed      delta
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=true-24    11.9GB/s ± 1%   4.6GB/s ± 0%   -61.61%  (p=0.000 n=10+10)
ProjOp/op=projMultInt64Int64Op/useSel=true/hasNulls=false-24   16.8GB/s ± 1%   5.1GB/s ± 0%   -69.53%  (p=0.000 n=9+8)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=true-24   15.8GB/s ± 1%   6.0GB/s ± 0%   -61.89%  (p=0.000 n=10+9)
ProjOp/op=projMultInt64Int64Op/useSel=false/hasNulls=false-24  24.7GB/s ± 1%   6.9GB/s ± 0%   -71.99%  (p=0.000 n=10+10)

jordanlewis

Nice. I have a suggestion to make the templates a little more legible. I'm also a little anxious that we still don't have particularly great edge-case testing of this, even with the added tests from you and Matt. Is there a way we could write a quickcheck-style random test that specifically tests edge behavior for this stuff? Or is it overkill?

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)

pkg/sql/exec/execgen/cmd/execgen/overloads.go, line 363 at r2 (raw file):

						panic(tree.ErrIntOutOfRange)
					}
					%[1]s = result

Did you find these numbers confusing while writing this code? As a suggestion, it might be easier to read if you used a text template:

m := map[string]interface{}{"Target": target, "L": l, "R": r}
buf := strings.Builder{}
t := template.Must(template.New("").Parse(`
{
  result := {{.L}} + {{.R}}
...`))
t.Execute(&buf, m)
return buf.String()

(disclaimer: not tested)

pkg/sql/exec/execgen/cmd/execgen/overloads.go, line 389 at r2 (raw file):

			case 8:
				upperBound = "10"
				lowerBound = "-10"

Do we need the int8 case? Oh right, problem is that the type exists in our package, like you were saying. We should probably just delete that type.

rafiss

I think you're definitely right this isn't super well tested. That failing test in the previous revision of the PR was due to a bug/typo I had in the subtraction template, and we got lucky that there was a test that caught it in an unrelated logic test. I was trying to see if I could unit test this overflow handling since there are a lot of edge cases to check, but maybe I should just go ahead and write logic tests for all of the cases.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)

pkg/sql/exec/execgen/cmd/execgen/overloads.go, line 363 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Did you find these numbers confusing while writing this code? As a suggestion, it might be easier to read if you used a text template:
m := map[string]interface{}{"Target": target, "L": l, "R": r}
buf := strings.Builder{}
t := template.Must(template.New("").Parse(`
{
  result := {{.L}} + {{.R}}
...`))
t.Execute(&buf, m)
return buf.String()
(disclaimer: not tested)

I'll look into this idea. I did find the %[1]s syntax annoying; I wish there were named format strings like in Python.

pkg/sql/exec/execgen/cmd/execgen/overloads.go, line 389 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Do we need the int8 case? Oh right, problem is that the type exists in our package, like you were saying. We should probably just delete that type.

yeah, would make sense to remove int8 later

jordanlewis · 2019-07-22T15:02:55Z

Writing unit tests seems like the way to go to me. Logic tests are very "bulky" in the sense that they take a lot of lines to do relatively little work. I think a unit test could probably manage to test quite a bit of these edge cases without as much typing.

Release note: None

rafiss · 2019-07-23T16:20:58Z

I updated the PR so that the overloads use templates with named arguments, and with a more robust unit test suite for all the overflow checks. Please look again :)

jordanlewis

The third is the charm. Excellent work!

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)

pkg/sql/exec/overloads_test.go, line 122 at r4 (raw file):

}

func assertIntegerEquals(t *testing.T, expected, actual int) {

You can use the testify package for these helpers if you want. assert.Equal and assert.PanicsWithValue.

pkg/sql/exec/execgen/cmd/execgen/overloads.go, line 414 at r4 (raw file):

				{
					result := {{.Left}} * {{.Right}}
					if {{.Left}} > {{.UpperBound}} || {{.Left}} < {{.LowerBound}} || {{.Right}} > {{.UpperBound}} || {{.Right}} < {{.LowerBound}} {

This is readable! Yay!

rafiss

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @rafiss, and @solongordon)

pkg/sql/exec/overloads_test.go, line 122 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

You can use the testify package for these helpers if you want. assert.Equal and assert.PanicsWithValue.

ah nice, switched to the library

rafiss · 2019-07-23T17:03:24Z

thanks for helping me iterate!

bors r+

38967: exec: overflow handling for vectorized arithmetic r=rafiss a=rafiss

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>

craig · 2019-07-23T17:30:54Z

Build succeeded

rafiss requested review from solongordon and a team July 18, 2019 20:03

rafiss force-pushed the aggregator-overflow-handling branch from 6956af2 to 9653a99 Compare July 18, 2019 21:48

jordanlewis approved these changes Jul 19, 2019

rafiss force-pushed the aggregator-overflow-handling branch from 9653a99 to a11b89f Compare July 19, 2019 20:28

jordanlewis approved these changes Jul 22, 2019

solongordon requested a review from a team July 22, 2019 18:27

exec: add benchmarks for more projection ops

6780c1f

Release note: None

rafiss force-pushed the aggregator-overflow-handling branch from a11b89f to d7cff34 Compare July 23, 2019 16:19

rafiss requested a review from a team July 23, 2019 16:19

jordanlewis approved these changes Jul 23, 2019

rafiss force-pushed the aggregator-overflow-handling branch from d7cff34 to 1dc97ee Compare July 23, 2019 16:55

craig bot merged commit 1dc97ee into cockroachdb:master Jul 23, 2019

rafiss deleted the aggregator-overflow-handling branch July 23, 2019 17:35

rafiss mentioned this pull request Jul 31, 2019

exec: integer overflow isn't handled in plus or mult projection ops #36691

Closed
