
Implement SumUnboundedToUnboundedFixer #8934

Merged: @andygrove merged 21 commits into NVIDIA:branch-23.10 from sum-unbounded on Aug 16, 2023

Conversation

@andygrove (Contributor) commented on Aug 4, 2023

Closes #6560

Status

  • Implement new SumUnboundedToUnboundedFixer
    • Integral types
    • Floating-point types
    • Decimals
  • Integration test
  • Configs:
    • A config to disable this optimization for floats/doubles, since reordering the additions across batches could produce different results from one run to the next (see the sketch after this list).
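
For context, the core of the fixer can be sketched in a few lines. This is a hypothetical simplification (plain Doubles instead of cuDF Scalars, no null or overflow handling), not the plugin's actual code: each batch contributes its sum to a running total, and once the whole partition has been seen, every row receives that total, so only one scalar needs to be carried between batches.

    // Hypothetical sketch of the unbounded-to-unbounded sum fixer idea
    class SumFixerSketch {
      // Running total across all batches seen so far for this partition
      private var previousValue: Option[Double] = None

      // Called once per batch with that batch's sum
      def updateState(batchSum: Double): Unit =
        previousValue = Some(previousValue.getOrElse(0.0) + batchSum)

      // Called after the last batch: with an UNBOUNDED PRECEDING to
      // UNBOUNDED FOLLOWING frame, every row gets the same total
      def fixUp(numRows: Int): Array[Double] =
        Array.fill(numRows)(previousValue.getOrElse(0.0))
    }

For floating-point types this sums each batch first and then combines the partial sums, which changes the order of additions relative to a single-batch sum; that is why the opt-out config above exists.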

The following requirements from the issue are not implemented yet, and I have filed #8943 for these since they are not specific to sum operations.

  • A config that would let us disable window operations that require a single batch for all data (any window operation that has no partition-by and is not batch-able)
  • A config that would let us disable window operations that require a single batch per partition-by group (any window operation that has a partition-by but is not batch-able)

@andygrove self-assigned this on Aug 4, 2023
@abellina (Collaborator) commented on Aug 4, 2023

For my own edification, why do we need the configs for the single-batch windows?

@andygrove (Contributor, Author) replied:

> For my own edification, why do we need the configs for the single-batch windows?

I copied the requirements from the issue. I assume we want a way to work around the case where a single batch does not fit into GPU memory.

@andygrove (Contributor, Author) commented:

build

@andygrove changed the title from "WIP: Implement SumUnboundedToUnboundedFixer" to "Implement SumUnboundedToUnboundedFixer" on Aug 7, 2023
@andygrove marked this pull request as ready for review on August 7, 2023 at 22:15
@revans2 (Collaborator) previously approved these changes on Aug 8, 2023 and left a comment:

The code and tests look good. I just think that we might have some dead code in there and it would be nice to clean it up if that is true.

Review thread on integration_tests/src/main/python/window_function_test.py (outdated; resolved).
Review thread on this diff hunk:

    previousValue = Some(Scalar.fromDouble(scalar.getDouble + prev.getDouble))
    case DType.DTypeEnum.DECIMAL32 | DType.DTypeEnum.DECIMAL64 |
         DType.DTypeEnum.DECIMAL128 =>
      val sum = prev.getBigDecimal.add(scalar.getBigDecimal)
A collaborator commented:

Don't we need overflow checking here too?
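
As an aside (an illustration added here, not from the thread): java.math.BigDecimal arithmetic is arbitrary-precision and never overflows on its own, so the concern is the sum no longer fitting the result column's declared DECIMAL(precision, scale). A minimal check, with targetPrecision standing in for the result type's precision and assuming both operands already carry the result scale, might look like:

    import java.math.BigDecimal

    // Returns None on overflow, mimicking Spark's null-on-overflow
    // behavior when ANSI mode is off
    def addChecked(prev: BigDecimal, curr: BigDecimal,
        targetPrecision: Int): Option[BigDecimal] = {
      val sum = prev.add(curr)
      // precision() counts the significant digits of the unscaled value;
      // with the scale fixed, exceeding targetPrecision means the sum
      // cannot be represented as DECIMAL(targetPrecision, scale)
      if (sum.precision() > targetPrecision) None else Some(sum)
    }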

@sameerz added the label "reliability" (Features to improve reliability or bugs that severely impact the reliability of the plugin) on Aug 9, 2023
@sameerz (Collaborator) commented on Aug 9, 2023

Do we expect any performance implications due to this change?

@revans2 (Collaborator) commented on Aug 9, 2023

> Do we expect any performance implications due to this change?

I would expect very little performance change. The main goal is that we no longer have to hold all of the data in memory at once, so we can spill if needed.

@revans2 previously approved these changes on Aug 11, 2023
Review thread on this diff hunk:

    previousValue = Some(Scalar.fromDouble(scalar.getDouble + prev.getDouble))
    case DType.DTypeEnum.DECIMAL32 | DType.DTypeEnum.DECIMAL64 |
         DType.DTypeEnum.DECIMAL128 =>
      withResource(ColumnVector.fromScalar(scalar, 1)) { scalarCv =>
A collaborator commented:

nit: Could we just do what Spark does on the CPU? We already know what the decimal type should be.

The following was copied from Add, and I think it can be slightly modified to work:

    private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)

    checkDecimalOverflow(numeric.plus(input1, input2).asInstanceOf[Decimal], precision, scale)

    protected def checkDecimalOverflow(value: Decimal, precision: Int, scale: Int): Decimal = {
      value.toPrecision(precision, scale, Decimal.ROUND_HALF_UP, !failOnError, getContextOrNull())
    }
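
For what it's worth, a rough sketch of how that suggestion might slot into the fixer; this is an assumption-laden illustration, not the merged code. It presumes dt (the result DecimalType) and failOnError are in scope, and reuses the Decimal.toPrecision call quoted above:

    // assumes: import org.apache.spark.sql.types.{Decimal, DecimalType}
    val sum = Decimal(prev.getBigDecimal.add(scalar.getBigDecimal))
    // Round to the result type's precision and scale the way the CPU Add
    // does: null on overflow when failOnError is false, error otherwise
    val checked = sum.toPrecision(dt.precision, dt.scale,
      Decimal.ROUND_HALF_UP, !failOnError)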

@andygrove (Author) replied:

Thanks. I have updated this.

@revans2 (Collaborator) commented on Aug 14, 2023

build

@andygrove merged commit 8927411 into NVIDIA:branch-23.10 on Aug 16, 2023
@andygrove deleted the sum-unbounded branch on August 16, 2023 at 14:58
  • andygrove added a commit to andygrove/spark-rapids that referenced this pull request on Aug 17, 2023
  • andygrove added a commit to andygrove/spark-rapids that referenced this pull request on Aug 17, 2023: "This reverts commit 8927411." (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
  • abellina pushed a commit that referenced this pull request on Aug 18, 2023: "This reverts commit 8927411." (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
  • andygrove added a commit to andygrove/spark-rapids that referenced this pull request on Aug 22, 2023
Labels: reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)
Projects: None yet
Development: Successfully merging this pull request may close this issue: [FEA] Optimize memory for SUM on unbounded preceding to unbounded following.
4 participants