
repartition-based fallback for hash aggregate v3 #11712

Merged (56 commits) on Nov 26, 2024

Conversation

binmahone (Collaborator) commented Nov 8, 2024

This PR replaces #11116, since it has diverged too much from #11116.

binmahone added 30 commits July 1, 2024 17:14
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

revans2 (Collaborator) left a comment:


I have not finished yet. Could you post an explanation of the changes? I see places in the code that appear to have duplicate functionality, not to mention that the old sort-based agg code completely duplicates a lot of the newer hash repartition-based code.

I really just want to understand what the workflow is supposed to be.

@@ -335,7 +513,10 @@ class AggHelper(
// We need to merge the aggregated batches into 1 before calling post process,
// if the aggregate code had to split on a retry
if (aggregatedSeq.size > 1) {
val concatted = concatenateBatches(metrics, aggregatedSeq)
val concatted =
withResource(aggregatedSeq) { _ =>

Collaborator:

I am confused by this. Was this a bug? This change feels wrong to me.

concatenateBatches has the contract that it will either close everything in toConcat, or if there is a single item in the sequence it will just return it without closing anything. By putting it within a withResource it looks like we are going to double close the data in aggregatedSeq.
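
To illustrate the ownership contract being discussed, here is a minimal, self-contained Scala sketch. `Batch`, `withResource`, and `concatAndClose` are stand-ins for the plugin's SpillableColumnarBatch, withResource, and concatenateBatches, used only to show why the extra withResource would double-close:

```scala
final class Batch(val id: Int) extends AutoCloseable {
  private var closed = false
  override def close(): Unit = {
    if (closed) throw new IllegalStateException(s"batch $id closed twice")
    closed = true
  }
}

// Mirrors the contract described above: a single element is returned unchanged,
// otherwise every input is closed while the concatenated result is built.
def concatAndClose(batches: Seq[Batch]): Batch =
  if (batches.size == 1) {
    batches.head
  } else {
    try new Batch(-1) finally batches.foreach(_.close())
  }

def withResource[T <: AutoCloseable, R](rs: Seq[T])(body: Seq[T] => R): R =
  try body(rs) finally rs.foreach(_.close())

val batches = Seq(new Batch(1), new Batch(2))
// Double close: concatAndClose closes the inputs, then withResource closes them
// again on the way out, so this would throw "closed twice":
//   withResource(batches) { bs => concatAndClose(bs) }
// Respecting the contract means simply handing ownership to the callee:
val concatted = concatAndClose(batches)
```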

Collaborator:

SpillableColumnarBatch has the nasty habit (?) of hiding double closes from us (https://github.com/NVIDIA/spark-rapids/blob/branch-24.12/sql-plugin/src/main/scala/com/nvidia/spark/rapids/SpillableColumnarBatch.scala#L137). I'd like to remove this behavior with my spillable changes.

Collaborator Author:

I think I was misled by

val concatBatch = withResource(batches) { _ =>

on the main branch, and thought concatenateBatches would not close its input batches. I will revert this part.

abellina (Collaborator) commented Nov 8, 2024

I will review this today

realIter = Some(ConcatIterator.apply(firstPassIter,
(aggOutputSizeRatio * configuredTargetBatchSize).toLong
))
firstPassAggToggle.set(false)

Collaborator:

I don't think this does what you think it does. The line that reads this flag is used to create an iterator; the check is not inside the iterator deciding whether or not to do the agg for each batch. I added some print statements and verified that it does indeed agg every batch, even if the first batch sets this to false. Which is a good thing, because if you disabled the initial aggregation on something where the output types do not match the input types, you would get a crash or data corruption.
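
A tiny, self-contained sketch of the distinction being made here (the names firstPassAggToggle, aggregate, and buildIter are illustrative, not the plugin's actual code): a flag read while constructing the iterator is evaluated once, so flipping it from inside the first batch cannot stop later batches from being aggregated; the check would have to live inside the per-batch path.

```scala
import java.util.concurrent.atomic.AtomicBoolean

val firstPassAggToggle = new AtomicBoolean(true)

// Stand-in for the real first-pass aggregation of one batch.
def aggregate(batch: String): String = s"agg($batch)"

// The flag is read ONCE, when the iterator is constructed: even if the first
// batch later sets it to false, the already-built map keeps aggregating.
def buildIter(input: Iterator[String]): Iterator[String] =
  if (firstPassAggToggle.get()) input.map(aggregate) else input

// A per-batch decision needs the check inside the iterator instead:
def buildPerBatchIter(input: Iterator[String]): Iterator[String] =
  input.map { b => if (firstPassAggToggle.get()) aggregate(b) else b }
```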

Collaborator Author:

Known issue, will revert this part.

Collaborator Author:

I remember getting a crash or data corruption when I tried to fix the iterator bug here. Do you think it would be beneficial if we still convert the output types, but do not try any actual aggregation when heuristics show that the first-pass agg does not aggregate away many rows?


// Handle the case of skipping second and third pass of aggregation
// This only work when spark.rapids.sql.agg.skipAggPassReductionRatio < 1
if (!firstBatchChecked && firstPassIter.hasNext

Collaborator:

If we are doing an aggregate every time, would it be better to check each batch and skip repartitioning if the batch stayed large?

Collaborator Author:

I don't quite understand this. Can you elaborate?

Collaborator:

This is just a follow-on issue. Right now we check the first batch and decide for the entire task whether we want to skip full aggregation or not. I think it would probably be better if we did it for each batch individually. We could even do it for each batch when we merge them too.
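
A rough sketch of what a per-batch version of that heuristic could look like (a hypothetical helper, not the plugin's code): instead of checking only the first batch, compare each aggregated batch's row count against its input and decide batch by batch whether further merge/repartition passes are worth it.

```scala
// Hypothetical per-batch heuristic: rowsBefore/rowsAfter would come from the
// first-pass aggregation of each individual batch.
def shouldSkipFurtherAgg(
    rowsBefore: Long,
    rowsAfter: Long,
    skipAggPassReductionRatio: Double): Boolean =
  rowsAfter.toDouble >= rowsBefore.toDouble * skipAggPassReductionRatio

// With a ratio of 0.9, a batch that only shrank from 1,000,000 to 950,000 rows
// would be emitted directly, while one that shrank to 100,000 rows would keep
// going through the merge/repartition passes.
val skipForLargeBatch = shouldSkipFurtherAgg(1000000L, 950000L, 0.9)  // true
val skipForSmallBatch = shouldSkipFurtherAgg(1000000L, 100000L, 0.9)  // false
```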

Collaborator Author:

OK, I can open a follow-up issue on this. The question is: if we have already skipped agg for the first batch, what should we do when the second batch does not meet the skip criteria? The first batch has already been transferred to the next operator by then.

Collaborator:

My thought was we keep trying to aggregate it more. Just like the skipped batches never existed. I am not 100% sure on this, we need to do some experiments to be sure. I also thought we should probably do the same thing at all levels of aggregation. If we do a merge aggregation and it does not make the result smaller by enough, then we can release that merged batch instantly.

Collaborator Author:

Issue opened at #11770.

binmahone (Collaborator Author) commented Nov 10, 2024

Hi @revans2 and @abellina, this PR, as I stated in Slack and marked on the PR itself, is still a DRAFT, so I didn't expect a thorough review before I changed it from Draft to Ready. (Or do we already have some special usage of Draft PRs in our dev process?) It still had some major problems, such as mixing in other features (e.g. the so-called voluntary release check) and a non-working implementation of skipping the first-pass agg. The refinement was not ready last Friday, but it is now. Please go ahead and review.

The reason I showed you this PR is to prove that "we do a complete pass over all of the batches and cache them into GPU memory before we make a decision on when and how to release a batch" is no longer true. I assume that having access to the state-of-the-art implementation of agg at our customer may help you better understand the problem we're facing now.

@@ -2113,6 +2187,7 @@ class DynamicGpuPartialSortAggregateIterator(
}
val newIter = if (doSinglePassAgg) {
metrics.singlePassTasks += 1
// TO discuss in PR: is singlePassSortedAgg still necessary?

Collaborator Author:

Should we remove this?

Collaborator:

I can come up with contrived situations where the sort is better, but in general I think that skipAggPassReductionRatio is good enough. Although this sort is on by default, whereas skipAggPassReductionRatio is off by default. I think we need a follow-on issue to explore what the correct heuristic is and whether any of the contrived cases are enough to keep this extra code around.

binmahone (Collaborator Author) commented Nov 26, 2024

I remember you said the essence of singlePassSortedAgg is "The heuristic is simply looking at the cost of sorting now vs sorting later." (here).

Let's consider a case where the input to agg is N rows and the output is 0.89N rows. In such a case skip-agg will not be enabled, and sorting later (which becomes "repartitioning later" once this PR is checked in) may still be more costly than sorting now. So I don't really understand why skipAggPassReductionRatio could deprecate singlePassSortedAgg.

My original intention with this question was to ask whether we should use some kind of "singlePassRepartitionAgg" to replace "singlePassSortedAgg".

binmahone (Collaborator Author) commented:

Next steps for this PR will be:

  1. Addressing review comments. When reviewing, please be aware that we found repartition-based agg is more prone to "Container OOM (then getting killed by K8S)" than sort-based agg. I'm troubleshooting on my side, but any insight from the reviewers would be a great help.
  2. We have selected four representative queries at our customer, and before/after results are being collected. Part of the collected results already shows improvements.
  3. I'll run a regression test on NDS to ensure no or only minor regressions.

binmahone (Collaborator Author) commented Nov 20, 2024

NDS regression test result on Spark2a (default 3K dataset):

Regression alerts
-----------------

Speedup results
-----------------
query1: Previous (1809.3333333333333 ms) vs Current (1895.0 ms) Diff -85 E2E 0.95x
query2: Previous (1640.6666666666667 ms) vs Current (2042.0 ms) Diff -401 E2E 0.80x
query3: Previous (565.3333333333334 ms) vs Current (567.0 ms) Diff -1 E2E 1.00x
query4: Previous (11023.333333333334 ms) vs Current (11092.666666666666 ms) Diff -69 E2E 0.99x
query5: Previous (2253.6666666666665 ms) vs Current (2584.3333333333335 ms) Diff -330 E2E 0.87x
query6: Previous (778.3333333333334 ms) vs Current (821.6666666666666 ms) Diff -43 E2E 0.95x
query7: Previous (3349.0 ms) vs Current (3399.6666666666665 ms) Diff -50 E2E 0.99x
query8: Previous (1160.6666666666667 ms) vs Current (1158.0 ms) Diff 2 E2E 1.00x
query9: Previous (9345.333333333334 ms) vs Current (5375.666666666667 ms) Diff 3969 E2E 1.74x
query10: Previous (1755.6666666666667 ms) vs Current (1872.0 ms) Diff -116 E2E 0.94x
query11: Previous (5698.0 ms) vs Current (5664.333333333333 ms) Diff 33 E2E 1.01x
query12: Previous (684.3333333333334 ms) vs Current (645.6666666666666 ms) Diff 38 E2E 1.06x
query13: Previous (1791.6666666666667 ms) vs Current (1822.3333333333333 ms) Diff -30 E2E 0.98x
query14_part1: Previous (7540.666666666667 ms) vs Current (8108.333333333333 ms) Diff -567 E2E 0.93x
query14_part2: Previous (6580.0 ms) vs Current (6696.0 ms) Diff -116 E2E 0.98x
query15: Previous (1225.3333333333333 ms) vs Current (1218.0 ms) Diff 7 E2E 1.01x
query16: Previous (4012.3333333333335 ms) vs Current (4044.3333333333335 ms) Diff -32 E2E 0.99x
query17: Previous (2047.3333333333333 ms) vs Current (2044.6666666666667 ms) Diff 2 E2E 1.00x
query18: Previous (2143.0 ms) vs Current (2340.0 ms) Diff -197 E2E 0.92x
query19: Previous (1456.6666666666667 ms) vs Current (1411.6666666666667 ms) Diff 45 E2E 1.03x
query20: Previous (706.6666666666666 ms) vs Current (731.3333333333334 ms) Diff -24 E2E 0.97x
query21: Previous (755.0 ms) vs Current (714.3333333333334 ms) Diff 40 E2E 1.06x
query22: Previous (1390.6666666666667 ms) vs Current (1477.0 ms) Diff -86 E2E 0.94x
query23_part1: Previous (13383.666666666666 ms) vs Current (14087.0 ms) Diff -703 E2E 0.95x
query23_part2: Previous (17956.666666666668 ms) vs Current (18554.333333333332 ms) Diff -597 E2E 0.97x
query24_part1: Previous (9003.0 ms) vs Current (9031.0 ms) Diff -28 E2E 1.00x
query24_part2: Previous (8799.666666666666 ms) vs Current (9124.0 ms) Diff -324 E2E 0.96x
query25: Previous (1918.6666666666667 ms) vs Current (2010.6666666666667 ms) Diff -92 E2E 0.95x
query26: Previous (1468.0 ms) vs Current (1388.3333333333333 ms) Diff 79 E2E 1.06x
query27: Previous (1416.6666666666667 ms) vs Current (1538.3333333333333 ms) Diff -121 E2E 0.92x
query28: Previous (7904.333333333333 ms) vs Current (7976.0 ms) Diff -71 E2E 0.99x
query29: Previous (2934.3333333333335 ms) vs Current (3454.0 ms) Diff -519 E2E 0.85x
query30: Previous (2604.6666666666665 ms) vs Current (2329.6666666666665 ms) Diff 275 E2E 1.12x
query31: Previous (2442.3333333333335 ms) vs Current (2420.6666666666665 ms) Diff 21 E2E 1.01x
query32: Previous (1191.0 ms) vs Current (948.0 ms) Diff 243 E2E 1.26x
query33: Previous (1175.6666666666667 ms) vs Current (1197.3333333333333 ms) Diff -21 E2E 0.98x
query34: Previous (2532.6666666666665 ms) vs Current (2481.6666666666665 ms) Diff 51 E2E 1.02x
query35: Previous (2159.3333333333335 ms) vs Current (2045.0 ms) Diff 114 E2E 1.06x
query36: Previous (1527.0 ms) vs Current (1550.3333333333333 ms) Diff -23 E2E 0.98x
query37: Previous (1414.0 ms) vs Current (1369.6666666666667 ms) Diff 44 E2E 1.03x
query38: Previous (2735.0 ms) vs Current (2708.6666666666665 ms) Diff 26 E2E 1.01x
query39_part1: Previous (2164.0 ms) vs Current (2088.3333333333335 ms) Diff 75 E2E 1.04x
query39_part2: Previous (1595.0 ms) vs Current (1616.3333333333333 ms) Diff -21 E2E 0.99x
query40: Previous (1349.3333333333333 ms) vs Current (1389.0 ms) Diff -39 E2E 0.97x
query41: Previous (401.6666666666667 ms) vs Current (381.3333333333333 ms) Diff 20 E2E 1.05x
query42: Previous (457.6666666666667 ms) vs Current (398.3333333333333 ms) Diff 59 E2E 1.15x
query43: Previous (1125.3333333333333 ms) vs Current (1153.0 ms) Diff -27 E2E 0.98x
query44: Previous (828.6666666666666 ms) vs Current (815.3333333333334 ms) Diff 13 E2E 1.02x
query45: Previous (1332.0 ms) vs Current (1333.0 ms) Diff -1 E2E 1.00x
query46: Previous (1828.6666666666667 ms) vs Current (1610.0 ms) Diff 218 E2E 1.14x
query47: Previous (2201.3333333333335 ms) vs Current (2231.6666666666665 ms) Diff -30 E2E 0.99x
query48: Previous (1289.0 ms) vs Current (1366.3333333333333 ms) Diff -77 E2E 0.94x
query49: Previous (2375.0 ms) vs Current (2423.6666666666665 ms) Diff -48 E2E 0.98x
query50: Previous (9228.333333333334 ms) vs Current (9034.666666666666 ms) Diff 193 E2E 1.02x
query51: Previous (2085.6666666666665 ms) vs Current (2329.0 ms) Diff -243 E2E 0.90x
query52: Previous (591.3333333333334 ms) vs Current (610.3333333333334 ms) Diff -19 E2E 0.97x
query53: Previous (887.0 ms) vs Current (862.6666666666666 ms) Diff 24 E2E 1.03x
query54: Previous (1849.3333333333333 ms) vs Current (1813.0 ms) Diff 36 E2E 1.02x
query55: Previous (518.6666666666666 ms) vs Current (497.6666666666667 ms) Diff 20 E2E 1.04x
query56: Previous (1090.6666666666667 ms) vs Current (1085.6666666666667 ms) Diff 5 E2E 1.00x
query57: Previous (2208.3333333333335 ms) vs Current (1708.3333333333333 ms) Diff 500 E2E 1.29x
query58: Previous (1030.6666666666667 ms) vs Current (1115.3333333333333 ms) Diff -84 E2E 0.92x
query59: Previous (2516.3333333333335 ms) vs Current (2624.0 ms) Diff -107 E2E 0.96x
query60: Previous (1354.0 ms) vs Current (1294.6666666666667 ms) Diff 59 E2E 1.05x
query61: Previous (1527.3333333333333 ms) vs Current (1408.0 ms) Diff 119 E2E 1.08x
query62: Previous (1490.6666666666667 ms) vs Current (1507.3333333333333 ms) Diff -16 E2E 0.99x
query63: Previous (998.0 ms) vs Current (999.3333333333334 ms) Diff -1 E2E 1.00x
query64: Previous (16576.333333333332 ms) vs Current (16515.333333333332 ms) Diff 61 E2E 1.00x
query65: Previous (3914.6666666666665 ms) vs Current (3947.0 ms) Diff -32 E2E 0.99x
query66: Previous (3383.3333333333335 ms) vs Current (3360.0 ms) Diff 23 E2E 1.01x
query67: Previous (28848.0 ms) vs Current (28505.0 ms) Diff 343 E2E 1.01x
query68: Previous (1472.0 ms) vs Current (1400.0 ms) Diff 72 E2E 1.05x
query69: Previous (1523.3333333333333 ms) vs Current (1569.6666666666667 ms) Diff -46 E2E 0.97x
query70: Previous (1876.3333333333333 ms) vs Current (1797.6666666666667 ms) Diff 78 E2E 1.04x
query71: Previous (4076.0 ms) vs Current (4098.666666666667 ms) Diff -22 E2E 0.99x
query72: Previous (3457.6666666666665 ms) vs Current (3291.3333333333335 ms) Diff 166 E2E 1.05x
query73: Previous (1205.0 ms) vs Current (1249.6666666666667 ms) Diff -44 E2E 0.96x
query74: Previous (4539.0 ms) vs Current (4300.666666666667 ms) Diff 238 E2E 1.06x
query75: Previous (8472.0 ms) vs Current (8588.333333333334 ms) Diff -116 E2E 0.99x
query76: Previous (5985.666666666667 ms) vs Current (3096.3333333333335 ms) Diff 2889 E2E 1.93x
query77: Previous (1246.6666666666667 ms) vs Current (1152.6666666666667 ms) Diff 94 E2E 1.08x
query78: Previous (8933.0 ms) vs Current (9010.333333333334 ms) Diff -77 E2E 0.99x
query79: Previous (1256.6666666666667 ms) vs Current (1458.0 ms) Diff -201 E2E 0.86x
query80: Previous (4989.0 ms) vs Current (4801.333333333333 ms) Diff 187 E2E 1.04x
query81: Previous (2466.0 ms) vs Current (2396.6666666666665 ms) Diff 69 E2E 1.03x
query82: Previous (2457.3333333333335 ms) vs Current (2557.0 ms) Diff -99 E2E 0.96x
query83: Previous (835.6666666666666 ms) vs Current (832.6666666666666 ms) Diff 3 E2E 1.00x
query84: Previous (1933.0 ms) vs Current (1569.3333333333333 ms) Diff 363 E2E 1.23x
query85: Previous (1911.6666666666667 ms) vs Current (1751.0 ms) Diff 160 E2E 1.09x
query86: Previous (1103.3333333333333 ms) vs Current (1107.0 ms) Diff -3 E2E 1.00x
query87: Previous (2615.0 ms) vs Current (2565.0 ms) Diff 50 E2E 1.02x
query88: Previous (5482.0 ms) vs Current (5435.0 ms) Diff 47 E2E 1.01x
query89: Previous (1485.0 ms) vs Current (1514.3333333333333 ms) Diff -29 E2E 0.98x
query90: Previous (1197.3333333333333 ms) vs Current (1106.0 ms) Diff 91 E2E 1.08x
query91: Previous (1237.6666666666667 ms) vs Current (1247.0 ms) Diff -9 E2E 0.99x
query92: Previous (725.3333333333334 ms) vs Current (658.0 ms) Diff 67 E2E 1.10x
query93: Previous (10936.333333333334 ms) vs Current (10472.0 ms) Diff 464 E2E 1.04x
query94: Previous (4522.333333333333 ms) vs Current (4524.333333333333 ms) Diff -2 E2E 1.00x
query95: Previous (6579.0 ms) vs Current (6594.0 ms) Diff -15 E2E 1.00x
query96: Previous (5449.0 ms) vs Current (5373.0 ms) Diff 76 E2E 1.01x
query97: Previous (2201.6666666666665 ms) vs Current (2155.3333333333335 ms) Diff 46 E2E 1.02x
query98: Previous (1754.6666666666667 ms) vs Current (1671.3333333333333 ms) Diff 83 E2E 1.05x
query99: Previous (2092.6666666666665 ms) vs Current (2536.3333333333335 ms) Diff -443 E2E 0.83x
benchmark: Previous (356333.3333333333 ms) vs Current (351000.0 ms) Diff 5333 E2E 1.02x

binmahone (Collaborator Author) commented:

build

binmahone marked this pull request as ready for review November 20, 2024 10:34
revans2 (Collaborator) commented Nov 20, 2024

The premerge tests failed, and there were some memory leaks in there too.

binmahone (Collaborator Author) commented:

build

binmahone (Collaborator Author) commented:

Hi @revans2, the premerge test has passed, and the suspicious "memory leak" has been proven to be unrelated to this PR. I'll use another thread to discuss the leak issue with you.

revans2 (Collaborator) left a comment:

This is looking good. I would like to see the one comment removed that was added in so we could discuss the pre-sort code. But beyond that I think we can merge it in.


revans2 (Collaborator) commented Nov 25, 2024

Just for full disclosure, I ran some benchmarks to look at the sort-based short circuit.

spark.time(spark.range(0, 1000000000L, 1, 2).selectExpr("CAST(id DIV 1 as STRING) as k", "id").groupBy("k").agg(avg(col("id")).alias("id_a"),avg(col("id") + 10).alias("idt_a")).orderBy("k").show())

I can see this as being fairly common. The majority of the keys are unique, and multiple passes don't reduce the size, so we should not worry about it. But having the sort optimization on is still better than having it off, at least until we set skipAggPassReductionRatio to a reasonable default value.

  • DEFAULT
    • 119657 ms, 117633 ms, 117105 ms
    • spill time seconds-ish: 20, 19, 19
  • spark.conf.set("spark.rapids.sql.agg.singlePassPartialSortEnabled",false)
    • 188092 ms, 182510 ms, 180220 ms
    • spill time seconds-ish: 95, 90, 90
  • spark.conf.set("spark.rapids.sql.agg.skipAggPassReductionRatio", "0.99") (sort still disabled)
    • 90188 ms, 89922 ms, 89841 ms
    • spill time seconds-ish: 0, 0, 0
spark.time(spark.range(0, 1000000000L, 1, 2).selectExpr("id % 125000000 as k", "id").groupBy("k").agg(avg(col("id")).alias("id_a"),avg(col("id") + 10).alias("idt_a")).orderBy("k").show())

This one is contrived because the keys are not unique, but they are spread evenly throughout a large task. Because we can sort the data quickly (the key is a single long) and the spill does not hit disk (I have 32 GiB of host spill memory), the sort pays for itself. I don't see this happening in the real world. Also, the size reduction in the shuffle is extra important because I don't have the multi-threaded shuffle enabled for this setup. Like I said, it is contrived.

  • DEFAULT:
    • 12361 ms, 12345 ms, 12306 ms
    • shuffle-size: 3.3 GiB
    • spill time seconds-ish: 0, 0, 0
  • spark.conf.set("spark.rapids.sql.agg.singlePassPartialSortEnabled",false)
    • 60392 ms, 67075 ms, 60071 ms
    • shuffle-size: 4.0 GiB
    • spill time seconds-ish: 48, 55, 48
  • spark.conf.set("spark.rapids.sql.agg.skipAggPassReductionRatio", "0.99") (sort still disabled)
    • 45085 ms, 44738 ms, 45094 ms
    • shuffle-size: 15.4 GiB
    • spill time seconds-ish: 0, 0, 0

binmahone (Collaborator Author) commented:

build

binmahone (Collaborator Author) commented:

> This is looking good. I would like to see the one comment removed that was added in so we could discuss the pre-sort code. But beyond that I think we can merge it in.

Hi @revans2, the comment you mentioned is removed now, and I have added some new comments, which are all about follow-ups. So please feel free to approve the PR if you're okay with it. We can continue the discussion on the follow-ups here.

Labels: performance (A performance related task/issue)