
Add the execution support for segmented aggregation #17618

Closed
wants to merge 2 commits

Conversation


@zacw7 zacw7 commented Apr 8, 2022

Depends on #17458

Benchmark:

Benchmark                                                (operatorType)  (rowsPerSegment)  Mode  Cnt    Score   Error  Units
BenchmarkHashAndSegmentedAggregationOperators.benchmark       segmented                 1  avgt   30  109.391 ± 7.813  ms/op
BenchmarkHashAndSegmentedAggregationOperators.benchmark       segmented                10  avgt   30   52.914 ± 5.455  ms/op
BenchmarkHashAndSegmentedAggregationOperators.benchmark       segmented               800  avgt   30   28.937 ± 5.291  ms/op
BenchmarkHashAndSegmentedAggregationOperators.benchmark       segmented            100000  avgt   30    6.492 ± 0.184  ms/op
BenchmarkHashAndSegmentedAggregationOperators.benchmark            hash                 1  avgt   30   18.439 ± 1.193  ms/op
BenchmarkHashAndSegmentedAggregationOperators.benchmark            hash                10  avgt   30   18.707 ± 2.586  ms/op
BenchmarkHashAndSegmentedAggregationOperators.benchmark            hash               800  avgt   30   17.132 ± 0.495  ms/op
BenchmarkHashAndSegmentedAggregationOperators.benchmark            hash            100000  avgt   30    5.660 ± 0.174  ms/op

Manual testing(717,977,748,003 rows / 3.02 TB):

Run                           | QueryID                     | Splits | Latency  | CPU         | Memory    | Per wall sec
Baseline                      | 20220506_014308_00006_iiy5d | 84.9K  | 36.96 s  | 23.54 hours | 416.32 GB | 83.51 GB
File Splittable Disabled      | 20220506_014507_00008_iiy5d | 6.86K  | 1.50 min | 20.49 hours | 405.88 GB | 34.32 GB
Segmented Aggregation Enabled | 20220506_015126_00011_iiy5d | 2.06K  | 1.88 min | 33.31 hours | 30.60 GB  | 27.37 GB

A latency regression was observed during testing, which is expected. To enable segmented aggregation, file splitting needs to be disabled to preserve the data order. As a result, far fewer splits are generated, which drastically decreases table-scan concurrency, especially when there are many big files to scan.
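As context for the technique discussed in this PR, here is a minimal sketch (not the Presto implementation; the class and the string-array row layout are hypothetical). Because the input is sorted on a prefix of the group-by keys, once the segment key changes, every group belonging to the previous segment is complete, so its hash table can be flushed and rebuilt instead of holding all groups in memory:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of segmented aggregation: count rows per
// (segmentKey, otherKey) group, but flush results as soon as the sorted
// segment key changes, so only one segment's groups are held in memory.
public class SegmentedAggregationSketch
{
    // Each row is {segmentKey, otherKey}; rows must arrive sorted by segmentKey.
    public static List<String> aggregate(List<String[]> sortedRows)
    {
        List<String> output = new ArrayList<>();
        Map<String, Long> counts = new LinkedHashMap<>();
        String currentSegment = null;
        for (String[] row : sortedRows) {
            String segment = row[0];
            if (currentSegment != null && !segment.equals(currentSegment)) {
                // Segment exhausted: emit its groups and reset the table.
                flush(currentSegment, counts, output);
            }
            currentSegment = segment;
            counts.merge(row[1], 1L, Long::sum);
        }
        if (currentSegment != null) {
            flush(currentSegment, counts, output);
        }
        return output;
    }

    private static void flush(String segment, Map<String, Long> counts, List<String> output)
    {
        counts.forEach((key, count) -> output.add(segment + "," + key + "," + count));
        counts.clear();
    }
}
```

The memory high-water mark is one segment's worth of groups rather than the whole input's, which is where the footprint reduction in the release note comes from.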

== RELEASE NOTES ==

General Changes
* Add the ability to flush aggregated data when the current input segment is exhausted. This reduces the memory footprint and improves the performance of aggregation when the data is already ordered by a subset of the group-by keys.
This can be enabled with the ``segmented_aggregation_enabled`` session property or the ``optimizer.segmented-aggregation-enabled`` configuration property.

Hive Changes
* Add support for segmented aggregation to reduce the memory footprint and improve query performance when the order-by keys are a subset of the group-by keys. This can be enabled with the ``order_based_execution_enabled`` session property or the ``hive.order-based-execution-enabled`` configuration property.

@zacw7 zacw7 force-pushed the seg-agg branch 4 times, most recently from 80a46c3 to 959b1e6 Compare May 2, 2022 20:45
@zacw7 zacw7 changed the title [WIP] Add segmented streaming aggregation support during execution Add the execution support for segmented aggregation May 5, 2022
@zacw7 zacw7 force-pushed the seg-agg branch 4 times, most recently from eb3107e to f59a0fa Compare May 6, 2022 01:20
@zacw7 zacw7 marked this pull request as ready for review May 6, 2022 02:43
@zacw7 zacw7 requested a review from a team as a code owner May 6, 2022 02:43
@kewang1024 kewang1024 left a comment

Haven't finished reviewing the segmented aggregation logic inside HashAggregationOperator, will first post some of the suggestions here

  1. We need some changes to the release note's Hive Changes section: there is no segmented aggregation logic added to the Hive connector. For the Hive connector, we added the order_based_execution_enabled session property and the hive.order-based-execution-enabled configuration property to disable file splitting and expose the data ordering property so that segmented aggregation can be leveraged.
  2. For the refactor commit, one optional suggestion is to add comments in the class to explain the functions of the different fields, plus a succinct summary of the logic within the key functions (needsInput, addInput, getOutput, finish, etc.). I think it would help people understand better when you introduce extra logic for segmented aggregation in the later commit.

@@ -3154,7 +3159,7 @@ private OperatorFactory createHashAggregationOperatorFactory(
.map(entry -> source.getTypes().get(entry))
.collect(toImmutableList());

if (isStreamable) {
if (isStreamable && preGroupbyVariables.size() == groupbyVariables.size()) {
Collaborator:

The size check here is not needed any more; we can pass in isStreamable and isSegmentedAggregationEligible from AggregationNode.

Member Author:


The check is not for segmented aggregation but for streaming aggregation right?

Collaborator:


You can take a look at the definition of isStreamable; it already includes the logic of checking the size.

Or maybe you need to pull the newest code if you can't see it locally right now.

Member Author:


Got it. Thanks for the pointer!

Comment on lines 131 to 136
if (operatorType.equalsIgnoreCase("segmented")) {
operatorFactory = createHashAggregationOperatorFactory(pagesBuilder.getHashChannel(), ImmutableList.of(VARCHAR, BIGINT), ImmutableList.of(0));
}
else {
operatorFactory = createHashAggregationOperatorFactory(pagesBuilder.getHashChannel(), ImmutableList.of(), ImmutableList.of());
}
Collaborator:


NIT: We can similarly create another boolean segmentedAggregation

if (!hashAggregation) {
}
else if (segmentedAggregation) {
}
else {
}

Member Author (@zacw7, May 13, 2022):


IIUC, segmented aggregation essentially is still hash aggregation - just that in some cases we can flush the output when possible and rebuild the hash. The proposed code structure seems to me like they are mutually exclusive. wdyt?

Collaborator (@kewang1024, May 14, 2022):


Mine is the same logic as yours, just with less nesting; it's the same as

if streaming aggregation
else if hash aggregation
        if segmented hash aggregation
        if normal hash aggregation

@@ -127,7 +128,12 @@ public void setup()
pages = pagesBuilder.build();

if (hashAggregation) {
operatorFactory = createHashAggregationOperatorFactory(pagesBuilder.getHashChannel());
if (operatorType.equalsIgnoreCase("segmented")) {
operatorFactory = createHashAggregationOperatorFactory(pagesBuilder.getHashChannel(), ImmutableList.of(VARCHAR, BIGINT), ImmutableList.of(0));
Collaborator:


For segmented aggregation, this benchmark doesn't seem right: in createHashAggregationOperatorFactory, we only create a HashAggregationOperatorFactory with groupByChannels as ImmutableList.of(0), in which case it should actually do streaming aggregation.

So I think we should create a new benchmark test where we group by at least two fields; then we can benchmark hash, segmented, and streaming aggregation.

Member Author:


Makes sense. Created one comparing hash and segmented aggregation.

@@ -702,6 +727,51 @@ public void testMask()
assertEquals(outputPage.getBlock(1).getLong(0), 1L);
}

@Test
Collaborator:


I would suggest putting segmented aggregation's functionality and correctness tests in a separate file, similar to streaming aggregation: https://github.com/mbasmanova/presto/blob/cc916c4af244cd04d2ef14c3ffad2942a656b431/presto-main/src/test/java/com/facebook/presto/operator/TestStreamingAggregationOperator.java

Member Author:


StreamingAggregationOperator.java is a separate operator while segmented aggregation is part of HashAggregationOperator.

Collaborator:


Yeah, the reason why I suggested moving to a new class is that you would have many different cases specific to segmented aggregation that we should test, and moving it to a new class would be clearer IMO

@zacw7 zacw7 force-pushed the seg-agg branch 2 times, most recently from 8378e30 to 709683a Compare May 17, 2022 02:54
zacw7 commented May 17, 2022

Hey @kewang1024, sorry for the delay. I've addressed most of the comments, including creating a new benchmark for hash vs segmented aggregation. For the functionality and correctness tests, if a separate file is needed, could you please elaborate on what other cases should be covered?

@zacw7 zacw7 force-pushed the seg-agg branch 2 times, most recently from e0fa371 to 52f563a Compare May 17, 2022 19:22
@kewang1024 replied:

> Hey @kewang1024, sorry for the delay. I've addressed most of the comments, including creating a new benchmark for hash vs segmented aggregation. For the functionality and correctness tests, if a separate file is needed, could you please elaborate on what other cases should be covered?

What I had in mind initially is to cover the cases in the streaming aggregation test.

@kewang1024 kewang1024 self-requested a review May 26, 2022 06:52
@kewang1024 kewang1024 left a comment

For the SegmentedAggregation benchmark, the performance results show segmented aggregation is much worse than hash aggregation; we need to investigate what's causing the performance degradation.

|| (finishing && currentPage == null);
}

private boolean isAggregationBuilderFull()
Collaborator:


NIT:

Can we add a comment here explaining that this means "partial aggregation reached memory limit", because it's also used in other places as well?
I found this function really confusing and needed to pick up the context each time I reviewed it; a comment would help in understanding the logic:

// It would only return true when both of the followings are true:
// 1. This is a partial aggregation. 
// 2. The aggregationBuilder reached memory limit.

Or another option could be changing the function name to partialAggregationReachedMemoryLimit

Member Author:


I agree with you. It took me a while to figure out the logic behind it. The new function name makes more sense to me.
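For illustration, the agreed-upon rename might look like this minimal, hypothetical sketch (the boolean fields stand in for the operator's real `step.isOutputPartial()` and `aggregationBuilder.isFull()` state):

```java
// Hypothetical sketch of the renamed helper; the fields below stand in
// for the operator's real state in HashAggregationOperator.
public class PartialLimitSketch
{
    boolean outputPartial; // stands in for step.isOutputPartial()
    boolean builderFull;   // stands in for aggregationBuilder.isFull()

    // Returns true only when BOTH hold:
    // 1. This is a partial aggregation.
    // 2. The aggregation builder has reached its memory limit.
    boolean partialAggregationReachedMemoryLimit()
    {
        return outputPartial && builderFull;
    }
}
```

The name now states the condition directly, so call sites no longer require the reader to reconstruct the two-part condition from context.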

@kewang1024 kewang1024 left a comment

For the first PR:
Let's start a new PR for the refactor so that we can quickly merge it; I have left all the change suggestions in comments.

For the second PR:
Let's also start a new PR and keep this one for reference.
The design where we flush by segment:

  1. It's very costly performance-wise, because it closes and rebuilds the hashBuilder for each segment, especially when one page has multiple segments.
  2. It makes the logic too complicated: we're introducing too many variables, which makes the HashAggregationOperator logic more error-prone.

My suggestion:

  1. We set a threshold for segmented aggregation flushing.
  2. We process by the minimum unit of a Page (same as the current HashAggregationOperator behavior).
  3. We trigger the flush when the threshold limit is hit.

Let me know if you want to have a VC to discuss.
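The reviewer's threshold-based alternative could be sketched roughly as below (a hypothetical class, not the Presto implementation; a group-count threshold stands in for the real memory limit). For a partial aggregation, flushing mid-segment is safe because the final aggregation merges duplicate groups downstream:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of threshold-based flushing: pages are processed
// whole (as HashAggregationOperator already does), and the hash table is
// flushed only when it grows past a threshold, instead of being rebuilt
// at every segment boundary.
public class ThresholdFlushSketch
{
    private final int maxGroups;
    private final Map<String, Long> counts = new LinkedHashMap<>();
    private final List<String> output = new ArrayList<>();

    public ThresholdFlushSketch(int maxGroups)
    {
        this.maxGroups = maxGroups;
    }

    // A "page" is modeled as a list of group keys.
    public void addPage(List<String> keys)
    {
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        if (counts.size() >= maxGroups) {
            flush(); // threshold hit: emit accumulated groups and reset
        }
    }

    public List<String> finish()
    {
        flush();
        return output;
    }

    private void flush()
    {
        counts.forEach((key, count) -> output.add(key + "=" + count));
        counts.clear();
    }
}
```

Note that a group can then appear in more than one flushed batch (e.g. "b" below), which is acceptable for a partial step but means the final step must still merge; this is the design trade-off versus flushing exactly at segment boundaries.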

Comment on lines +460 to +498
private void initializeAggregationBuilder()
{
if (aggregationBuilder == null) {
if (step.isOutputPartial() || !spillEnabled) {
aggregationBuilder = new InMemoryHashAggregationBuilder(
accumulatorFactories,
step,
expectedGroups,
groupByTypes,
groupByChannels,
hashChannel,
operatorContext,
maxPartialMemory,
joinCompiler,
true,
useSystemMemory);
}
else {
verify(!useSystemMemory, "using system memory in spillable aggregations is not supported");
aggregationBuilder = new SpillableHashAggregationBuilder(
accumulatorFactories,
step,
expectedGroups,
groupByTypes,
groupByChannels,
hashChannel,
operatorContext,
memoryLimitForMerge,
memoryLimitForMergeWithMemory,
spillerFactory,
joinCompiler);
}
// assume initial aggregationBuilder is not full
}
else {
checkState(!aggregationBuilder.isFull(), "Aggregation buffer is full");
}
}

Collaborator:


NIT: it can be rewritten in a more concise version; we can also change the name to reflect the logic

private void initializeAggregationBuilderIfNeeded()
{
    if (aggregationBuilder != null) {
        checkState(!aggregationBuilder.isFull(), "Aggregation buffer is full");
        return;
    }

    if (step.isOutputPartial() || !spillEnabled) {
        aggregationBuilder = new InMemoryHashAggregationBuilder(
                accumulatorFactories,
                step,
                expectedGroups,
                groupByTypes,
                groupByChannels,
                hashChannel,
                operatorContext,
                maxPartialMemory,
                joinCompiler,
                true,
                useSystemMemory);
    }
    else {
        verify(!useSystemMemory, "using system memory in spillable aggregations is not supported");
        aggregationBuilder = new SpillableHashAggregationBuilder(
                accumulatorFactories,
                step,
                expectedGroups,
                groupByTypes,
                groupByChannels,
                hashChannel,
                operatorContext,
                memoryLimitForMerge,
                memoryLimitForMergeWithMemory,
                spillerFactory,
                joinCompiler);
    }
    // assume initial aggregationBuilder is not full
}

|| (finishing && currentPage == null);
}

private boolean isAggregationBuilderFull() {
Collaborator:


move the rename (partialAggregationReachedMemoryLimit) to the refactor commit

checkState(!finishing, "Operator is already finishing");
requireNonNull(page, "page is null");
currentPage = requireNonNull(page, "page is null");
Collaborator:


We never set currentPage to null in this refactor PR, which makes the logic incorrect.

Also, I would strongly suggest removing currentPage from the refactor PR; it is tightly coupled to the next implementation of segmented aggregation, and we need to rethink the current segmented aggregation design.

Comment on lines 513 to 520
// Produce results if one of the following is true:
// - partial aggregation reached memory limit.
// - received finish() signal and there is no more input remaining to process.
private boolean shouldFlush()
{
return isAggregationBuilderFull()
|| (finishing && currentPage == null);
}
Collaborator:


NIT: make the comment more descriptive and adjust the order

    // Flush for any of the scenarios below:
    // - 1. finishing: received finish() signal (no more input to come).
    // - 2. If this is partial aggregation and it has reached memory limit.
    private boolean shouldFlush()
    {
        return finishing ||
                partialAggregationReachedMemoryLimit();
    }

return false;
}
else {
return unfinishedWork == null;
return unfinishedWork == null && currentPage == null;
}
}
Collaborator:

NIT: let's also include this change to make the function easier to understand (note the negation on the memory-limit check, matching comment 4 below):

    // This operator only needs input when
    // - 1. It hasn't received the finish() signal (more input to come).
    // - 2. The previous page has been processed.
    // - 3. The aggregation output has been flushed.
    // - 4. If this is a partial aggregation, it hasn't reached the memory limit.
    @Override
    public boolean needsInput()
    {
        return !finishing &&
                unfinishedWork == null &&
                outputPages == null &&
                !partialAggregationReachedMemoryLimit();
    }


@zacw7 zacw7 closed this Jun 9, 2022
@zacw7 zacw7 deleted the seg-agg branch June 15, 2022 16:48