
perf: Improve count aggregate performance #784

Merged: 10 commits into apache:main on Aug 6, 2024

Conversation

@andygrove (Member) commented Aug 6, 2024

Which issue does this PR close?

Closes #744

Rationale for this change

For some reason, COUNT is really slow when used from Comet, but SUM is fast, so let's translate COUNT(expr) to SUM(IF(expr IS NULL, 0, 1)) until we can get to the bottom of the real issue.

edit: It turns out that Spark also implements COUNT this way, so I think this closes the issue.

Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate COUNT:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Parquet - Spark (COUNT)                                                                    1716           1728          17          6.1         163.7       1.0X
SQL Parquet - Comet (Scan) (COUNT)                                                             1677           1680           4          6.3         159.9       1.0X
SQL Parquet - Comet (Scan, Exec) (COUNT)                                                        782            800          27         13.4          74.6       2.2X
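The rewrite described above can be sketched with a toy expression type. This is illustrative only: the enum and function names here are hypothetical and do not match Comet's actual planner types.

```rust
// Toy AST demonstrating the COUNT(expr) -> SUM(IF(expr IS NULL, 0, 1)) rewrite.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    IsNull(Box<Expr>),
    If {
        cond: Box<Expr>,
        then: Box<Expr>,
        otherwise: Box<Expr>,
    },
    Count(Box<Expr>),
    Sum(Box<Expr>),
}

/// Rewrite COUNT(child) as SUM(IF(child IS NULL, 0, 1)); leave other
/// expressions untouched.
fn rewrite_count(expr: Expr) -> Expr {
    match expr {
        Expr::Count(child) => Expr::Sum(Box::new(Expr::If {
            cond: Box::new(Expr::IsNull(child)),
            then: Box::new(Expr::Literal(0)),
            otherwise: Box::new(Expr::Literal(1)),
        })),
        other => other,
    }
}
```

Since SUM over a 0/1 indicator and COUNT of non-null values are equivalent, the rewrite preserves semantics while routing through the faster SUM path.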

What changes are included in this PR?

How are these changes tested?

@andygrove changed the title from "experiment: workaround for count aggregate performance issue" to "perf: workaround for count aggregate performance issue" on Aug 6, 2024
@andygrove andygrove marked this pull request as ready for review August 6, 2024 01:46
@andygrove (Member, Author) commented Aug 6, 2024

This PR results in an 11% speedup overall for TPC-DS @ 100 GB (based on a single run). Runs do vary, so this may be exaggerating the speedup. I will run with more iterations and post more results.

edit: this was run with comet.debug.enabled by mistake

[chart: tpcds_allqueries]

@andygrove (Member, Author) commented Aug 6, 2024

Average of 3 runs, main branch versus this PR. This shows a 15.5% speedup.

[chart: tpcds_allqueries]

Command used for both runs:

$SPARK_HOME/bin/spark-submit \
    --master $SPARK_MASTER \
    --conf spark.driver.memory=8G \
    --conf spark.executor.instances=1 \
    --conf spark.executor.memory=32G \
    --conf spark.executor.cores=8 \
    --conf spark.cores.max=8 \
    --conf spark.eventLog.enabled=true \
    --jars $COMET_JAR \
    --conf spark.driver.extraClassPath=$COMET_JAR \
    --conf spark.executor.extraClassPath=$COMET_JAR \
    --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
    --conf spark.comet.enabled=true \
    --conf spark.comet.exec.enabled=true \
    --conf spark.comet.exec.all.enabled=true \
    --conf spark.comet.cast.allowIncompatible=true \
    --conf spark.comet.shuffle.enforceMode.enabled=true \
    --conf spark.comet.exec.shuffle.enabled=true \
    --conf spark.comet.exec.shuffle.mode=auto \
    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
    tpcbench.py \
    --benchmark tpcds \
    --data /mnt/bigdata/tpcds/sf100/ \
    --queries ../../tpcds/queries-spark \
    --iterations 3

@andygrove andygrove requested review from huaxingao and viirya August 6, 2024 04:58
@huaxingao huaxingao left a comment


LGTM. Thanks for the PR @andygrove

.iter()
.map(|child| self.create_expr(child, schema.clone()))
.collect::<Result<Vec<_>, _>>()?;
if expr.children.iter().len() == 1 {
Member

Hmm, I think we can also do this for multiple child expressions?

Member Author

Thanks. I have extended this approach for the multiple argument case.
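The unified handling of one or many arguments might look roughly like the following sketch over a toy predicate type. `Pred` and `any_null_condition` are hypothetical names for illustration, not Comet's actual code.

```rust
// For COUNT(c1, c2, ...), a row is counted only when no argument is NULL, so
// the rewrite condition is IsNull(c1) OR IsNull(c2) OR ..., built by a reduce.
#[derive(Debug, Clone)]
enum Pred {
    IsNull(String),           // IS NULL check on a named column
    Or(Box<Pred>, Box<Pred>), // logical OR of two predicates
}

/// Combine per-argument IS NULL checks into a single OR chain. A single
/// argument yields just its IsNull check, so the single- and multi-argument
/// paths share this code; an empty argument list yields None.
fn any_null_condition(children: &[String]) -> Option<Pred> {
    children
        .iter()
        .map(|c| Pred::IsNull(c.clone()))
        .reduce(|acc, p| Pred::Or(Box::new(acc), Box::new(p)))
}
```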

@viirya (Member) left a comment

Looks okay. Actually, this is how Spark implements count internally:

 /* count = */ If(nullableChildren.map(IsNull).reduce(Or), count, count + 1L)
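The semantics of that update expression (a row is counted only when none of the arguments is NULL) can be illustrated with a small sketch. `count_multi` is a hypothetical helper for demonstration, not Spark or Comet code; NULL is modeled with `Option` here.

```rust
/// Count rows where no argument is NULL, mirroring Spark's
/// If(nullableChildren.map(IsNull).reduce(Or), count, count + 1L) update.
fn count_multi(rows: &[Vec<Option<i64>>]) -> i64 {
    rows.iter()
        // the filter plays the role of the negated OR-of-IsNull condition
        .filter(|row| row.iter().all(|v| v.is_some()))
        .count() as i64
}
```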

@andygrove changed the title from "perf: workaround for count aggregate performance issue" to "perf: Improve count aggregate performance" on Aug 6, 2024
@andygrove andygrove merged commit d9dc117 into apache:main Aug 6, 2024
76 checks passed
@andygrove andygrove deleted the count-perf-workaround branch August 6, 2024 13:47
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* Workaround for COUNT performance

* add comments

* remove benchmark results

* fix regression

* revert change to datafusion version

* Revert change to Cargo.lock

* fix

* unify code for single and multiple arguments

* clippy
Linked issue: #744, Improve performance of COUNT aggregates