chore(query): turn on new agg hashtable #15155

Merged
merged 4 commits into from
Apr 3, 2024

Conversation

Freejww
Contributor

@Freejww Freejww commented Apr 2, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

In the previous PR, we implemented a new aggregation hash table. It now supports both singleton and cluster environments, and we have also added spill support. This PR includes performance tests and tries to enable the new aggregation hash table.

  • Fixes #[Link the issue here]

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@Freejww Freejww added the ci-cloud Build docker image for cloud test label Apr 2, 2024
@github-actions github-actions bot added the pr-chore this PR only has small changes that no need to record, like coding styles. label Apr 2, 2024
Contributor

github-actions bot commented Apr 2, 2024

Docker Image for PR

  • tag: pr-15155-a3d050a

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Freejww
Contributor Author

Freejww commented Apr 2, 2024

Performance test on ClickBench/hits

Note: We use hits as the benchmark because it contains many GROUP BY queries.

Local machine configuration: 32GB memory, 12-core CPU.

  1. Overall, the new aggregation hash table outperforms the old method.
  2. On some high-cardinality or string group-by queries, the new method outperforms the old one.
  3. More details will be covered in a later blog post.
| Query | Local singleton main | Local singleton pr | Local singleton improve | Local cluster two nodes main | Local cluster two nodes pr | Local cluster two nodes improve | Local cluster three nodes main | Local cluster three nodes pr | Local cluster three nodes improve | Cloud small main | Cloud small pr | Cloud small improve | Cloud medium main | Cloud medium pr | Cloud medium improve |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Q1 | 0.006s | 0.006s | 0% | 0.008s | 0.008s | 0% | 0.012s | 0.012s | 0% | 0.033s | 0.032s | 3% | 0.025s | 0.024s | 4% |
| Q2 | 0.059s | 0.059s | 0% | 0.079s | 0.078s | 1% | 0.084s | 0.083s | 1% | 0.168s | 0.106s | 37% | 0.109s | 0.079s | 28% |
| Q3 | 0.17s | 0.161s | 5% | 0.175s | 0.182s | -4% | 0.189s | 0.186s | 2% | 0.139s | 0.117s | 16% | 0.172s | 0.115s | 33% |
| Q4 | 0.191s | 0.185s | 3% | 0.199s | 0.208s | -5% | 0.21s | 0.206s | 2% | 0.338s | 0.283s | 16% | 0.345s | 0.318s | 8% |
| Q5 | 0.544s | 0.542s | 0% | 0.885s | 0.811s | 8% | 1.075s | 1.049s | 2% | 0.423s | 0.419s | 1% | 0.571s | 0.513s | 10% |
| Q6 | 0.932s | 0.858s | 8% | 1.381s | 1.3s | 6% | 1.714s | 1.303s | 24% | 0.666s | 0.592s | 11% | 0.75s | 0.631s | 16% |
| Q7 | 0.049s | 0.049s | 0% | 0.07s | 0.07s | 0% | 0.067s | 0.067s | 0% | 0.092s | 0.092s | 0% | 0.099s | 0.099s | 0% |
| Q8 | 0.061s | 0.061s | 0% | 0.081s | 0.088s | -9% | 0.097s | 0.102s | -5% | 0.089s | 0.082s | 8% | 0.111s | 0.086s | 23% |
| Q9 | 0.862s | 0.872s | -1% | 1.11s | 1.102s | 1% | 1.336s | 1.207s | 10% | 0.604s | 0.673s | -11% | 0.782s | 0.707s | 10% |
| Q10 | 1.023s | 1.044s | -1% | 1.323s | 1.274s | 4% | 1.618s | 1.376s | 15% | 0.73s | 0.802s | -10% | 0.938s | 0.717s | 24% |
| Q11 | 0.415s | 0.413s | 0% | 0.488s | 0.48s | 2% | 0.557s | 0.504s | 10% | 0.424s | 0.398s | 6% | 0.472s | 0.435s | 8% |
| Q12 | 0.465s | 0.461s | 1% | 0.54s | 0.528s | 2% | 0.613s | 0.572s | 7% | 0.402s | 0.38s | 5% | 0.441s | 0.405s | 8% |
| Q13 | 1.081s | 0.881s | 19% | 1.652s | 1.416s | 14% | 1.995s | 1.417s | 29% | 0.722s | 0.533s | 26% | 0.838s | 0.686s | 18% |
| Q14 | 1.837s | 1.424s | 22% | 2.514s | 2.041s | 19% | 2.909s | 2.122s | 27% | 1.158s | 0.864s | 25% | 1.022s | 0.934s | 9% |
| Q15 | 1.279s | 0.983s | 23% | 1.876s | 1.536s | 18% | 2.27s | 1.563s | 31% | 0.817s | 0.603s | 26% | 0.885s | 0.711s | 20% |
| Q16 | 0.927s | 0.696s | 25% | 1.501s | 1.162s | 23% | 1.737s | 1.393s | 20% | 0.563s | 0.461s | 18% | 0.752s | 0.699s | 7% |
| Q17 | 3.03s | 1.62s | 47% | 4.154s | 2.631s | 37% | 4.361s | 2.934s | 33% | 1.714s | 1.049s | 39% | 1.435s | 1.047s | 27% |
| Q18 | 1.663s | 0.969s | 42% | 1.757s | 1.2s | 32% | 1.6s | 1.313s | 18% | 1.084s | 0.777s | 28% | 0.542s | 0.519s | 4% |
| Q19 | 6.223s | 2.737s | 56% | 8.269s | 4.699s | 43% | 9.316s | 5.06s | 46% | 3.326s | 1.872s | 44% | 2.515s | 1.642s | 35% |
| Q20 | 0.007s | 0.007s | 0% | 0.04s | 0.04s | 0% | 0.016s | 0.016s | 0% | 0.047s | 0.047s | 0% | 0.06s | 0.06s | 0% |
| Q21 | 2.497s | 2.444s | 2% | 2.614s | 2.566s | 2% | 2.682s | 2.669s | 0% | 1.515s | 1.512s | 0% | 1.109s | 0.813s | 27% |
| Q22 | 2.819s | 2.795s | 1% | 2.958s | 2.941s | 1% | 3.082s | 3.082s | 0% | 1.821s | 1.82s | 0% | 1.597s | 0.998s | 38% |
| Q23 | 5.848s | 5.808s | 1% | 6.087s | 6.007s | 1% | 6.488s | 6.269s | 3% | 4.036s | 3.95s | 2% | 2.915s | 2.161s | 26% |
| Q24 | 3.412s | 3.398s | 1% | 3.223s | 3.204s | 1% | 3.441s | 3.392s | 1% | 2.551s | 2.566s | -1% | 2.693s | 2.524s | 6% |
| Q25 | 0.718s | 0.721s | -4% | 0.738s | 0.734s | 1% | 0.783s | 0.783s | 0% | 0.755s | 0.726s | 4% | 0.79s | 0.447s | 43% |
| Q26 | 0.53s | 0.536s | -1% | 0.515s | 0.511s | 1% | 0.564s | 0.547s | 3% | 0.388s | 0.38s | 2% | 0.388s | 0.336s | 13% |
| Q27 | 0.769s | 0.774s | -1% | 0.775s | 0.768s | 1% | 0.825s | 0.804s | 3% | 0.748s | 0.743s | 1% | 0.8s | 0.462s | 42% |
| Q28 | 3.177s | 3.15s | 1% | 3.187s | 3.186s | 0% | 3.263s | 3.256s | 0% | 1.623s | 1.652s | -2% | 1.183s | 0.949s | 20% |
| Q29 | 4.359s | 4.394s | -1% | 4.730s | 4.701s | 1% | 5.426s | 4.714s | 13% | 3.074s | 3.033s | 1% | 2.026s | 1.833s | 10% |
| Q30 | 0.137s | 0.137s | 0% | 0.14s | 0.14s | 0% | 0.147s | 0.147s | 0% | 0.169s | 0.169s | 0% | 0.168s | 0.168s | 0% |
| Q31 | 1.097s | 0.993s | 9% | 1.519s | 1.413s | 7% | 1.823s | 1.386s | 24% | 0.843s | 0.74s | 12% | 0.882s | 0.692s | 22% |
| Q32 | 1.72s | 1.296s | 25% | 2.259s | 1.946s | 14% | 2.586s | 1.897s | 27% | 1.331s | 1.099s | 17% | 1.349s | 1.181s | 12% |
| Q33 | 9.633s | 3.772s | 61% | 13.7s | 7.863s | 43% | 19.029s | 8.305s | 56% | 4.258s | 2.322s | 45% | 3.62s | 2.352s | 35% |
| Q34 | 5.485s | 5.125s | 7% | 8.278s | 7.39s | 11% | 8.982s | 8.419s | 6% | 2.96s | 2.412s | 19% | 2.936s | 2.698s | 8% |
| Q35 | 5.45s | 5.106s | 6% | 8.361s | 7.348s | 12% | 8.968s | 8.478s | 5% | 2.963s | 2.483s | 16% | 2.996s | 2.682s | 10% |
| Q36 | 0.738s | 0.513s | 30% | 1.104s | 0.908s | 18% | 1.406s | 1.07s | 24% | 0.491s | 0.41s | 16% | 0.721s | 0.704s | 2% |
| Q37 | 0.147s | 0.138s | 6% | 0.244s | 0.17s | 30% | 0.313s | 0.192s | 39% | 0.167s | 0.115s | 31% | 0.242s | 0.184s | 24% |
| Q38 | 0.136s | 0.136s | 0% | 0.159s | 0.159s | 0% | 0.167s | 0.181s | -8% | 0.124s | 0.121s | 2% | 0.131s | 0.131s | 0% |
| Q39 | 0.103s | 0.107s | -4% | 0.12s | 0.121s | -1% | 0.124s | 0.126s | -1% | 0.105s | 0.104s | 1% | 0.118s | 0.117s | 1% |
| Q40 | 0.281s | 0.254s | 10% | 0.432s | 0.33s | 24% | 0.493s | 0.33s | 33% | 0.236s | 0.203s | 14% | 0.326s | 0.224s | 31% |
| Q41 | 0.04s | 0.039s | 3% | 0.08s | 0.062s | 23% | 0.068s | 0.066s | 3% | 0.086s | 0.083s | 3% | 0.097s | 0.075s | 23% |
| Q42 | 0.045s | 0.037s | 18% | 0.058s | 0.059s | -2% | 0.063s | 0.064s | -1% | 0.08s | 0.073s | 9% | 0.083s | 0.069s | 17% |
| Q43 | 0.032s | 0.032s | 0% | 0.051s | 0.054s | -6% | 0.06s | 0.052s | 17% | 0.075s | 0.072s | 4% | 0.079s | 0.068s | 14% |
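
For readers checking the numbers: the PR does not state how the "improve" column is computed. Most rows appear consistent with the relative reduction in run time, (main - pr) / main, rounded to a whole percent. The sketch below works under that assumption; the function name `improve_percent` is hypothetical and is not part of Databend or its benchmark tooling.

```rust
/// Relative improvement of the `pr` timing over the `main` timing,
/// as a whole-number percentage.
///
/// Assumption (not stated in the PR): improve = (main - pr) / main.
/// A negative result means the PR build was slower than main.
/// `main_secs` must be non-zero.
fn improve_percent(main_secs: f64, pr_secs: f64) -> i64 {
    ((main_secs - pr_secs) / main_secs * 100.0).round() as i64
}
```

Under this assumption, `improve_percent(6.223, 2.737)` reproduces the 56% reported for Q19 on the local singleton setup, and `improve_percent(0.081, 0.088)` reproduces the -9% reported for Q8 on the two-node local cluster.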

@BohuTANG BohuTANG added the ci-benchmark Benchmark: run all test label Apr 2, 2024
Contributor

github-actions bot commented Apr 2, 2024

Docker Image for PR

  • tag: pr-15155-845b9c7

note: this image tag is only available for internal use,
please check the internal doc for more details.

@sundy-li
Member

sundy-li commented Apr 2, 2024

There are explain tests that need to be fixed.

@sundy-li
Member

sundy-li commented Apr 2, 2024

We use random and fixed SQL queries to cover the correctness tests.

The test code is in https://github.com/sundy-li/mmbend/tree/master/examples

@BohuTANG
Member

BohuTANG commented Apr 2, 2024

Great 👍

Wizard [SELECTS] tests passed, with results compared to Snowflake. SQL selects script here: https://github.com/datafuselabs/wizard/blob/main/checksb/sql/selects/check.sql

How to run the tests:

python3 checksb.py --database selects --warehouse COMPUTE_WH --case selects

@sundy-li sundy-li removed ci-benchmark Benchmark: run all test ci-cloud Build docker image for cloud test labels Apr 2, 2024
@Freejww Freejww requested a review from sundy-li April 3, 2024 00:59
@sundy-li sundy-li added this pull request to the merge queue Apr 3, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Apr 3, 2024
@BohuTANG BohuTANG merged commit 7e9b835 into databendlabs:main Apr 3, 2024
72 checks passed
@compasses

May I know whether the test was carried out under a memory limit?

Performance test on ClickBench/hits

@sundy-li
Member

sundy-li commented Apr 15, 2024

May I know whether the test was carried out under a memory limit?

There is no major difference if blocks are spilled, because the bottleneck is then I/O.

@compasses

Got it, thanks.
So I think this kind of implementation may not be more efficient than a general-purpose hash table like the one ClickHouse uses.

@sundy-li
Member

So I think this kind of implementation may not be more efficient than a general-purpose hash table like the one ClickHouse uses.

I disagree with that. The older one is the general-purpose hash table you are referring to.

@compasses

Oh, OK, I meant without the I/O effect, since most aggregation workloads are memory-bound. And the implementation above seems to need indirect access? Maybe I missed something, but memory bandwidth is important for high-cardinality keys.

@Freejww
Contributor Author

Freejww commented Apr 16, 2024

Oh, OK, I meant without the I/O effect, since most aggregation workloads are memory-bound. And the implementation above seems to need indirect access? Maybe I missed something, but memory bandwidth is important for high-cardinality keys.

By indirect access we mean that we first compare only the high 16 bits of the hash. If they match, we then compare the actual data. A general-purpose hash table compares the entire hash value before comparing the actual data.
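
To make the comparison order concrete, here is a minimal sketch of the idea, not Databend's actual code. It assumes each 64-bit slot packs the high 16 bits of the hash (called a "salt" here) together with a 48-bit index into a separate key array, and it uses linear probing with no resizing. The names `Entry`, `SaltedTable`, `find_or_insert` and `key_compares` are all hypothetical, and the caller supplies the hash so the behaviour is easy to follow.

```rust
const SALT_SHIFT: u32 = 48;
const INDEX_MASK: u64 = (1u64 << SALT_SHIFT) - 1;

/// Hypothetical slot layout: high 16 bits = salt (top bits of the hash),
/// low 48 bits = payload index + 1, so an all-zero slot means "empty".
#[derive(Clone, Copy)]
struct Entry(u64);

impl Entry {
    fn new(hash: u64, index: usize) -> Self {
        let salt_bits = (hash >> SALT_SHIFT) << SALT_SHIFT;
        Entry(salt_bits | ((index as u64 + 1) & INDEX_MASK))
    }
    fn is_empty(self) -> bool {
        self.0 == 0
    }
    fn salt(self) -> u16 {
        (self.0 >> SALT_SHIFT) as u16
    }
    fn index(self) -> usize {
        ((self.0 & INDEX_MASK) - 1) as usize
    }
}

struct SaltedTable {
    entries: Vec<Entry>, // slot array, power-of-two length
    keys: Vec<String>,   // actual group keys, reached through the slot index
    key_compares: usize, // how many full key comparisons were needed
}

impl SaltedTable {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        SaltedTable {
            entries: vec![Entry(0); capacity],
            keys: Vec::new(),
            key_compares: 0,
        }
    }

    /// Returns the index of `key` in `keys`, inserting it if absent.
    fn find_or_insert(&mut self, hash: u64, key: &str) -> usize {
        // This sketch never resizes, so keep at least one slot empty.
        assert!(self.keys.len() < self.entries.len(), "table is full");
        let mask = self.entries.len() - 1;
        let salt = (hash >> SALT_SHIFT) as u16;
        let mut slot = (hash as usize) & mask;
        loop {
            let entry = self.entries[slot];
            if entry.is_empty() {
                let index = self.keys.len();
                self.keys.push(key.to_string());
                self.entries[slot] = Entry::new(hash, index);
                return index;
            }
            // Cheap check first: only the 16-bit salt stored in the slot.
            if entry.salt() == salt {
                // Salt matched, so follow the index and compare the real key.
                self.key_compares += 1;
                if self.keys[entry.index()] == key {
                    return entry.index();
                }
            }
            slot = (slot + 1) & mask; // linear probing
        }
    }
}
```

In this sketch, two keys that land in the same slot but have different salts are told apart without touching the key array at all; the key array is read only after a salt match, which is the indirection discussed above.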

@compasses

OK, got it, thanks. A general-purpose hash table compares hash values if it stores the hash; if not, it uses the hash only for probing. Either way, the memory layout is more cache-friendly.

yufan022 pushed a commit to yufan022/databend that referenced this pull request Jun 18, 2024
yufan022 pushed a commit to yufan022/databend that referenced this pull request Jun 18, 2024
* fix hits-q18-perf

* turn on new agg hashtable

* rewrite sqllogical test

* fix sqllogical test

---------

Co-authored-by: jw <freejw@gmail.com>

(cherry picked from commit 7e9b835)
Labels
pr-chore this PR only has small changes that no need to record, like coding styles.