feat(query): support row fetch for merge into #15859

Merged on Jun 24, 2024 (25 commits)

@Dousir9 commented Jun 21, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  1. Support row fetch for merge into.
  2. Refactor the merge into physical plan, adding MergeIntoSplit and MergeIntoManipulate so that the pipeline is easier to build.
  3. Fix a bug in how ParquetRowsFetcher divides tasks.

The enable_merge_into_row_fetch setting controls whether row fetch is enabled for merge into. We plan to make this choice adaptive later.
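
The benchmark in this PR toggles the setting with `set global`. As a rough sketch, it could probably also be enabled for the current session only and then inspected. This assumes enable_merge_into_row_fetch behaves like other Databend session settings, which is not shown in this PR:

```sql
-- Assumption: the setting can be changed at session scope (no "global"),
-- so other sessions keep their current behavior.
set enable_merge_into_row_fetch = 1;

-- Check the value and scope currently in effect for this session.
show settings like 'enable_merge_into_row_fetch';
```

A session-level toggle makes it easier to compare both modes on the same data without changing behavior for other users.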

Performance Test (Databend Cloud Small)

1. TPC-H SF100

Query Duration: 22.6s → 7.1s (about 3.2× faster)
Bytes Scanned: 107 GB → 8 GB (about 13× less)

create table small_lineitem as select * from lineitem limit 500000;
create table lineitem_source as select * from (select *, ROW_NUMBER() OVER (PARTITION by l_orderkey) as rn from small_lineitem) t where t.rn = 1;
-- 124989 rows matched
select count(*) from lineitem_source as source, lineitem as target where source.l_orderkey = target.l_orderkey and source.l_linenumber = target.l_linenumber; 

-- merge into test sql
merge into lineitem as target using lineitem_source as source on target.l_orderkey = source.l_orderkey and source.l_linenumber = target.l_linenumber when matched then update set target.l_quantity = source.l_quantity + 1 when not matched then insert *;
-- main behavior (row fetch disabled): 22.6s
set global enable_merge_into_row_fetch = 0;
merge into lineitem as target using lineitem_source as source on target.l_orderkey = source.l_orderkey and source.l_linenumber = target.l_linenumber when matched then update set target.l_quantity = source.l_quantity + 1 when not matched then insert *;

-- this PR (row fetch enabled): 7.1s
set global enable_merge_into_row_fetch = 1;
merge into lineitem as target using lineitem_source as source on target.l_orderkey = source.l_orderkey and source.l_linenumber = target.l_linenumber when matched then update set target.l_quantity = source.l_quantity + 1 when not matched then insert *;
[Screenshots: main vs. this PR]

2. Table with 50 columns

Bytes Scanned: 186 GB → 7 GB (about 27× less)

target: numbers(50000000)
source: numbers(100000)
merge into target using source on target.col_0 = source.col_0 when matched then update set target.col_1 = source.col_1 + 1 when not matched then insert *;

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions (bot) added the pr-feature label ("this PR introduces a new feature to the codebase") Jun 21, 2024
@Dousir9 added the ci-cloud label ("Build docker image for cloud test") Jun 23, 2024

Docker Image for PR

  • tag: pr-15859-de8286d-1719165518

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Dousir9 marked this pull request as ready for review June 24, 2024 07:15
@Dousir9 added this pull request to the merge queue Jun 24, 2024
Merged via the queue into databendlabs:main with commit efe4149 Jun 24, 2024
79 checks passed
@Dousir9 deleted the support_row_fetch_for_merge_into branch June 24, 2024 08:39
4 participants