feat(planner): unify execution of DML statements (MERGE, UPDATE, DELETE) #16060

Merged (115 commits) on Aug 2, 2024

@Dousir9 Dousir9 commented Jul 16, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  1. Unify execution of DML statements (MERGE, UPDATE, DELETE). The unified execution improves the performance of UPDATE and DELETE and avoids out-of-memory (OOM) errors caused by row IDs.
  2. Support UDF expressions in the UPDATE SET clause. A later PR will add UDF support for the other parts of DML.

Performance Test (Databend Cloud) (TPC-H SF10)

-- Update, predicate with subquery
update lineitem_60000000 set l_quantity = l_quantity + 1 where l_linenumber in (select l_linenumber from lineitem_100000);

-- Delete, predicate with subquery
delete from lineitem_60000000 where l_linenumber in (select l_linenumber from lineitem_100000);

Databend Cloud (Small)

Update Duration: 94.6s → 29.4s (321%, i.e. about 3.2x faster)
Delete Duration: 80.7s → 16.3s (495%, i.e. about 5.0x faster)

Databend Cloud (Medium)

Update Duration: 70.2s → 24.6s (285%, i.e. about 2.9x faster)
Delete Duration: 64.8s → 8.5s (762%, i.e. about 7.6x faster)
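
The improvement figures above are consistent with dividing the duration on main by the duration on this PR and truncating. The PR does not state this explicitly, so treat it as an interpretation. A minimal Python sketch of that arithmetic follows. The names `RESULTS`, `speedup`, and `time_saved_pct` are illustrative and not part of the PR.

```python
# Durations in seconds, copied from the results above as (main, pr).
RESULTS = {
    ("Small", "update"): (94.6, 29.4),
    ("Small", "delete"): (80.7, 16.3),
    ("Medium", "update"): (70.2, 24.6),
    ("Medium", "delete"): (64.8, 8.5),
}


def speedup(main_s: float, pr_s: float) -> float:
    """Return how many times faster the PR run is than the run on main."""
    return main_s / pr_s


def time_saved_pct(main_s: float, pr_s: float) -> float:
    """Return the share of the original duration that the PR removes, in percent."""
    return (main_s - pr_s) / main_s * 100


if __name__ == "__main__":
    for (size, stmt), (main_s, pr_s) in RESULTS.items():
        print(
            f"{size:<6} {stmt}: {speedup(main_s, pr_s):.2f}x faster, "
            f"{time_saved_pct(main_s, pr_s):.1f}% less time"
        )
```

Reporting both numbers can help avoid ambiguity: a ratio such as 321% is easy to misread as "321% less time", whereas the same run corresponds to roughly a two-thirds reduction in duration.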

Full Test Script

-- Small
-- 1. Prepare data
-- 1.1 TPC-H SF10
create or replace table lineitem_60000000 as select * from lineitem limit 60000000;
-- 1.2 Small table with 100,000 rows
create or replace table lineitem_100000 as select * from lineitem_60000000 limit 100000;

-- 2. Update test
-- 2.1 Simple predicate
-- main: 13s
-- pr:   13s
update lineitem_60000000 set l_quantity = l_quantity + 1 where l_linenumber >= 1 and l_linenumber < 7;

-- 2.2 Predicate with subquery
-- main: 94.6s
-- pr:   29.4s
update lineitem_60000000 set l_quantity = l_quantity + 1 where l_linenumber in (select l_linenumber from lineitem_100000);

-- 3. Delete test
-- 3.1 Predicate with subquery
-- main: 80.7s
-- pr:   16.3s
delete from lineitem_60000000 where l_linenumber in (select l_linenumber from lineitem_100000);


-- Medium
-- 1. Prepare data
-- 1.1 TPC-H SF10
create or replace table lineitem_60000000 as select * from lineitem limit 60000000;
-- 1.2 Small table with 100,000 rows
create or replace table lineitem_100000 as select * from lineitem_60000000 limit 100000;

-- 2. Update test
-- 2.1 Simple predicate
-- main: 8s
-- pr:   8s
update lineitem_60000000 set l_quantity = l_quantity + 1 where l_linenumber >= 1 and l_linenumber < 7;

-- 2.2 Predicate with subquery
-- main: 70.2s
-- pr:   24.6s
update lineitem_60000000 set l_quantity = l_quantity + 1 where l_linenumber in (select l_linenumber from lineitem_100000);

-- 3. Delete test
-- 3.1 Predicate with subquery
-- main: 64.8s
-- pr:   8.5s
delete from lineitem_60000000 where l_linenumber in (select l_linenumber from lineitem_100000);

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on Jul 16, 2024
@BohuTANG

Will this refactor resolve issue #15806?

@Dousir9 Dousir9 marked this pull request as ready for review July 31, 2024 05:24

@xudong963 xudong963 left a comment

The overall migration framework and idea are as expected, and most of it LGTM, but the execution details of the delete, update, and insert migration need further review by the storage team. Good job!

@SkyFan2002

LGTM. Thanks!

@zhyass zhyass left a comment

@BohuTANG BohuTANG merged commit ed54911 into databendlabs:main Aug 2, 2024
72 checks passed
Labels: ci-cloud (build docker image for cloud test), pr-feature (this PR introduces a new feature to the codebase)

Successfully merging this pull request may close these issues.

feat: update supports external udf
6 participants