Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: merge into split #15644

Merged
merged 5 commits into from
May 28, 2024
Merged

Conversation

dantengsky
Copy link
Member

@dantengsky dantengsky commented May 26, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

for merge-into stmt with both matched and not-matched branch, right join is used,
after got the joined result-set, a column of the result-set is chosen as the "split" column,
i.e. for rows of the split column, those are nulls are recognized as no-matched, and others as matched.

before this PR, an arbitrary column of target table is picked as the "split" column, which is unsafe, since
the "arbitrary column" may have NULL values originally, which may lead to incorrectly treat a matched row
as unmatched, and unexpected result (data duplication).

in this PR, we use row_id column to split the result-set into matched and not-matched parts

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label May 26, 2024
@dantengsky dantengsky marked this pull request as ready for review May 26, 2024 13:30
@dantengsky dantengsky force-pushed the fix-merge-into-split branch 3 times, most recently from c7dac6c to ac48fc6 Compare May 28, 2024 02:16
for merge-into stmt with both matched and not-matched branch, right join
is used,
after got the joined result-set, a column of the result-set is chosen as
the "split" column,
i.e. for rows of the split column, those are nulls are recognized as
no-matched, and others as matched.

before this PR, an arbitrary column of target table is picked as the
"split" column, which is unsafe, since
the "arbitrary column" may have NULL values originally, which may lead
to incorrectly treat a matched row
as unmatched, and unexpected result (data duplication).
@BohuTANG BohuTANG requested a review from xudong963 May 28, 2024 10:25
@xudong963
Copy link
Member

Error: SelfError("sqllogictest failed")
Completed MySQL test for file: tests/sqllogictests/suites/base/09_fuse_engine/09_0036_merge_into_without_distributed_enable.test ✅ (90.038121213s)
Test finished, fail fast enabled, 1 out of 6480 records failed to run
0: statement failed: mysql client error: Server error: `ERROR HY000 (1105): TableAlreadyLocked. Code: 2015, Text = table is locked by other session, please retry later.'
[SQL] alter table tbl_01_0007 recluster final where a != 4
at tests/sqllogictests/suites/base/01_system/01_0007_system_clustering_history.test:16

@dantengsky dantengsky added this pull request to the merge queue May 28, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request May 28, 2024
@BohuTANG BohuTANG merged commit b3305ae into databendlabs:main May 28, 2024
69 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix this PR patches a bug in codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: MERGE INTO duplicates rows unexpectedly
3 participants