Merge operation that only touches necessary partitions #1991

Closed
halvorlu opened this issue Dec 26, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@halvorlu

Description

From what I can see, the merge operation touches all partitions, instead of only touching those that are relevant for the data to be merged.
Example:

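import pyarrow as pa
from deltalake import DeltaTable, write_deltalake

# Placeholder location; the original report does not define `path`.
path = "/tmp/merge_partition_example"

# Three rows, one per value of x, so partitioning by x yields three partitions.
data = pa.table({"x": [1, 2, 3], "y": [4, 5, 6]})
write_deltalake(path, data, partition_by=["x"])
dt = DeltaTable(path)

# Source data that only concerns the x=1 partition.
new_data = pa.table({"x": [1], "y": [7]})
(
    dt.merge(
        source=new_data,
        predicate="target.x = source.x",
        source_alias="source",
        target_alias="target",
    )
    .when_matched_update_all()
    .execute()
)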

The table is partitioned by x, so when merging in new data where x=1, I would expect only one partition to be touched.
The output from the merge operation is
{'num_source_rows': 1, 'num_target_rows_inserted': 0, 'num_target_rows_updated': 1, 'num_target_rows_deleted': 0, 'num_target_rows_copied': 2, 'num_output_rows': 3, 'num_target_files_added': 3, 'num_target_files_removed': 3, ...}
which indicates that all three partitions are processed.
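
To make that reading of the metrics explicit, here is a minimal sketch that works only on the dictionary shown above (with the elided keys left out). The helper name `summarize_rewrite` and the `total_files` argument are hypothetical, and the sketch assumes the table holds one file per partition, which is what the three-row example should produce.

```python
# Metrics copied from the merge output above (elided keys omitted).
metrics = {
    "num_source_rows": 1,
    "num_target_rows_inserted": 0,
    "num_target_rows_updated": 1,
    "num_target_rows_deleted": 0,
    "num_target_rows_copied": 2,
    "num_output_rows": 3,
    "num_target_files_added": 3,
    "num_target_files_removed": 3,
}


def summarize_rewrite(metrics, total_files):
    """Hypothetical helper: report how much of the table a merge rewrote.

    `total_files` is the number of data files before the merge; here it is
    assumed to be 3 (one file per partition).
    """
    removed = metrics["num_target_files_removed"]
    return {
        "files_rewritten": removed,
        "rewrote_everything": removed == total_files,
        # Rows that were read and written back without being changed.
        "rows_copied_unchanged": metrics["num_target_rows_copied"],
    }


summary = summarize_rewrite(metrics, total_files=3)
print(summary)
```

Only one row was updated, yet two unchanged rows were copied and all three files were replaced, which is the behaviour this issue is about.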

So my questions are:

  • Is there any way to do a merge that only touches certain partitions?
  • If not: Are there any plans for implementing such a feature?
@ion-elgreco
Collaborator

ion-elgreco commented Dec 26, 2023

This is already there but not yet released: #1958

I am waiting on two other features before we push a new python release.
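
For readers on a release that includes this change, one way to express the intent is to add an explicit filter on the target's partition column to the merge predicate. The sketch below only builds the predicate string; the helper `partition_scoped_predicate` is hypothetical, and whether a given deltalake version actually prunes partitions from such a filter is an assumption not confirmed in this thread.

```python
def partition_scoped_predicate(join_predicate, column, values, target_alias="target"):
    """Hypothetical helper: append a partition filter to a merge predicate.

    `values` are the partition values present in the source data. Literals
    are rendered with repr(), which is adequate for integers and simple
    strings without quotes in them, but is not a general SQL escaper.
    """
    literals = ", ".join(repr(v) for v in sorted(set(values)))
    return f"{join_predicate} AND {target_alias}.{column} IN ({literals})"


# The source data in the example above only contains x = 1.
predicate = partition_scoped_predicate("target.x = source.x", "x", [1])
print(predicate)  # target.x = source.x AND target.x IN (1)
```

The resulting string could be passed as `predicate=` to `dt.merge(...)` in place of `"target.x = source.x"`; the metrics returned by `execute()` would then show whether fewer files were rewritten.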

@halvorlu
Author

Thanks for the update! When do you think the next python release will be?

@ion-elgreco
Collaborator

> Thanks for the update! When do you think the next python release will be?

In 1 or 2 weeks
