fix: add config for parquet pushdown on delta scan #2364

Merged 3 commits on Mar 31, 2024

@Blajda (Collaborator) commented Mar 31, 2024

Description

The Delta scan pushes filters down to the parquet scan when possible. This PR adds a new configuration option for the special case where an operation must read each matching file in its entirety but still wants file-level pruning.
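To illustrate the distinction, here is a minimal toy model in Python. It is not delta-rs code: the `File`, `scan`, and `rewrite` names and the `parquet_pushdown` flag are hypothetical, and the failure mode it shows (a rewrite losing rows because the scan filtered them out inside a file) is an assumption about why an operation would need whole files.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical stand-in for one data file holding integer keys."""
    name: str
    rows: list

    @property
    def min_key(self):
        return min(self.rows)

    @property
    def max_key(self):
        return max(self.rows)


def scan(files, lo, hi, parquet_pushdown):
    """Return {file name: rows read} for the predicate lo <= key <= hi."""
    result = {}
    for f in files:
        # File-level pruning: skip files whose min/max stats cannot match.
        if f.max_key < lo or f.min_key > hi:
            continue
        if parquet_pushdown:
            # Row-level filtering inside the file: other rows are never read.
            result[f.name] = [k for k in f.rows if lo <= k <= hi]
        else:
            # The whole file is read, so a rewrite can copy untouched rows.
            result[f.name] = list(f.rows)
    return result


def rewrite(files, lo, hi, parquet_pushdown):
    """Toy rewrite: each scanned file is replaced by the rows the scan returned."""
    scanned = scan(files, lo, hi, parquet_pushdown)
    kept = [f for f in files if f.name not in scanned]
    rewritten = [File(name + "-new", rows) for name, rows in scanned.items()]
    return kept + rewritten


def total_rows(files):
    return sum(len(f.rows) for f in files)


files = [File("a", [1, 2, 3, 4]), File("b", [10, 11, 12])]
print(total_rows(rewrite(files, 2, 3, parquet_pushdown=False)))  # 7
print(total_rows(rewrite(files, 2, 3, parquet_pushdown=True)))   # 5
```

In both runs file `b` is pruned and left alone. With pushdown disabled the rewritten copy of file `a` keeps all four rows; with pushdown enabled the toy rewrite silently drops keys 1 and 4.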

Related Issue(s)

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Mar 31, 2024
@Blajda (Collaborator, Author) commented Mar 31, 2024

Output with parquet pushdown disabled:

{'num_source_rows': 11809733, 'num_target_rows_inserted': 11809733, 'num_target_rows_updated': 0, 'num_target_rows_deleted': 0, 'num_target_rows_copied': 0, 'num_output_rows': 11809733, 'num_target_files_added': 10, 'num_target_files_removed': 0, 'execution_time_ms': 82372, 'scan_time_ms': 0, 'rewrite_time_ms': 82328}
init df shape: 11809733
delta shape after merge: (11809733, 1)
----
{'num_source_rows': 42208, 'num_target_rows_inserted': 28296, 'num_target_rows_updated': 13912, 'num_target_rows_deleted': 0, 'num_target_rows_copied': 11795821, 'num_output_rows': 11838029, 'num_target_files_added': 16, 'num_target_files_removed': 10, 'execution_time_ms': 41231, 'scan_time_ms': 0, 'rewrite_time_ms': 41185}
init df shape: 42208
delta shape after merge: (11838029, 1)
----
{'num_source_rows': 421813, 'num_target_rows_inserted': 421813, 'num_target_rows_updated': 0, 'num_target_rows_deleted': 0, 'num_target_rows_copied': 0, 'num_output_rows': 421813, 'num_target_files_added': 1, 'num_target_files_removed': 0, 'execution_time_ms': 4241, 'scan_time_ms': 0, 'rewrite_time_ms': 4187}
init df shape: 421813
delta shape after merge: (12259842, 1)
----
{'num_source_rows': 3881854, 'num_target_rows_inserted': 3032366, 'num_target_rows_updated': 849488, 'num_target_rows_deleted': 0, 'num_target_rows_copied': 11382058, 'num_output_rows': 15263912, 'num_target_files_added': 20, 'num_target_files_removed': 16, 'execution_time_ms': 66108, 'scan_time_ms': 0, 'rewrite_time_ms': 66061}
init df shape: 3881854
delta shape after merge: (15292208, 1)
----
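A quick arithmetic check on the numbers above, as a small Python sketch. It assumes that each "delta shape after merge" line reports the table's total row count; the tuples are copied from the output, and the variable names are only for illustration.

```python
# (inserted, updated, copied, num_output_rows, table rows after merge),
# copied from the four merge results above.
merges = [
    (11809733, 0, 0, 11809733, 11809733),
    (28296, 13912, 11795821, 11838029, 11838029),
    (421813, 0, 0, 421813, 12259842),
    (3032366, 849488, 11382058, 15263912, 15292208),
]

table_rows = 0
for inserted, updated, copied, output_rows, reported in merges:
    # Rows written by the merge are the inserted, updated, and copied rows.
    assert inserted + updated + copied == output_rows
    # No merge reports deletes, so the table should grow only by the inserts.
    table_rows += inserted
    assert table_rows == reported, (table_rows, reported)

print(table_rows)  # 15292208
```

Every assertion holds: after each merge the table size equals the previous size plus the inserted rows, so no rows go missing in this run.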

@Blajda Blajda marked this pull request as ready for review March 31, 2024 21:35
@ion-elgreco (Collaborator) commented

Perhaps it would be good to add the code from the issue as a test case, in either Python or Rust.

@Blajda (Collaborator, Author) commented Mar 31, 2024

Added a new test that fails if parquet pushdown is switched to true.

@Blajda Blajda merged commit 7568b57 into delta-io:main Mar 31, 2024
20 checks passed

Successfully merging this pull request may close these issues.

Merge update+insert truncates a delta table if the table is big enough