feat: added schema evolution to the merge statement #3135

Closed


@JustinRush80 JustinRush80 commented Jan 16, 2025

Description

Add schema evolution (merge mode only) to the MERGE statement. New columns are added based on the column predicates used in the MERGE operations (e.g. target.id = source.id). Using when_not_matched_insert_all and when_matched_update_all adds any new source column to the target schema.
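
To make the intended behaviour concrete, here is a toy model in plain Python. It does not use delta-rs; `merge_upsert_all` and its row/column representation are hypothetical names invented for illustration. It assumes that "update all" and "insert all" copy every source column, and that existing rows get null for columns they never had.

```python
# Toy model only: rows are dicts and the schema is a list of column names.
# None of these names come from delta-rs.

def merge_upsert_all(target_rows, target_columns, source_rows, key):
    """Upsert source rows into target rows on `key`, adding new source columns."""
    # Schema evolution: append source columns the target does not have yet.
    evolved = list(target_columns)
    for row in source_rows:
        for col in row:
            if col not in evolved:
                evolved.append(col)

    by_key = {row[key]: dict(row) for row in target_rows}
    for src in source_rows:
        if src[key] in by_key:
            by_key[src[key]].update(src)  # analogous to when_matched_update_all
        else:
            by_key[src[key]] = dict(src)  # analogous to when_not_matched_insert_all

    # Rows that never received a value for a new column get None (null).
    merged = [{c: r.get(c) for c in evolved} for r in by_key.values()]
    return merged, evolved


target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
source = [
    {"id": 2, "name": "B", "country": "NL"},
    {"id": 3, "name": "c", "country": "US"},
]
rows, columns = merge_upsert_all(target, ["id", "name"], source, key="id")
```

In this model the untouched row with `id` 1 ends up with `country` set to `None`, while the matched and inserted rows carry the new column's value.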

Related Issue(s)

Documentation

@github-actions github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Jan 16, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@JustinRush80 JustinRush80 changed the title from "feat: Added Schema Evolution to the Merge Statement" to "feat: added schema evolution to the merge statement" Jan 16, 2025
Rush and others added 26 commits January 16, 2025 00:23
Signed-off-by: Rush <justin.rush00@delta.com>
Fixes a check so readerFeatures is enabled on version 3 or higher

Signed-off-by: Russell Jancewicz <russell.jancewicz@gmail.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Updates the requirements on [which](https://github.com/harryfei/which-rs) to permit the latest version.
- [Release notes](https://github.com/harryfei/which-rs/releases)
- [Changelog](https://github.com/harryfei/which-rs/blob/master/CHANGELOG.md)
- [Commits](harryfei/which-rs@6.0.0...7.0.0)

---
updated-dependencies:
- dependency-name: which
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
…y partitions

(cherry picked from commit af17bb2)
Signed-off-by: Alex Wilcoxson <alex.wilcoxson@relativity.com>

chore: fmt
Signed-off-by: Rush <justin.rush00@delta.com>
Updates the requirements on [thiserror](https://github.com/dtolnay/thiserror) to permit the latest version.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](dtolnay/thiserror@1.0.0...1.0.69)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
(cherry picked from commit 12abf00)
Signed-off-by: Alex Wilcoxson <alex.wilcoxson@relativity.com>
Signed-off-by: Rush <justin.rush00@delta.com>
small correction to z_order columns argument.

Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Thomas Frederik Hoeck <tfh@norden.com>
Signed-off-by: Rush <justin.rush00@delta.com>
`object_store` invokes `get_credential` on _every_ invocation of a
get/list/put/etc. The provider invocation for environment based
credentials is practically zero-cost, so this has no/low overhead.

In the case of the AssumeRoleProvider or any provider which has _some_
cost, such as an invocation of the AWS STS APIs, this can result in
rate-limiting or service quota exhaustion.

To prevent this, the credentials are cached for as long as they have
not expired; the expiry is defined in the
`aws_credential_types::Credential` struct
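
The caching idea can be sketched in a few lines of Python. This is not the delta-rs implementation: `CachingProvider`, `Credentials`, and `fake_sts_fetch` are hypothetical names, and the injected clock exists only so the example is deterministic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    access_key: str
    expires_at: Optional[float]  # epoch seconds; None means "never expires"


class CachingProvider:
    """Wraps a costly fetch and reuses its result until the credentials expire."""

    def __init__(self, fetch: Callable[[], Credentials],
                 now: Callable[[], float] = time.time):
        self._fetch = fetch
        self._now = now
        self._cached: Optional[Credentials] = None

    def get_credential(self) -> Credentials:
        cached = self._cached
        if cached is not None and (
            cached.expires_at is None or cached.expires_at > self._now()
        ):
            return cached  # still valid, no remote call
        self._cached = self._fetch()  # missing or expired, fetch again
        return self._cached


# Demo with a fake clock and a fetch function that records each call.
calls = []
clock = [0.0]

def fake_sts_fetch() -> Credentials:
    calls.append(clock[0])
    return Credentials(access_key=f"key-{len(calls)}", expires_at=clock[0] + 60.0)

provider = CachingProvider(fake_sts_fetch, now=lambda: clock[0])
first = provider.get_credential()
second = provider.get_credential()  # served from the cache
clock[0] = 61.0
third = provider.get_credential()   # expired, so fetched again
```

With this shape, any number of get/list/put calls within the validity window costs a single fetch.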

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Sponsored-by: Scribd Inc
Signed-off-by: Rush <justin.rush00@delta.com>
This is a fix aimed at enabling jsonwriter to checkpoint in accordance
with delta.checkpointInterval.  It changes the default commitbuilder to
set a post_commit_hook so that checkpointing is done by default.
Potentially we could also expose CommitProperties as an argument to
flush_and_commit, but that would require a change to the function
signature and would be a breaking change.
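
A rough model of the behaviour described above, in plain Python. `ToyLog` is a hypothetical class, not part of delta-rs, and the rule "checkpoint whenever the new version is a positive multiple of the interval" is an assumption made for the sketch.

```python
class ToyLog:
    """Toy commit log whose post-commit hook writes periodic checkpoints."""

    def __init__(self, checkpoint_interval=10, post_commit_hook=True):
        self.version = -1  # no commits yet
        self.checkpoint_interval = checkpoint_interval
        self.post_commit_hook = post_commit_hook
        self.checkpoints = []

    def commit(self):
        self.version += 1
        # Assumed rule: checkpoint on every positive multiple of the interval.
        if (
            self.post_commit_hook
            and self.version > 0
            and self.version % self.checkpoint_interval == 0
        ):
            self.checkpoints.append(self.version)
        return self.version


with_hook = ToyLog(checkpoint_interval=5)
without_hook = ToyLog(checkpoint_interval=5, post_commit_hook=False)
for _ in range(12):  # produces versions 0 through 11
    with_hook.commit()
    without_hook.commit()
```

The second log stands in for the earlier behaviour, where no hook was set and no checkpoint was ever written regardless of the interval.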

Signed-off-by: Justin Jossick <jusjosj@cs.washington.edu>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: stretchadito <gy.ginayang@gmail.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Vikas Sharma <master2vikas@gmail.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
The release of pyo3 0.22.3 compels this since we cannot otherwise
compile. The choice is between pinning 0.22.2 and upgrading our ABI, and
I think it's better to upgrade the ABI

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
see delta-io/delta-kernel-rs#301

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Today the make_array function from DataFusion uses "item" as the list
element's field name. With recent changes in delta-kernel-rs we have
switched to calling it "element", which is more conventional and in line
with how Apache Parquet handles things

This change introduces a test which helps isolate the behavior seen in
Python tests within the core crate for easier regression testing

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
Abdullahsab3 and others added 21 commits January 16, 2025 00:25
Signed-off-by: Abdullah Sabaa Allil <36844223+Abdullahsab3@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
The Snapshot.files() function is public but cannot possibly be used
because the trait it relies upon isn't public. Oops!

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Sponsored-by: Scribd Inc
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
…te/create

Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Francisco Garcia Florez <francisco@truevoid.dev>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Max Piskunov <max.piskunov@plus.ai>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: Max Piskunov <max.piskunov@plus.ai>
Signed-off-by: Rush <justin.rush00@delta.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Rush <justin.rush00@delta.com>
@JustinRush80 JustinRush80 force-pushed the feat/merge_schema_upsert branch from cd2e185 to d72d538 Compare January 16, 2025 05:29
@github-actions github-actions bot added the documentation (Improvements or additions to documentation) label Jan 16, 2025
Signed-off-by: Rush <justin.rush00@delta.com>
@github-actions github-actions bot removed the documentation (Improvements or additions to documentation) label Jan 16, 2025

codecov bot commented Jan 16, 2025

Codecov Report

Attention: Patch coverage is 92.62899% with 30 lines in your changes missing coverage. Please review.

Project coverage is 72.27%. Comparing base (af3102e) to head (49480f9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/core/src/operations/merge/mod.rs | 94.48% | 2 Missing and 20 partials ⚠️ |
| python/src/merge.rs | 0.00% | 6 Missing ⚠️ |
| python/src/lib.rs | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3135      +/-   ##
==========================================
+ Coverage   72.07%   72.27%   +0.20%     
==========================================
  Files         134      134              
  Lines       43362    43759     +397     
  Branches    43362    43759     +397     
==========================================
+ Hits        31252    31628     +376     
- Misses      10087    10099      +12     
- Partials     2023     2032       +9     
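
The figures in this report can be cross-checked with a little arithmetic. The sketch below assumes that coverage is hits divided by all lines (hits, misses, and partials) and that the 30 lines flagged in the patch count both missing and partial lines; the helper name `coverage` is made up for the example.

```python
def coverage(hits, misses, partials):
    """Return (total lines, coverage percentage), counting partials as not covered."""
    lines = hits + misses + partials
    return lines, 100.0 * hits / lines


base_lines, base_pct = coverage(hits=31252, misses=10087, partials=2023)
head_lines, head_pct = coverage(hits=31628, misses=10099, partials=2032)
delta_lines = head_lines - base_lines

# Patch coverage: 30 uncovered lines at 92.62899% implies the patch size.
uncovered_in_patch = (2 + 20) + 6 + 2  # per-file rows from the table
patch_lines = round(uncovered_in_patch / (1 - 92.62899 / 100))
patch_pct = 100.0 * (patch_lines - uncovered_in_patch) / patch_lines
```

Under those assumptions the totals reproduce the line counts in the diff, and the percentages agree with the reported values to within display rounding.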


@JustinRush80 JustinRush80 deleted the feat/merge_schema_upsert branch January 16, 2025 06:28
@JustinRush80 JustinRush80 restored the feat/merge_schema_upsert branch January 16, 2025 12:54
@JustinRush80 JustinRush80 deleted the feat/merge_schema_upsert branch January 16, 2025 13:14