feat: introduce CDC write-side support for the Update operations
This change introduces a `CDCTracker`, which collects changes during
merges and updates. This is admittedly rather inefficient, but my hope is
that it provides a starting point for iterating on and improving the
writer code.

There is still additional work to be done to handle table
features properly for other code paths (see the middleware discussion we
have had in Slack), but this produces CDC files for Update operations.

Fixes delta-io#604
Fixes delta-io#2095
rtyler authored and ion-elgreco committed Jun 4, 2024
1 parent b64335d commit 3fad049
Showing 9 changed files with 819 additions and 20 deletions.
3 changes: 2 additions & 1 deletion crates/core/Cargo.toml
@@ -115,7 +115,8 @@ tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
 utime = "0.3"

 [features]
-default = []
+cdf = []
+default = ["cdf"]
 datafusion = [
     "dep:datafusion",
     "datafusion-expr",
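
The hunk above adds a `cdf` Cargo feature and enables it by default. As a rough illustration of how such a flag can gate behavior, here is a minimal sketch. The functions `cdf_enabled` and `files_to_write` and the file naming are hypothetical and are not taken from this commit; the diff does not show where the flag is actually checked.

```rust
/// Reports whether this build was compiled with the `cdf` feature.
/// `cfg!` is evaluated at compile time, so the result is fixed per build.
pub fn cdf_enabled() -> bool {
    cfg!(feature = "cdf")
}

/// Hypothetical helper: lists the files an update would write.
/// The change-data path below is an illustrative assumption only.
pub fn files_to_write(data_file: &str) -> Vec<String> {
    let mut files = vec![data_file.to_string()];
    if cdf_enabled() {
        // Only builds with the feature enabled also emit a change-data file.
        files.push(format!("_change_data/{data_file}"));
    }
    files
}
```

Because the feature is in `default`, downstream crates get it unless they opt out of default features in their own manifest.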
6 changes: 6 additions & 0 deletions crates/core/src/delta_datafusion/mod.rs
@@ -509,6 +509,12 @@ impl<'a> DeltaScanBuilder<'a> {
         self
     }

+    /// Use the provided [SchemaRef] for the [DeltaScan]
+    pub fn with_schema(mut self, schema: SchemaRef) -> Self {
+        self.schema = Some(schema);
+        self
+    }
+
     pub async fn build(self) -> DeltaResult<DeltaScan> {
         let config = self.config;
         let schema = match self.schema {
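
The new `with_schema` method follows the consuming-builder pattern: it takes `self` by value, stores an optional override, and returns `self` so calls can be chained. The sketch below reproduces that pattern with stand-in types. `Schema`, `ScanBuilder`, and the fallback to the table schema in `build` are assumptions for illustration; the hunk is truncated before the body of the `match`, so the real fallback logic is not shown here.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; these are not the deltalake or Arrow types.
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}
pub type SchemaRef = Arc<Schema>;

pub struct ScanBuilder {
    table_schema: SchemaRef,
    schema: Option<SchemaRef>,
}

impl ScanBuilder {
    pub fn new(table_schema: SchemaRef) -> Self {
        Self { table_schema, schema: None }
    }

    /// Mirrors the shape of `with_schema` in the diff: store the override, return self.
    pub fn with_schema(mut self, schema: SchemaRef) -> Self {
        self.schema = Some(schema);
        self
    }

    /// Assumed behavior: use the override if present, else the table schema.
    pub fn build(self) -> SchemaRef {
        match self.schema {
            Some(schema) => schema,
            None => self.table_schema,
        }
    }
}
```

Under these assumptions, `ScanBuilder::new(table).with_schema(custom).build()` yields `custom`, while omitting `with_schema` yields the table's own schema.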