Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic type widening in MERGE #2764

Merged
merged 7 commits into from
Mar 22, 2024

Conversation

johanl-db
Copy link
Collaborator

@johanl-db johanl-db commented Mar 19, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in MERGE INTO:

  • During resolution of the DeltaMergeInto plan, when merging the target and source schema to compute the schema after evolution, we keep the wider source type when type widening is enabled on the table.
  • When updating the table schema at the beginning of MERGE execution, metadata is added to the schema to record type changes.

How was this patch tested?

  • A new test suite DeltaTypeWideningSchemaEvolutionSuite is added to cover type evolution in MERGE

This PR introduces the following user-facing changes

The table feature is available in testing only, there are no user-facing changes as of now.

When automatic schema evolution is enabled in MERGE and the source schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not changed. the ingestion behavior follows the storeAssignmentPolicy configuration:

  • LEGACY: source values that overflow the target type are stored as null
  • ANSI: a runtime check is injected to fail on source values that overflow the target type.
  • STRICT: the MERGE operation fails during analysis.

With type widening enabled: the type in the target schema is updated to the wider source type. The MERGE operation always succeeds:

-- target: key int, value short
-- source: key int, value int
MERGE INTO target
USING source
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET *

After the MERGE operation, the target schema is key int, value int.

Copy link
Contributor

@sabir-akhadov sabir-akhadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments.

Copy link
Contributor

@sabir-akhadov sabir-akhadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Collaborator

@tomvanbussel tomvanbussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is going to be really powerful when combined with the already existing schema auto migration! I have two questions and a small nit, but otherwise it LGTM.

@@ -278,14 +278,22 @@ object ResolveDeltaMergeInto {
})

val migrationSchema = filterSchema(source.schema, Seq.empty)
val allowTypeWidening = target.exists {
case DeltaTable(fileIndex) =>
TypeWidening.isEnabled(fileIndex.protocol, fileIndex.metadata)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if type widening is disabled after this statement, but before the main transaction starts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two calls to SchemaMergingUtils.mergeSchemas (here and in ImplicitMetadataOperation) will return a different schema: first one with the wider type, second one with the original type.

This works currently because the second call to mergeSchemas doesn't allow implicit casts and will fail but it's quite brittle. I added a check at the start of MERGE to properly fail when type widening is enabled/disabled concurrently

@johanl-db johanl-db mentioned this pull request Mar 22, 2024
5 tasks
Copy link
Contributor

@tdas tdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome.

@tdas tdas merged commit 90b98e3 into delta-io:master Mar 22, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants