Add WHEN NOT MATCHED BY SOURCE support #1511

Closed
wants to merge 1 commit into from

Conversation

johanl-db
Collaborator

@johanl-db johanl-db commented Dec 6, 2022

Description

Add WHEN NOT MATCHED BY SOURCE to MergeIntoCommand

This PR adds support for WHEN NOT MATCHED BY SOURCE clauses in the MERGE INTO command using the Scala/Java Delta table API. SQL support for WHEN NOT MATCHED BY SOURCE will be available with the Spark 3.4 release, and Python support will follow in a separate PR.

Changes:

  • Extend Delta Merge API with support for NOT MATCHED BY SOURCE clause.
  • Extend Delta analyzer to support the new type of clause:
    • Resolve target column references in NOT MATCHED BY SOURCE conditions and update actions
    • Handle schema evolution (same as MATCHED clause): generate update expressions to align with the expected target schema
  • Implement support for NOT MATCHED BY SOURCE in MergeIntoCommand.
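For readers new to the clause, here is a minimal sketch of which rows each clause type targets. It is a plain-Python model, not Delta's implementation: dicts stand in for table rows, an equality join on one key stands in for the merge condition, and `classify_rows` is a hypothetical helper name.

```python
# Illustrative model only: plain dicts stand in for table rows, and an
# equality join on a single key stands in for the merge condition.

def classify_rows(target, source, target_key, source_key):
    """Split rows into the three groups that a merge distinguishes."""
    source_keys = {row[source_key] for row in source}
    target_keys = {row[target_key] for row in target}

    # Target rows that have a source counterpart: WHEN MATCHED.
    matched = [row for row in target if row[target_key] in source_keys]
    # Target rows with no source counterpart: WHEN NOT MATCHED BY SOURCE.
    not_matched_by_source = [
        row for row in target if row[target_key] not in source_keys
    ]
    # Source rows with no target counterpart: WHEN NOT MATCHED.
    not_matched = [row for row in source if row[source_key] not in target_keys]
    return matched, not_matched_by_source, not_matched


target = [{"targetKey": 1, "targetValue": 10}, {"targetKey": 2, "targetValue": 0}]
source = [{"sourceKey": 2}, {"sourceKey": 3}]
matched, not_matched_by_source, not_matched = classify_rows(
    target, source, "targetKey", "sourceKey"
)
```

Because NOT MATCHED BY SOURCE rows exist only in the target, their conditions and actions can reference target columns only, which is why the analyzer change above resolves target column references for this clause.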

How was this patch tested?

A new test trait, MergeIntoNotMatchedBySourceSuite, collects all tests covering this feature. It is mixed into the Merge Scala test class to run tests against the Delta API, and will be mixed into the Merge base test class to also cover the Spark SQL API once Spark 3.4 is released.

Test coverage:

  • Analysis errors: invalid clauses or conditions, invalid column references.
  • Correctness with various combinations of clauses.
  • Schema evolution.

Does this PR introduce any user-facing changes?

This change extends the existing Delta Merge API to allow specifying WHEN NOT MATCHED BY SOURCE clauses, each with an optional condition and an action. The new API follows the existing APIs for MATCHED and NOT MATCHED clauses.
Usage - Scala:

targetDeltaTable.merge(sourceTable, "targetKey = sourceKey")
    .whenNotMatchedBySource("targetValue > 0").updateExpr(Map("targetValue" -> "targetValue + 1"))
    .whenNotMatchedBySource().delete()
    .execute();
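To make the effect of chaining two clauses concrete, the following is a small plain-Python model of the usage above. It assumes the clauses are evaluated in the order written and that the first clause whose condition holds is applied to each row. `apply_not_matched_by_source` and the sample rows are hypothetical, and this sketch does not call the Delta API.

```python
def apply_not_matched_by_source(rows, clauses):
    """Apply the first clause whose condition holds to each row.

    `clauses` is an ordered list of (condition, action) pairs. A condition of
    None always holds. An action returns the updated row, or None to delete it.
    Rows that satisfy no clause are kept unchanged.
    """
    result = []
    for row in rows:
        for condition, action in clauses:
            if condition is None or condition(row):
                row = action(row)
                break
        if row is not None:
            result.append(row)
    return result


# Mirrors the two clauses in the usage: a conditional update, then an
# unconditional delete for every remaining unmatched target row.
clauses = [
    (
        lambda r: r["targetValue"] > 0,
        lambda r: {**r, "targetValue": r["targetValue"] + 1},
    ),
    (None, lambda r: None),
]

# Target rows assumed to have no match in the source.
rows = [{"targetKey": 1, "targetValue": 10}, {"targetKey": 4, "targetValue": 0}]
remaining = apply_not_matched_by_source(rows, clauses)
```

In this model the row with `targetValue` 10 is incremented to 11 and the row with `targetValue` 0 falls through to the delete clause.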

@@ -1068,6 +1068,12 @@
],
"sqlState" : "42000"
},
"DELTA_NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION" : {
Collaborator

nit: Not sure if we can change the existing error messages, but this is the same as the next one except for the clause name, which could be converted to an error message parameter.

Collaborator Author

These error codes are also defined in Spark: https://github.com/apache/spark/blob/master/core/src/main/resources/error/error-classes.json#L855
I think we would need to update them there first, then in Delta.
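As a rough illustration of the reviewer's suggestion, the check behind these error classes could take the clause name as a parameter instead of having one message per clause type. The sketch below is a plain-Python model: the function name and the message wording are hypothetical and are not taken from the Delta or Spark sources.

```python
def check_omitted_conditions(clause_type, conditions):
    """Raise if any clause other than the last one omits its condition.

    `conditions` holds one entry per clause of the given type, in the order
    the clauses were specified. None means the condition was omitted.
    """
    for condition in conditions[:-1]:
        if condition is None:
            # Illustrative wording only; the clause name is a parameter.
            raise ValueError(
                f"Only the last {clause_type} clause in a MERGE statement "
                "can omit the condition."
            )


# Valid: only the final clause omits its condition.
check_omitted_conditions("NOT MATCHED BY SOURCE", ["targetValue > 0", None])
```

The same function would cover MATCHED and NOT MATCHED clauses by passing a different `clause_type`, which is the deduplication the comment above points at.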

scottsand-db pushed a commit that referenced this pull request Jan 5, 2023
… BY SOURCE clause in merge commands.

Support for the clause was introduced in #1511 using the Scala Delta Table API; this patch extends the Python API to support the new clause.

See corresponding feature request: #1364

Adds Python tests covering WHEN NOT MATCHED BY SOURCE to test_deltatable.py.

The extended API for NOT MATCHED BY SOURCE mirrors existing clauses (MATCHED/NOT MATCHED).
Usage:
```
# Parentheses let the method chain span several lines in Python.
(
    dt.merge(source, "key = k")
    .whenNotMatchedBySourceDelete(condition="value > 0")
    .whenNotMatchedBySourceUpdate(set={"value": "value + 0"})
    .execute()
)
```

Closes #1533

GitOrigin-RevId: 76c7aea481fdbbf47af36ef7251ed555749954ac