Introduce DomainMetadata action to delta spec #1742
Conversation
I like the idea -- an elegant generalization of the transaction identifier concept. Though I do wonder how the two overlap (see comment below).
Force-pushed from c30d3da to 3bfaa23 (compare)
PROTOCOL.md (outdated)

> There are two types of metadata domains:
> 1. **User-controlled metadata domains** have names that start with anything other than the `delta.` prefix. Any Delta client implementation or user application can modify these metadata domains, and can allow users to modify them arbitrarily. Delta clients and user applications are encouraged to use a naming convention designed to avoid conflicts with other clients' or users' metadata domains (e.g. `com.databricks.*` or `org.apache.*`).
> 2. **System-controlled metadata domains** have names that start with the `delta.` prefix. Only Delta client implementations are allowed to modify the metadata for system-controlled domains. A Delta client implementation should only update metadata for system-controlled domains that it knows about and understands. System-controlled metadata domains are used by various table features and each table feature may impose additional semantics on the metadata domains it uses. `delta.` prefix is reserved for metadata domains mentioned in the Delta spec (e.g. as part of some table feature).
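The naming rule above amounts to a simple prefix check. As an illustrative sketch (hypothetical helper names, not part of any Delta API):

```python
# Sketch of the domain naming rule described above.
# The `delta.` prefix is reserved for system-controlled domains; everything
# else is user-controlled. Helper names here are made up for illustration.
SYSTEM_PREFIX = "delta."


def is_system_controlled(domain: str) -> bool:
    """System-controlled domains are exactly those under the reserved prefix."""
    return domain.startswith(SYSTEM_PREFIX)


def may_user_modify(domain: str) -> bool:
    """Clients must not allow users to modify system-controlled domains."""
    return not is_system_controlled(domain)


print(is_system_controlled("delta.deletionVectors"))  # True
print(may_user_modify("com.databricks.myApp"))        # True
```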
Suggested change:

> 2. **System-controlled metadata domains** have names that start with the `delta.` prefix. This prefix is reserved for metadata domains mentioned in the Delta spec (i.e. as part of some table feature), and Delta client implementations must not allow users to modify the metadata for system-controlled domains. A Delta client implementation should only update metadata for system-controlled domains that it knows about and understands. System-controlled metadata domains are used by various table features and each table feature may impose additional semantics on the metadata domains it uses.
Fixed
PROTOCOL.md (outdated)

> There are two types of metadata domains:
> 1. **User-controlled metadata domains** have names that start with anything other than the `delta.` prefix. Any Delta client implementation or user application can modify these metadata domains, and can allow users to modify them arbitrarily. Delta clients and user applications are encouraged to use a naming convention designed to avoid conflicts with other clients' or users' metadata domains (e.g. `com.databricks.*` or `org.apache.*`).
> 2. **System-controlled metadata domains** have names that start with the `delta.` prefix. Only Delta client implementations are allowed to modify the metadata for system-controlled domains. A Delta client implementation should only update metadata for system-controlled domains that it knows about and understands. System-controlled metadata domains are used by various table features and each table feature may impose additional semantics on the metadata domains it uses. `delta.` prefix is reserved for metadata domains mentioned in the Delta spec (e.g. as part of some table feature).
> A Delta client implementation should only update metadata for system-controlled domains that it knows about and understands

How should clients handle a system metadata domain they don't understand? Are system metadata domains always associated with some table feature that would block writing and/or reading the table? Or should the client basically just treat it like a user metadata domain at that point?
I think a system-controlled domain is always associated with a table feature; there must be domain handlers provided for it. The sentence

> A Delta client implementation should only update metadata for system-controlled domains that it knows about and understands

says the handler should not manipulate other domains or do anything malicious with them, but it can assume that handlers exist for those unrecognized domains. With that said, should they be treated the same as user-controlled domains? wdyt?
I could imagine system-controlled domains for which the default domain handling suffices (writers don't erase or modify its content), but which are documented in the Delta spec and thus can be manipulated by a client who actually understands them? This would cover any kind of auxiliary/optional information that isn't important enough to force the compat requirements that come with a table feature, but where it would still be nice if other readers didn't actively destroy the domain's info? Maybe something like query optimizer stats that don't need to be perfectly up to date, so it's better to let an older/unaware client propagate the existing (now stale) entry, rather than block that client or delete the stats?
PROTOCOL.md (outdated)

> - A feature name `domainMetadata` must exist in the table's `writerFeatures`.
>
> #### Reader Requirements for Domain Metadata
> The reader should only read those domains they understand and ignore those it doesn't recognize.
I think readers need to preserve all domains even if they don't understand them. We wouldn't expect the reader to understand specific user-controlled domains in the first place (user/app controls those), and system-controlled domains that need special reader attention are required to be part of some reader-writer table feature that can specify the desired behavior.
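This preserve-everything behavior can be sketched as a simplified log replay in which the latest `domainMetadata` action per domain wins and unrecognized domains are kept rather than dropped (field names `domain`, `configuration`, `removed` follow the proposed action; the helper itself is hypothetical):

```python
# Simplified, illustrative log replay for domain metadata. A real client
# replays full Delta log actions; this sketch only models the domain state.
def replay_domain_metadata(actions):
    """Latest action per domain wins; unknown domains are preserved, not dropped."""
    domains = {}
    for action in actions:
        if action.get("removed"):
            domains.pop(action["domain"], None)  # tombstone removes the domain
        else:
            domains[action["domain"]] = action["configuration"]
    return domains


log = [
    {"domain": "delta.someFeature", "configuration": '{"k":"v"}', "removed": False},
    {"domain": "com.example.app", "configuration": '{"owner":"me"}', "removed": False},
    {"domain": "delta.someFeature", "configuration": '{"k":"v2"}', "removed": False},
]
state = replay_domain_metadata(log)  # both domains survive; latest config wins
```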
Thanks, that makes sense. I reworded here based on the suggestion.
PROTOCOL.md (outdated)

> The reader should only read those domains they understand and ignore those it doesn't recognize.
>
> #### Write Requirements for Domain Metadata
> The writer should propagate those domains it doesn't recognize.
I think writes would be similar to reads (see comment above):
- Any system-controlled domain that needs special attention from a writer should be part of a table feature that can specify the desired behavior
- Writers must not allow users to modify or delete system-controlled domains.
- Writers must only modify or delete system-controlled domains they understand.
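The three writer rules above can be sketched as a single hypothetical check (illustrative only, not Delta code):

```python
# Sketch of the proposed writer rules for domain metadata. All names here
# are made up for illustration; the rules themselves are from the discussion:
#  - user-controlled domains may be modified freely,
#  - users may never modify or delete system-controlled (`delta.`) domains,
#  - writers only modify system-controlled domains they understand.
SYSTEM_PREFIX = "delta."


def writer_may_modify(domain: str, understood: set, on_behalf_of_user: bool) -> bool:
    if not domain.startswith(SYSTEM_PREFIX):
        return True                 # user-controlled: anyone may modify
    if on_behalf_of_user:
        return False                # users must never touch system domains
    return domain in understood     # writers only touch domains they understand


print(writer_may_modify("delta.stats", {"delta.stats"}, on_behalf_of_user=False))  # True
print(writer_may_modify("delta.stats", set(), on_behalf_of_user=False))            # False
```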
Thanks, that makes sense. I reworded here based on the suggestion.
PROTOCOL.md (outdated)

> There are two types of metadata domains:
> 1. **User-controlled metadata domains** have names that start with anything other than the `delta.` prefix. Any Delta client implementation or user application can modify these metadata domains, and can allow users to modify them arbitrarily. Delta clients and user applications are encouraged to use a naming convention designed to avoid conflicts with other clients' or users' metadata domains (e.g. `com.databricks.*` or `org.apache.*`).
> 2. **System-controlled metadata domains** have names that start with the `delta.` prefix. This prefix is reserved for metadata domains mentioned in the Delta spec (i.e. as part of some table feature), and Delta client implementations must not allow users to modify the metadata for system-controlled domains. A Delta client implementation should only update metadata for system-controlled domains that it knows about and understands. System-controlled metadata domains are used by various table features and each table feature may impose additional semantics on the metadata domains it uses.
@ryan-johnson-databricks, somehow I lost your previous comment in my git comment history. Let me try to understand what you mean here.

> I could imagine system-controlled domains for which the default domain handling suffices (writers don't erase or modify its content), but which are documented in the Delta spec and thus can be manipulated by a client who actually understands them?

The current spec doesn't mention how users who understand a domain should manipulate the domain metadata, and we should say something here. Is that the right understanding?
Honestly, I don't have an answer here, since it depends on the table feature that uses this domain. I am thinking that any future table feature that utilizes domain metadata should specify this in its own Delta spec section. But any suggestions here are appreciated.

> This would cover any kind of auxiliary/optional information that isn't important enough to force the compat requirements that come with a table feature, but where it would still be nice if other readers didn't actively destroy the domain's info? Maybe something like query optimizer stats that don't need to be perfectly up to date, so it's better to let an older/unaware client propagate the existing (now stale) entry, rather than block that client or delete the stats?

Sorry, I don't follow here. Is this related to how to handle two concurrent transactions that commit the same domain metadata?
You are correct that a table feature should specify how to interpret, modify, and resolve conflicts for a given system-controlled metadata domain. I was referring to the possibility of two sub-flavors:
- A writer table feature might include some metadata domains that aware readers could leverage in some way to their advantage, but which unaware readers can safely ignore without causing correctness issues.
- A "ghostly" table feature which both readers and writers can safely ignore if they don't know about it. Such features should still be documented in the Delta spec, but not as table features that might cause compat concerns -- clients can choose whether to implement support or not.

The hypothetical optimizer stats example I gave above would be an example of a "ghost" feature -- if a reader ignores such stats, it might produce an inferior query plan, but otherwise no harm. If a writer ignores such stats (as in, merely propagates them blindly per the metadata domain spec, but does not try to update them in response to changing data), then the stats would slowly go stale and become less useful over time. But that's actually not a problem -- most database engines I know of don't try to recompute such stats at every update anyway, because it's too expensive and the stats normally evolve quite slowly.

> Is this something related to how to handle two concurrent transactions that commit the same domain metadata?

An unaware writer should never try to update an unrecognized metadata domain, so it should never cause a transaction conflict -- "blindly propagating" the metadata domain means the commit should not mention it (neither as added nor as removed), and thus satisfies the general conflict resolution rules for metadata domains which this PR already includes.
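In other words, a conflict on domain metadata can only arise between two transactions that both mention the same domain; a writer that blindly propagates (i.e. mentions nothing) never conflicts. A minimal sketch of that rule, using a hypothetical helper:

```python
# Illustrative conflict check: two concurrent transactions conflict on domain
# metadata only on domains that both of them mention (added or removed).
def domain_conflicts(our_domains, winner_domains):
    """Return the set of domains both transactions touched."""
    return set(our_domains) & set(winner_domains)


# A blindly-propagating (unaware) writer mentions no domains, so it can
# never conflict with a concurrently committed transaction:
print(domain_conflicts([], ["delta.stats"]))            # set()
print(domain_conflicts(["delta.a"], ["delta.a", "x"]))  # {'delta.a'}
```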
Got it, thanks for the clear explanation! The query optimizer example helped a lot!
Do you think I should add sub-bullets under **System-controlled metadata domains** to clarify those two sub-flavors?
I think it's enough for the spec to not require that every system-controlled metadata domain must be part of a table feature (it just needs to be documented in the Delta spec). That should suffice, when combined with the proposed wording on reader/writer requirements: If a system-controlled metadata domain needs special handling, that's when it becomes a breaking change and needs to be part of a table feature (just like any other breaking change)
Got it, I removed the "(i.e. as part of some table feature)" from this sentence, so it no longer requires that a domain must be part of a table feature. Thanks for the feedback!
LGTM!
Looks very useful!
Description
We propose to introduce a new Action type called DomainMetadata to the Delta spec. In a nutshell, DomainMetadata allows specifying configurations (string key/value pairs) per metadata domain, and a custom conflict handler can be registered to a metadata domain. More details can be found in the design doc here.
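For illustration, a `domainMetadata` action in a commit could look like the following JSON, built here in Python (the domain name and payload are made up; the field names `domain`, `configuration`, and `removed` follow the proposed spec):

```python
import json

# Hypothetical example of the proposed action as it might appear in a
# Delta log entry. The domain and configuration values are invented.
action = {
    "domainMetadata": {
        "domain": "delta.exampleFeature",               # made-up system-controlled domain
        "configuration": json.dumps({"key": "value"}),  # string key/value payload
        "removed": False,                               # tombstone flag for domain removal
    }
}
serialized = json.dumps(action)
print(serialized)
```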
The GitHub issue #1741 was created.
How was this patch tested?
Spec only change and no test is needed.
Does this PR introduce any user-facing changes?
No user-facing changes.