Merge branch 'master' into parallelize-tests
tdas authored Apr 17, 2024
2 parents 4e6f680 + bba0e94 commit 9e60919
Showing 286 changed files with 14,062 additions and 2,018 deletions.
61 changes: 57 additions & 4 deletions PROTOCOL.md
@@ -55,6 +55,9 @@
- [Row Commit Versions](#row-commit-versions)
- [Reader Requirements for Row Tracking](#reader-requirements-for-row-tracking)
- [Writer Requirements for Row Tracking](#writer-requirements-for-row-tracking)
- [VACUUM Protocol Check](#vacuum-protocol-check)
- [Writer Requirements for Vacuum Protocol Check](#writer-requirements-for-vacuum-protocol-check)
- [Reader Requirements for Vacuum Protocol Check](#reader-requirements-for-vacuum-protocol-check)
- [Clustered Table](#clustered-table)
- [Writer Requirements for Clustered Table](#writer-requirements-for-clustered-table)
- [Requirements for Writers](#requirements-for-writers)
@@ -101,6 +104,7 @@
- [Last Checkpoint File Schema](#last-checkpoint-file-schema)
- [JSON checksum](#json-checksum)
- [How to URL encode keys and string values](#how-to-url-encode-keys-and-string-values)
- [Delta Data Type to Parquet Type Mappings](#delta-data-type-to-parquet-type-mappings)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

@@ -206,7 +210,7 @@ The checkpoint file name is based on the version of the table that the checkpoint
Delta supports three kinds of checkpoints:

1. UUID-named Checkpoints: These follow [V2 spec](#v2-spec) which uses the following file name: `n.checkpoint.u.{json/parquet}`, where `u` is a UUID and `n` is the
snapshot version that this checkpoint represents. Here `n` must be zero padded to have length 20. The UUID-named V2 Checkpoint may be in json or parquet format, and references zero or more checkpoint sidecars
in the `_delta_log/_sidecars` directory. A checkpoint sidecar is a uniquely-named parquet file: `{unique}.parquet` where `unique` is some unique
string such as a UUID.

@@ -219,7 +223,7 @@
```
_sidecars/016ae953-37a9-438e-8683-9a9a4a79a395.parquet
_sidecars/7d17ac10-5cc3-401b-bd1a-9c82dd2ea032.parquet
```

2. A [classic checkpoint](#classic-checkpoint) for version `n` of the table consists of a file named `n.checkpoint.parquet`. Here `n` must be zero padded to have length 20.
These could follow either [V1 spec](#v1-spec) or [V2 spec](#v2-spec).
For example:

@@ -229,7 +233,7 @@ For example:


3. A [multi-part checkpoint](#multi-part-checkpoint) for version `n` consists of `p` "part" files (`p > 1`), where
part `o` of `p` is named `n.checkpoint.o.p.parquet`. Here `n` must be zero padded to have length 20, while `o` and `p` must be zero padded to have length 10. These are always [V1 checkpoints](#v1-spec).
For example:

```
00000000000000000010.checkpoint.0000000001.0000000003.parquet
00000000000000000010.checkpoint.0000000002.0000000003.parquet
00000000000000000010.checkpoint.0000000003.0000000003.parquet
```
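
To make the padding rules concrete, here is a small Python sketch (illustrative only, not part of the protocol) that builds file names for the three checkpoint layouts described above:

```python
import uuid

def classic_checkpoint_name(version: int) -> str:
    # `n` zero padded to length 20
    return f"{version:020d}.checkpoint.parquet"

def uuid_checkpoint_name(version: int, fmt: str = "parquet") -> str:
    # V2 UUID-named checkpoint: n.checkpoint.u.{json/parquet}
    return f"{version:020d}.checkpoint.{uuid.uuid4()}.{fmt}"

def multipart_checkpoint_names(version: int, parts: int) -> list[str]:
    # part `o` of `p`: n.checkpoint.o.p.parquet, with `o` and `p`
    # zero padded to length 10
    return [f"{version:020d}.checkpoint.{o:010d}.{parts:010d}.parquet"
            for o in range(1, parts + 1)]

print(classic_checkpoint_name(10))
# 00000000000000000010.checkpoint.parquet
print(multipart_checkpoint_names(10, 3)[0])
# 00000000000000000010.checkpoint.0000000001.0000000003.parquet
```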
@@ -1179,6 +1183,29 @@ When Row Tracking is enabled (when the table property `delta.enableRowTracking`
In particular, writers should set `delta.rowTracking.preserved` in the `tags` of the `commitInfo` action to `true` if no rows are updated or copied.
Writers should set that flag to `false` otherwise.
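
For illustration, a `commitInfo` action carrying this tag might look as follows (shown as a Python dict mirroring the JSON action; every field except the tag is hypothetical, and the tag value is serialized as a string here):

```python
# Hypothetical commitInfo action for a commit that preserved row tracking
# metadata; all fields other than the tag are illustrative.
commit_info = {
    "commitInfo": {
        "operation": "WRITE",
        "timestamp": 1713312000000,
        "tags": {
            # "true" because no rows were updated or copied in this commit
            "delta.rowTracking.preserved": "true",
        },
    }
}
```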

# VACUUM Protocol Check

The `vacuumProtocolCheck` ReaderWriter feature ensures consistent application of reader and writer protocol checks during `VACUUM` operations, addressing potential protocol discrepancies and mitigating the risk of data corruption due to skipped writer checks.

Enablement:
- The table must be on Writer Version 7 and Reader Version 3.
- The feature `vacuumProtocolCheck` must exist in the table `protocol`'s `writerFeatures` and `readerFeatures`.
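
For example, a `protocol` action with the feature enabled might look like this (shown as a Python dict mirroring the JSON action; illustrative only):

```python
protocol_action = {
    "protocol": {
        "minReaderVersion": 3,
        "minWriterVersion": 7,
        # vacuumProtocolCheck must appear in BOTH feature lists
        "readerFeatures": ["vacuumProtocolCheck"],
        "writerFeatures": ["vacuumProtocolCheck"],
    }
}
```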

## Writer Requirements for Vacuum Protocol Check

This feature affects only VACUUM operations; standard commits remain unaffected.

Before performing a VACUUM operation, writers must check the table's write protocol. This is most easily implemented as an unconditional write protocol check for all tables, which removes the need to examine individual table properties.

Writers that do not implement VACUUM do not need to change anything and can safely write to tables that enable the feature.
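
A minimal sketch of such an unconditional check in Python (the supported-feature set and error handling are hypothetical, not mandated by the protocol):

```python
# Writer features this hypothetical engine knows how to honor.
SUPPORTED_WRITER_FEATURES = {"appendOnly", "invariants", "vacuumProtocolCheck"}

def assert_write_supported(protocol: dict) -> None:
    """Unconditional writer protocol check, run before VACUUM (and any write)."""
    version = protocol["minWriterVersion"]
    if version > 7:
        raise RuntimeError(f"table requires unknown writer version {version}")
    if version == 7:
        unsupported = set(protocol.get("writerFeatures", [])) - SUPPORTED_WRITER_FEATURES
        if unsupported:
            raise RuntimeError(f"unsupported writer features: {sorted(unsupported)}")

# A table with vacuumProtocolCheck enabled passes the check, so VACUUM may proceed.
assert_write_supported({"minWriterVersion": 7,
                        "writerFeatures": ["vacuumProtocolCheck"]})
```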

## Reader Requirements for Vacuum Protocol Check

For tables with Vacuum Protocol Check enabled, readers don’t need to understand or change anything new; they just need to acknowledge the feature exists.

Making this feature a ReaderWriter feature (rather than solely a Writer feature) ensures that:
- Older VACUUM implementations, which performed only the Reader protocol check and lacked the Writer protocol check, will begin to fail if the table has `vacuumProtocolCheck` enabled. This gives future writer features greater flexibility and safety in managing files within the table directory, eliminating the risk that older VACUUM implementations (lacking the Writer protocol check) accidentally delete relevant files.

# Clustered Table

The Clustered Table feature facilitates the physical clustering of rows that share similar values on a predefined set of clustering columns.
@@ -1611,7 +1638,9 @@ Feature | Name | Readers or Writers?
[Domain Metadata](#domain-metadata) | `domainMetadata` | Writers only
[V2 Checkpoint](#v2-checkpoint-table-feature) | `v2Checkpoint` | Readers and writers
[Iceberg Compatibility V1](#iceberg-compatibility-v1) | `icebergCompatV1` | Writers only
[Iceberg Compatibility V2](#iceberg-compatibility-v2) | `icebergCompatV2` | Writers only
[Clustered Table](#clustered-table) | `clustering` | Writers only
[VACUUM Protocol Check](#vacuum-protocol-check) | `vacuumProtocolCheck` | Readers and writers

## Deletion Vector Format

@@ -1785,7 +1814,7 @@ An array stores a variable length collection of items of some type.
Field Name | Description
-|-
type| Always the string "array"
elementType| The type of element stored in this array, represented as a string containing the name of a primitive type, a struct definition, an array definition or a map definition
containsNull| Boolean denoting whether this array can contain one or more null values
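
For example, the serialized form of an array of non-null strings (built here as a Python dict and dumped to the JSON schema representation):

```python
import json

# Array type fields per the table above: an array of strings that
# cannot contain null elements.
array_of_strings = {
    "type": "array",
    "elementType": "string",
    "containsNull": False,
}
print(json.dumps(array_of_strings))
# {"type": "array", "elementType": "string", "containsNull": false}
```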

### Map Type
@@ -2086,3 +2115,27 @@ uppercase and lowercase as part of percent-encoding. Thus, we require a stricter
3. Always [percent-encode](https://datatracker.ietf.org/doc/html/rfc3986#section-2) reserved octets
4. Never percent-encode non-reserved octets
5. A percent-encoded octet consists of three characters: `%` followed by its 2-digit hexadecimal value in uppercase letters, e.g. `>` encodes to `%3E`
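
As a sketch, Python's `urllib.parse.quote` with `safe=""` satisfies these rules: it encodes strings as UTF-8, percent-encodes every reserved octet with uppercase hex digits, and never encodes unreserved octets:

```python
from urllib.parse import quote

def strict_percent_encode(s: str) -> str:
    # safe="" forces reserved characters to be percent-encoded; unreserved
    # octets (ALPHA / DIGIT / "-" / "." / "_" / "~") are never encoded,
    # and hex digits are emitted in uppercase.
    return quote(s, safe="")

print(strict_percent_encode("a>b"))    # a%3Eb
print(strict_percent_encode("a-b_c"))  # a-b_c (unreserved octets untouched)
```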

## Delta Data Type to Parquet Type Mappings
The table below captures how each Delta data type is physically stored in Parquet files. Parquet files are used for storing the table data or metadata ([checkpoints](#checkpoints)). Parquet has a limited number of [physical types](https://parquet.apache.org/docs/file-format/types/); Parquet [logical types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) extend them by specifying how the physical types should be interpreted.

For some Delta data types, there are multiple ways to store the values physically in a Parquet file. For example, `timestamp` can be stored as either `int96` or `int64`. The exact physical type depends on the engine writing the Parquet file and/or engine-specific configuration options. For a Delta Lake table reader, it is recommended that its Parquet reader support at least the Parquet physical and logical types listed in the table below.

Delta Type Name | Parquet Physical Type | Parquet Logical Type
-|-|-
boolean| `boolean` |
byte| `int32` | `INT(bitwidth = 8, signed = true)`
short| `int32` | `INT(bitwidth = 16, signed = true)`
int| `int32` | `INT(bitwidth = 32, signed = true)`
long| `int64` | `INT(bitwidth = 64, signed = true)`
date| `int32` | `DATE`
timestamp| `int96` or `int64` | `TIMESTAMP(isAdjustedToUTC = true, units = microseconds)`
timestamp without time zone| `int96` or `int64` | `TIMESTAMP(isAdjustedToUTC = false, units = microseconds)`
float| `float` |
double| `double` |
decimal| `int32`, `int64` or `fixed_len_byte_array` | `DECIMAL(precision, scale)`
string| `binary` | `STRING (UTF-8)`
binary| `binary` |
array| as either a `2-level` or `3-level` representation; refer to the [Parquet documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists) for details | `LIST`
map| as either a `2-level` or `3-level` representation; refer to the [Parquet documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps) for details | `MAP`
struct| `group` |
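
As one illustration (a sketch using pyarrow; other Parquet writers expose similar type constructors), the following schema exercises several of these mappings:

```python
import pyarrow as pa

# Each field's Parquet physical/logical type follows the mapping table above.
schema = pa.schema([
    ("flag", pa.bool_()),                  # boolean -> boolean physical type
    ("id", pa.int64()),                    # long -> int64 / INT(64, signed)
    ("day", pa.date32()),                  # date -> int32 / DATE
    ("ts", pa.timestamp("us", tz="UTC")),  # timestamp -> int64 / TIMESTAMP(UTC, micros)
    ("price", pa.decimal128(10, 2)),       # decimal(10,2) -> DECIMAL(precision=10, scale=2)
    ("name", pa.string()),                 # string -> binary / STRING (UTF-8)
    ("tags", pa.list_(pa.string())),       # array<string> -> 3-level LIST
])
print(schema)
```

Engines that need `int96` timestamps for older readers typically expose an option for it, such as pyarrow's `use_deprecated_int96_timestamps` flag on `pyarrow.parquet.write_table`.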
