Merge branch 'master' into parallelize-tests
tdas authored Apr 17, 2024
2 parents 4e6f680 + bba0e94 commit 9e60919
Showing 286 changed files with 14,062 additions and 2,018 deletions.
61 changes: 57 additions & 4 deletions PROTOCOL.md
@@ -55,6 +55,9 @@
- [Row Commit Versions](#row-commit-versions)
- [Reader Requirements for Row Tracking](#reader-requirements-for-row-tracking)
- [Writer Requirements for Row Tracking](#writer-requirements-for-row-tracking)
- [VACUUM Protocol Check](#vacuum-protocol-check)
- [Writer Requirements for Vacuum Protocol Check](#writer-requirements-for-vacuum-protocol-check)
- [Reader Requirements for Vacuum Protocol Check](#reader-requirements-for-vacuum-protocol-check)
- [Clustered Table](#clustered-table)
- [Writer Requirements for Clustered Table](#writer-requirements-for-clustered-table)
- [Requirements for Writers](#requirements-for-writers)
@@ -101,6 +104,7 @@
- [Last Checkpoint File Schema](#last-checkpoint-file-schema)
- [JSON checksum](#json-checksum)
- [How to URL encode keys and string values](#how-to-url-encode-keys-and-string-values)
- [Delta Data Type to Parquet Type Mappings](#delta-data-type-to-parquet-type-mappings)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

@@ -206,7 +210,7 @@ The checkpoint file name is based on the version of the table that the checkpoint
Delta supports three kinds of checkpoints:

1. UUID-named Checkpoints: These follow [V2 spec](#v2-spec) which uses the following file name: `n.checkpoint.u.{json/parquet}`, where `u` is a UUID and `n` is the
snapshot version that this checkpoint represents. Here `n` must be zero padded to have length 20. The UUID-named V2 Checkpoint may be in json or parquet format, and references zero or more checkpoint sidecars
in the `_delta_log/_sidecars` directory. A checkpoint sidecar is a uniquely-named parquet file: `{unique}.parquet` where `unique` is some unique
string such as a UUID.

@@ -219,7 +223,7 @@
```
_sidecars/016ae953-37a9-438e-8683-9a9a4a79a395.parquet
_sidecars/7d17ac10-5cc3-401b-bd1a-9c82dd2ea032.parquet
```

2. A [classic checkpoint](#classic-checkpoint) for version `n` of the table consists of a file named `n.checkpoint.parquet`. Here `n` must be zero padded to have length 20.
These could follow either [V1 spec](#v1-spec) or [V2 spec](#v2-spec).
For example:

@@ -229,7 +233,7 @@ For example:


3. A [multi-part checkpoint](#multi-part-checkpoint) for version `n` consists of `p` "part" files (`p > 1`), where
part `o` of `p` is named `n.checkpoint.o.p.parquet`. Here `n` must be zero padded to have length 20, while `o` and `p` must be zero padded to have length 10. These are always [V1 checkpoints](#v1-spec).
For example:

```
00000000000000000010.checkpoint.0000000001.0000000003.parquet
00000000000000000010.checkpoint.0000000002.0000000003.parquet
00000000000000000010.checkpoint.0000000003.0000000003.parquet
```
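
To make the padding rules concrete, here is a small Python sketch (illustrative only, not part of the protocol) that builds file names for the three checkpoint layouts described above:

```python
import uuid

def classic_checkpoint_name(version: int) -> str:
    # `n` zero padded to length 20
    return f"{version:020d}.checkpoint.parquet"

def uuid_checkpoint_name(version: int, fmt: str = "parquet") -> str:
    # V2 UUID-named checkpoint: n.checkpoint.u.{json/parquet}
    return f"{version:020d}.checkpoint.{uuid.uuid4()}.{fmt}"

def multipart_checkpoint_names(version: int, parts: int) -> list[str]:
    # part `o` of `p`: n.checkpoint.o.p.parquet, with `o` and `p`
    # zero padded to length 10
    return [f"{version:020d}.checkpoint.{o:010d}.{parts:010d}.parquet"
            for o in range(1, parts + 1)]

print(classic_checkpoint_name(10))
# 00000000000000000010.checkpoint.parquet
print(multipart_checkpoint_names(10, 3)[0])
# 00000000000000000010.checkpoint.0000000001.0000000003.parquet
```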
@@ -1179,6 +1183,29 @@ When Row Tracking is enabled (when the table property `delta.enableRowTracking`
In particular, writers should set `delta.rowTracking.preserved` in the `tags` of the `commitInfo` action to `true` if no rows are updated or copied.
Writers should set that flag to `false` otherwise.
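
For illustration, a `commitInfo` action carrying this tag might look as follows (shown as a Python dict mirroring the JSON action; every field except the tag is hypothetical, and the tag value is serialized as a string here):

```python
# Hypothetical commitInfo action for a commit that preserved row tracking
# metadata; all fields other than the tag are illustrative.
commit_info = {
    "commitInfo": {
        "operation": "WRITE",
        "timestamp": 1713312000000,
        "tags": {
            # "true" because no rows were updated or copied in this commit
            "delta.rowTracking.preserved": "true",
        },
    }
}
```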

# VACUUM Protocol Check

The `vacuumProtocolCheck` ReaderWriter feature ensures consistent application of reader and writer protocol checks during `VACUUM` operations, addressing potential protocol discrepancies and mitigating the risk of data corruption due to skipped writer checks.

Enablement:
- The table must be on Writer Version 7 and Reader Version 3.
- The feature `vacuumProtocolCheck` must exist in the table `protocol`'s `writerFeatures` and `readerFeatures`.
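
For example, a `protocol` action with the feature enabled might look like this (shown as a Python dict mirroring the JSON action; illustrative only):

```python
protocol_action = {
    "protocol": {
        "minReaderVersion": 3,
        "minWriterVersion": 7,
        # vacuumProtocolCheck must appear in BOTH feature lists
        "readerFeatures": ["vacuumProtocolCheck"],
        "writerFeatures": ["vacuumProtocolCheck"],
    }
}
```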

## Writer Requirements for Vacuum Protocol Check

This feature affects only VACUUM operations; standard commits remain unaffected.

Before performing a VACUUM operation, writers must check the table's write protocol. This is most easily implemented as an unconditional write protocol check for all tables, which removes the need to examine individual table properties.

Writers that do not implement VACUUM do not need to change anything and can safely write to tables that enable the feature.
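
A minimal sketch of such an unconditional check in Python (the supported-feature set and error handling are hypothetical, not mandated by the protocol):

```python
# Writer features this hypothetical engine knows how to honor.
SUPPORTED_WRITER_FEATURES = {"appendOnly", "invariants", "vacuumProtocolCheck"}

def assert_write_supported(protocol: dict) -> None:
    """Unconditional writer protocol check, run before VACUUM (and any write)."""
    version = protocol["minWriterVersion"]
    if version > 7:
        raise RuntimeError(f"table requires unknown writer version {version}")
    if version == 7:
        unsupported = set(protocol.get("writerFeatures", [])) - SUPPORTED_WRITER_FEATURES
        if unsupported:
            raise RuntimeError(f"unsupported writer features: {sorted(unsupported)}")

# A table with vacuumProtocolCheck enabled passes the check, so VACUUM may proceed.
assert_write_supported({"minWriterVersion": 7,
                        "writerFeatures": ["vacuumProtocolCheck"]})
```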

## Reader Requirements for Vacuum Protocol Check

For tables with Vacuum Protocol Check enabled, readers don’t need to understand or change anything new; they just need to acknowledge the feature exists.

Making this feature a ReaderWriter feature (rather than solely a Writer feature) ensures that:
- Older VACUUM implementations, which performed only the Reader protocol check and lacked the Writer protocol check, will begin to fail if the table has `vacuumProtocolCheck` enabled. This gives future writer features greater flexibility and safety in managing files within the table directory, eliminating the risk that older VACUUM implementations (lacking the Writer protocol check) accidentally delete relevant files.

# Clustered Table

The Clustered Table feature facilitates the physical clustering of rows that share similar values on a predefined set of clustering columns.
@@ -1611,7 +1638,9 @@ Feature | Name | Readers or Writers?
[Domain Metadata](#domain-metadata) | `domainMetadata` | Writers only
[V2 Checkpoint](#v2-checkpoint-table-feature) | `v2Checkpoint` | Readers and writers
[Iceberg Compatibility V1](#iceberg-compatibility-v1) | `icebergCompatV1` | Writers only
[Iceberg Compatibility V2](#iceberg-compatibility-v2) | `icebergCompatV2` | Writers only
[Clustered Table](#clustered-table) | `clustering` | Writers only
[VACUUM Protocol Check](#vacuum-protocol-check) | `vacuumProtocolCheck` | Readers and writers

## Deletion Vector Format

@@ -1785,7 +1814,7 @@ An array stores a variable length collection of items of some type.
Field Name | Description
-|-
type| Always the string "array"
elementType| The type of element stored in this array, represented as a string containing the name of a primitive type, a struct definition, an array definition or a map definition
containsNull| Boolean denoting whether this array can contain one or more null values
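
For example, the serialized form of an array of non-null strings (built here as a Python dict and dumped to the JSON schema representation):

```python
import json

# Array type fields per the table above: an array of strings that
# cannot contain null elements.
array_of_strings = {
    "type": "array",
    "elementType": "string",
    "containsNull": False,
}
print(json.dumps(array_of_strings))
# {"type": "array", "elementType": "string", "containsNull": false}
```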

### Map Type
@@ -2086,3 +2115,27 @@ uppercase and lowercase as part of percent-encoding. Thus, we require a stricter
3. Always [percent-encode](https://datatracker.ietf.org/doc/html/rfc3986#section-2) reserved octets
4. Never percent-encode non-reserved octets
5. A percent-encoded octet consists of three characters: `%` followed by its 2-digit hexadecimal value in uppercase letters, e.g. `>` encodes to `%3E`
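
As a sketch, Python's `urllib.parse.quote` with `safe=""` satisfies these rules: it encodes strings as UTF-8, percent-encodes every reserved octet with uppercase hex digits, and never encodes unreserved octets:

```python
from urllib.parse import quote

def strict_percent_encode(s: str) -> str:
    # safe="" forces reserved characters to be percent-encoded; unreserved
    # octets (ALPHA / DIGIT / "-" / "." / "_" / "~") are never encoded,
    # and hex digits are emitted in uppercase.
    return quote(s, safe="")

print(strict_percent_encode("a>b"))    # a%3Eb
print(strict_percent_encode("a-b_c"))  # a-b_c (unreserved octets untouched)
```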

## Delta Data Type to Parquet Type Mappings
The table below captures how each Delta data type is physically stored in Parquet files. Parquet files are used for storing the table data or metadata ([checkpoints](#checkpoints)). Parquet has a limited number of [physical types](https://parquet.apache.org/docs/file-format/types/); Parquet [logical types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) extend them by specifying how the physical types should be interpreted.

For some Delta data types, there are multiple ways to store the values physically in a Parquet file. For example, `timestamp` can be stored as either `int96` or `int64`. The exact physical type depends on the engine writing the Parquet file and/or engine-specific configuration options. For a Delta Lake table reader, it is recommended that its Parquet reader support at least the Parquet physical and logical types listed in the table below.

Delta Type Name | Parquet Physical Type | Parquet Logical Type
-|-|-
boolean| `boolean` |
byte| `int32` | `INT(bitwidth = 8, signed = true)`
short| `int32` | `INT(bitwidth = 16, signed = true)`
int| `int32` | `INT(bitwidth = 32, signed = true)`
long| `int64` | `INT(bitwidth = 64, signed = true)`
date| `int32` | `DATE`
timestamp| `int96` or `int64` | `TIMESTAMP(isAdjustedToUTC = true, units = microseconds)`
timestamp without time zone| `int96` or `int64` | `TIMESTAMP(isAdjustedToUTC = false, units = microseconds)`
float| `float` |
double| `double` |
decimal| `int32`, `int64` or `fixed_len_byte_array` | `DECIMAL(precision, scale)`
string| `binary` | `STRING (UTF-8)`
binary| `binary` |
array| as either a `2-level` or `3-level` representation; refer to the [Parquet documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists) for details | `LIST`
map| as either a `2-level` or `3-level` representation; refer to the [Parquet documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps) for details | `MAP`
struct| `group` |
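
As one illustration (a sketch using pyarrow; other Parquet writers expose similar type constructors), the following schema exercises several of these mappings:

```python
import pyarrow as pa

# Each field's Parquet physical/logical type follows the mapping table above.
schema = pa.schema([
    ("flag", pa.bool_()),                  # boolean -> boolean physical type
    ("id", pa.int64()),                    # long -> int64 / INT(64, signed)
    ("day", pa.date32()),                  # date -> int32 / DATE
    ("ts", pa.timestamp("us", tz="UTC")),  # timestamp -> int64 / TIMESTAMP(UTC, micros)
    ("price", pa.decimal128(10, 2)),       # decimal(10,2) -> DECIMAL(precision=10, scale=2)
    ("name", pa.string()),                 # string -> binary / STRING (UTF-8)
    ("tags", pa.list_(pa.string())),       # array<string> -> 3-level LIST
])
print(schema)
```

Engines that need `int96` timestamps for older readers typically expose an option for it, such as pyarrow's `use_deprecated_int96_timestamps` flag on `pyarrow.parquet.write_table`.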
