[Protocol Change Request] Type Widening table feature #2624

Merged
merged 8 commits into from
Feb 27, 2024
15 changes: 9 additions & 6 deletions protocol_rfcs/type-widening.md
@@ -14,7 +14,7 @@ to a wider type.
The **allowed type changes** are:
- Integer widening: `Byte` -> `Short` -> `Int` -> `Long`
- Floating-point widening: `Float` -> `Double`
- Decimal widening: `Decimal(p, s)` -> `Decimal(p', s')` where `p' >= p` and `p' - s' >= p - s`. `p` and `p'` denote the decimal precision and `s` and `s'` denote the decimal scale.
- Decimal widening: `Decimal(p, s)` -> `Decimal(p + k1, s + k2)` where `k1 >= k2 >= 0`. `p` and `s` denote the decimal precision and scale respectively.
- Date widening: `Date` -> `Timestamp without timezone`

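The allowed type changes can be checked mechanically. Below is a minimal Python sketch, not taken from the protocol. The string type names (`"byte"`, `"short"`, `"int"`, `"long"`, `"float"`, `"double"`, `"date"`, `"timestamp_ntz"`) and the `DecimalType` tuple are hypothetical. The sketch also assumes that the integer chain is transitive, so `Byte` -> `Long` is allowed, and it uses the `k1 >= k2 >= 0` form of the decimal rule. For example, `Decimal(10, 2)` -> `Decimal(12, 3)` passes (`k1 = 2`, `k2 = 1`), while `Decimal(10, 2)` -> `Decimal(11, 4)` fails because it would shrink the digits left of the decimal point from 8 to 7.

```python
from typing import NamedTuple, Union


class DecimalType(NamedTuple):
    """Hypothetical stand-in for a Decimal(precision, scale) type."""
    precision: int
    scale: int


# Non-decimal types are modeled as plain strings (an assumption of this sketch).
DeltaType = Union[str, DecimalType]

# Integer widening order: Byte -> Short -> Int -> Long.
INTEGER_ORDER = ["byte", "short", "int", "long"]


def is_allowed_type_change(src: DeltaType, dst: DeltaType) -> bool:
    """Return True if changing a column from `src` to `dst` is an allowed widening."""
    if src == dst:
        return False  # not a type change at all
    if isinstance(src, DecimalType) and isinstance(dst, DecimalType):
        # Decimal(p, s) -> Decimal(p + k1, s + k2) with k1 >= k2 >= 0.
        k1 = dst.precision - src.precision
        k2 = dst.scale - src.scale
        return k1 >= k2 >= 0
    if isinstance(src, DecimalType) or isinstance(dst, DecimalType):
        return False  # no widening between decimal and non-decimal types
    if src in INTEGER_ORDER and dst in INTEGER_ORDER:
        # Any step further along the chain counts as a widening.
        return INTEGER_ORDER.index(src) < INTEGER_ORDER.index(dst)
    return (src, dst) in {("float", "double"), ("date", "timestamp_ntz")}
```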
To support this feature:
@@ -121,16 +121,19 @@ When Type Widening is enabled (when the table property `delta.enableTypeWidening
- Writers must record type change information in the `metadata` of the nearest ancestor [StructField](#struct-field). See [Type Change Metadata](#type-change-metadata).

When Type Widening is supported (when the `writerFeatures` field of a table's `protocol` action contains `enableTypeWidening`), then:
- Writers must preserve the `delta.typeChanges` field in the metadata fields in the schema when a schema is updated.
- Writers can remove an element from a `delta.typeChanges` field in the metadata fields in the schema when all active `add` actions in the latest version of the table have a `defaultRowCommitVersion` value greater than or equal to the `tableVersion` value of that `delta.typeChanges` element.
- Writers must set the `defaultRowCommitVersion` field in new `add` actions to the version number of the log entry containing the `add` action.
This is a subset of the requirements from [Writer Requirements for Row Tracking](#writer-requirements-for-row-tracking) that may be implemented separately without introducing a dependency on the [Row Tracking](#row-tracking) table feature.
- Writers must set the `defaultRowCommitVersion` field in recommitted and checkpointed `add` actions and `remove` actions to the `defaultRowCommitVersion` of the last committed `add` action with the same `path`.

The last two requirements related to `defaultRowCommitVersion` are a subset of the requirements from [Writer Requirements for Row Tracking](#writer-requirements-for-row-tracking) that may be implemented separately without introducing a dependency on the [Row Tracking](#row-tracking) table feature.

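The writer requirements above can be modeled in a few lines. The following Python sketch is illustrative only and rests on assumptions not stated in this excerpt: `FileAction` is a hypothetical stand-in for an `add` or `remove` action that keeps only `path` and `defaultRowCommitVersion`, the writer is assumed to already know the `defaultRowCommitVersion` of the last committed `add` for each path, and `delta.typeChanges` is assumed to parse as a JSON array of objects that each carry a `tableVersion` key (the full format is defined in [Type Change Metadata](#type-change-metadata)).

```python
import json
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class FileAction:
    """Hypothetical, simplified `add`/`remove` action with only the fields used here."""
    path: str
    defaultRowCommitVersion: Optional[int] = None


def stamp_new_adds(adds: Iterable[FileAction], commit_version: int) -> List[FileAction]:
    """New `add` actions get the version of the log entry that contains them."""
    return [replace(a, defaultRowCommitVersion=commit_version) for a in adds]


def carry_over(action: FileAction, last_committed: Dict[str, int]) -> FileAction:
    """Recommitted or checkpointed `add`/`remove` actions reuse the
    defaultRowCommitVersion of the last committed `add` with the same path.

    `last_committed` maps a path to that value (an assumption of this sketch).
    """
    return replace(action, defaultRowCommitVersion=last_committed[action.path])


def prune_type_changes(type_changes_json: str, active_adds: Iterable[FileAction]) -> str:
    """Return `delta.typeChanges` without the elements a writer may remove.

    An element may be removed once every active `add` action has a
    defaultRowCommitVersion >= that element's tableVersion.
    """
    versions = [a.defaultRowCommitVersion for a in active_adds]
    if any(v is None for v in versions):
        # A file without the field cannot be shown to postdate a change: keep all.
        return type_changes_json
    oldest = min(versions, default=None)  # None when there are no active files
    changes = json.loads(type_changes_json)
    # Keep an element only if some active file predates its type change.
    kept = [c for c in changes if oldest is not None and oldest < c["tableVersion"]]
    return json.dumps(kept)
```

Treating a missing `defaultRowCommitVersion` as a reason to keep every element is a conservative choice made for this sketch: it never drops type change information that a reader might still need for an older file.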
## Reader Requirements for Type Widening
When Type Widening is supported (when the `readerFeatures` field of a table's `protocol` action contains `enableTypeWidening`), then:
- Readers must allow reading data files written before the table underwent any allowed type change.

- Readers must allow reading data files written before the table underwent any allowed type change and convert the values to the current, wider type.

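Concretely, reading an older file means promoting each stored value to the column's current, wider type. The Python sketch below shows that promotion one value at a time. It is an illustration with its own assumptions: the type encoding (strings such as `"int"` or `"timestamp_ntz"`, and `("decimal", precision, scale)` tuples) and the function name `widen_value` are hypothetical, and the `Date` -> `Timestamp without timezone` case assumes that a date maps to midnight of the same day.

```python
import datetime
from decimal import Decimal, localcontext
from typing import Any, Tuple, Union

# Assumed type encoding: ("decimal", precision, scale) for decimals, strings otherwise.
TypeSpec = Union[str, Tuple[str, int, int]]


def widen_value(value: Any, from_type: TypeSpec, to_type: TypeSpec) -> Any:
    """Convert a value read from an older data file (`from_type`) to the current `to_type`."""
    if value is None or from_type == to_type:
        return value
    if isinstance(to_type, tuple) and to_type[0] == "decimal":
        _, precision, scale = to_type
        with localcontext() as ctx:
            ctx.prec = precision
            # Rescale to the new scale; widening only appends trailing zeros.
            # quantize raises InvalidOperation if the result needs more than `precision` digits.
            return Decimal(value).quantize(Decimal(1).scaleb(-scale))
    if from_type == "date" and to_type == "timestamp_ntz":
        # Assumed mapping: midnight of the same day, with no time zone attached.
        return datetime.datetime.combine(value, datetime.time())
    if from_type == "float" and to_type == "double":
        return float(value)  # Python floats are already double precision
    if to_type in ("short", "int", "long"):
        return int(value)  # Python ints are unbounded, so the value is unchanged
    raise ValueError(f"unsupported type change: {from_type} -> {to_type}")
```

Converting value by value keeps the sketch short; a real reader would more likely apply the same promotion to whole columns when it loads a data file.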
### Column Metadata
> ***Change to existing section***
> ***Change to existing section (underlined)***

Column metadata stores various information about the column.
For example, this MAY contain some keys like [`delta.columnMapping`](#column-mapping) or [`delta.generationExpression`](#generated-columns) or [`CURRENT_DEFAULT`](#default-columns).
@@ -140,4 +143,4 @@ delta.columnMapping.*| These keys are used to store information about the mappin
delta.identity.*| These keys are for defining identity columns. See [Identity Columns](#identity-columns) for details.
delta.invariants| JSON string contains SQL expression information. See [Column Invariants](#column-invariants) for details.
delta.generationExpression| SQL expression string. See [Generated Columns](#generated-columns) for details.
delta.typeChanges| JSON string containing information about previous type changes applied to this column. See [Type Change Metadata](#type-change-metadata) for details.
<ins>delta.typeChanges</ins>| <ins>JSON string containing information about previous type changes applied to this column. See [Type Change Metadata](#type-change-metadata) for details.</ins>