[Protocol Change Request] Type Widening table feature #2624

Merged
merged 8 commits into from
Feb 27, 2024

Conversation

johanl-db
Collaborator

@johanl-db johanl-db commented Feb 9, 2024

Protocol Change Request

Description of the protocol change

This protocol change is part of the proposed Type Widening table feature, see feature request: #2622

Protocol RFC issue: #2623

The type widening table feature covers changing the type of existing columns or nested fields in a Delta table without having to rewrite the data.

Willingness to contribute

The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?

  • Yes. I can contribute.
  • Yes. I would be willing to contribute with guidance from the Delta Lake community.
  • No. I cannot contribute at this time.

@johanl-db johanl-db changed the title Type Widening Protocol RFC [WIP] Type Widening Protocol RFC Feb 9, 2024
@johanl-db johanl-db changed the title [WIP] Type Widening Protocol RFC Type Widening Protocol RFC Feb 12, 2024
@johanl-db johanl-db self-assigned this Feb 12, 2024
johanl-db and others added 3 commits February 13, 2024 17:52
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
Co-authored-by: Bart Samwel <bart.samwel@databricks.com>
@tdas
Contributor

tdas commented Feb 13, 2024

Can you add `[Protocol Change Request]` in the title? I added the issue template for this, but for a silly reason it's not kicking in - https://github.com/delta-io/delta/blob/master/.github/ISSUE_TEMPLATE/protocol-rfc.md

Can you update the description based on this?

@johanl-db johanl-db changed the title Type Widening Protocol RFC [Protocol Change Request] Type Widening table feature Feb 14, 2024
Collaborator

@bart-samwel bart-samwel left a comment

This LGTM.

Contributor

@tdas tdas left a comment

LGTM

@tdas tdas merged commit 5d25578 into delta-io:master Feb 27, 2024
6 checks passed
vkorukanti pushed a commit that referenced this pull request Feb 29, 2024
## Description
This change introduces the `typeWidening` Delta table feature, which allows widening the type of existing columns and fields in a Delta table using the `ALTER TABLE CHANGE COLUMN TYPE` or `ALTER TABLE REPLACE COLUMNS` commands.

The table feature is introduced as `typeWidening-dev` during implementation and is available in testing only.

For now, only `byte` -> `short` -> `int` changes are supported. Other changes will require support in the Spark Parquet reader, which will be introduced in Spark 4.0.
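
As a rough illustration of the rule above, the following Python sketch checks whether a type change falls within the currently supported chain. The helper name `is_supported_widening` and the `WIDENING_CHAIN` list are hypothetical, not part of Delta, and the sketch assumes that skipping a step (for example `byte` to `int`) is allowed.

```python
# Hypothetical helper, not Delta's implementation: the ordered chain of
# integer types that the description lists as supported for now.
WIDENING_CHAIN = ["byte", "short", "int"]


def is_supported_widening(from_type: str, to_type: str) -> bool:
    """Return True if from_type appears strictly before to_type in the chain."""
    if from_type not in WIDENING_CHAIN or to_type not in WIDENING_CHAIN:
        return False
    # Assumption: a multi-step change such as byte -> int is also accepted.
    return WIDENING_CHAIN.index(from_type) < WIDENING_CHAIN.index(to_type)
```

Under these assumptions, a narrowing change such as `int` to `short`, a no-op change, or a change to a type outside the chain is rejected.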

Type widening feature request: #2622
Type Widening protocol RFC: #2624

A new test suite `DeltaTypeWideningSuite` is created, containing:
- `DeltaTypeWideningAlterTableTests`: Covers applying supported and unsupported type changes on partitioned columns, non-partitioned columns and nested fields
- `DeltaTypeWideningTableFeatureTests`: Covers adding the `typeWidening` table feature

## This PR introduces the following *user-facing* changes

The table feature is available in testing only; there are no user-facing changes as of now.

The type widening table feature will introduce the following changes:
- Add the `typeWidening` table feature via a table property:
```
ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = true)
```
- Apply a widening type change:
```
ALTER TABLE t CHANGE COLUMN int_col TYPE long
```
or
```
ALTER TABLE t REPLACE COLUMNS (int_col long)
```

Note: both ALTER TABLE commands reuse the existing syntax for setting a table property and applying a type change, no new SQL syntax is being introduced by this feature.

Closes #2645

GitOrigin-RevId: 2ca0e6b22ec24b304241460553547d0d4c6026a2
allisonport-db pushed a commit that referenced this pull request Mar 7, 2024
#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It introduces metadata to record information about type changes that were applied using `ALTER TABLE`. This metadata is stored in the table schema, as specified in https://github.com/delta-io/delta/pull/2624/files#diff-114dec1ec600a6305fe7117bed7acb46e94180cdb1b8da63b47b12d6c40760b9R28

For example, changing a top-level column `a` from `int` to `long` will update the schema to include metadata:
```
{
    "name" : "a",
    "type" : "long",
    "nullable" : true,
    "metadata" : {
      "delta.typeChanges": [
        {
          "tableVersion": 1,
          "fromType": "short",
          "toType": "integer"
        },
        {
          "tableVersion": 5,
          "fromType": "integer",
          "toType": "long"
        }
      ]
    }
  }
```
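
A minimal Python sketch of how such an entry could be appended to a field's metadata is shown below. The function `record_type_change` is hypothetical and only handles fields whose type is a primitive type name; it is not the code added by this change.

```python
import copy


def record_type_change(field: dict, to_type: str, table_version: int) -> dict:
    """Return a copy of `field` with its type changed and the change recorded.

    Hypothetical helper: assumes `field` is a schema field in the JSON layout
    shown above and that its "type" is a primitive type name (a string).
    """
    updated = copy.deepcopy(field)
    metadata = updated.setdefault("metadata", {})
    changes = metadata.setdefault("delta.typeChanges", [])
    changes.append({
        "tableVersion": table_version,
        "fromType": field["type"],
        "toType": to_type,
    })
    updated["type"] = to_type
    return updated
```

The input field is left untouched so that the previous schema can still be compared against the new one.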

- A new test suite `DeltaTypeWideningMetadataSuite` is created to cover methods handling type widening metadata.
- Tests covering adding metadata to the schema when running `ALTER TABLE CHANGE COLUMN` are added to `DeltaTypeWideningSuite`

Closes #2708

GitOrigin-RevId: cdbb7589f10a8355b66058e156bb7d1894268f4d
vkorukanti pushed a commit that referenced this pull request Mar 15, 2024
This PR includes changes from
#2708 which isn't merged yet.
The changes related only to dropping the table feature are in commit
e2601a6


## Description
This change is part of the type widening table feature.
Type widening feature request:
#2622
Type Widening protocol RFC: #2624

It adds the ability to remove the type widening table feature by running
the `ALTER TABLE DROP FEATURE` command.
Before dropping the table feature, traces of it are removed from the
current version of the table:
- Files that were written before the latest type change and thus contain
types that differ from the current table schema are rewritten using an
internal `REORG TABLE` operation.
- Metadata in the table schema recording previous type changes is
removed.
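
The metadata removal step can be pictured with the Python sketch below, which walks a schema in its JSON form and drops the `delta.typeChanges` key wherever it appears. The function name `strip_type_changes` is hypothetical, and the sketch assumes the key only ever appears inside a `metadata` object; it does not cover the file rewrite.

```python
def strip_type_changes(node):
    """Return a copy of a JSON-like schema without `delta.typeChanges` entries.

    Hypothetical helper: recurses through dicts and lists so that nested
    struct fields are cleaned as well. Other metadata keys are preserved.
    """
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key == "metadata" and isinstance(value, dict):
                # Drop only the type widening key from the field metadata.
                cleaned[key] = {k: v for k, v in value.items()
                                if k != "delta.typeChanges"}
            else:
                cleaned[key] = strip_type_changes(value)
        return cleaned
    if isinstance(node, list):
        return [strip_type_changes(item) for item in node]
    return node
```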

## How was this patch tested?
- A new set of tests is added to `DeltaTypeWideningSuite` to cover
dropping the table feature with tables in various states: with/without
files to rewrite or metadata to remove.

## Does this PR introduce _any_ user-facing changes?
The table feature is available in testing only; there are no user-facing
changes as of now.

When the feature is available, this change enables the following user
action:
- Drop the type widening table feature:
```
ALTER TABLE t DROP FEATURE typeWidening
```
This succeeds immediately if no version of the table contains traces of
the table feature (that is, no type changes were applied in the available
history of the table).
Otherwise, if the current version contains traces of the feature, these
are removed: files are rewritten if needed and type widening metadata is
removed from the table schema. Then, an error
`DELTA_FEATURE_DROP_WAIT_FOR_RETENTION_PERIOD` is thrown, telling the
user to retry once the retention period expires.

If only previous versions contain traces of the feature, no action is
applied on the table, and an error
`DELTA_FEATURE_DROP_HISTORICAL_VERSIONS_EXIST` is thrown, telling the
user to retry once the retention period expires.
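
The three outcomes described above can be summarized in a small Python sketch. The function and its boolean inputs are hypothetical and only model the decision; the two error names are the ones quoted in this description, and `"SUCCESS"` is a label chosen for the sketch.

```python
def drop_type_widening_outcome(current_has_traces: bool,
                               history_has_traces: bool) -> str:
    """Model the outcome of dropping the feature (hypothetical helper).

    current_has_traces: the current table version has files to rewrite or
        type widening metadata in its schema.
    history_has_traces: some earlier version still within the available
        history contains such traces.
    """
    if current_has_traces:
        # Traces are removed from the current version first, then the
        # command asks the user to retry after the retention period.
        return "DELTA_FEATURE_DROP_WAIT_FOR_RETENTION_PERIOD"
    if history_has_traces:
        # Nothing is changed on the table; only older versions are affected.
        return "DELTA_FEATURE_DROP_HISTORICAL_VERSIONS_EXIST"
    return "SUCCESS"
```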
tdas pushed a commit that referenced this pull request Mar 22, 2024
#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This change is part of the type widening table feature.
Type widening feature request:
#2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in MERGE
INTO:
- During resolution of the `DeltaMergeInto` plan, when merging the
target and source schema to compute the schema after evolution, we keep
the wider source type when type widening is enabled on the table.
- When updating the table schema at the beginning of MERGE execution,
metadata is added to the schema to record type changes.

## How was this patch tested?
- A new test suite `DeltaTypeWideningSchemaEvolutionSuite` is added to
cover type evolution in MERGE

## This PR introduces the following *user-facing* changes
The table feature is available in testing only, there are no user-facing
changes as of now.

When automatic schema evolution is enabled in MERGE and the source
schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not
changed. The ingestion behavior follows the `storeAssignmentPolicy`
configuration:
- LEGACY: source values that overflow the target type are stored as
`null`
- ANSI: a runtime check is injected to fail on source values that
overflow the target type.
- STRICT: the MERGE operation fails during analysis.

With type widening enabled: the type in the target schema is updated to
the wider source type. The MERGE operation always succeeds:
```
-- target: key int, value short
-- source: key int, value int
MERGE INTO target
USING source
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET *
```
After the MERGE operation, the target schema is `key int, value int`.
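
The schema merge behind this example can be sketched in Python as follows. `evolve_schema` and `TYPE_RANK` are hypothetical names; the sketch models schemas as flat name-to-type mappings, only knows the `byte`, `short` and `int` types, and assumes that columns present only in the source are added by schema evolution.

```python
# Hypothetical ranking of the integer types; a higher rank means a wider type.
TYPE_RANK = {"byte": 0, "short": 1, "int": 2}


def evolve_schema(target: dict, source: dict,
                  type_widening_enabled: bool) -> dict:
    """Compute the target schema after evolution (illustrative sketch only)."""
    evolved = dict(target)
    for name, source_type in source.items():
        target_type = evolved.get(name)
        if target_type is None:
            # Assumption: schema evolution adds columns missing in the target.
            evolved[name] = source_type
        elif (type_widening_enabled
              and source_type in TYPE_RANK
              and target_type in TYPE_RANK
              and TYPE_RANK[source_type] > TYPE_RANK[target_type]):
            # Keep the wider source type when type widening is enabled.
            evolved[name] = source_type
    return evolved
```

With the schemas from the example, `evolve_schema({"key": "int", "value": "short"}, {"key": "int", "value": "int"}, True)` keeps `int` for `value`, while passing `False` leaves the target schema unchanged.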
tdas pushed a commit that referenced this pull request Mar 25, 2024
#### Which Delta project/connector is this regarding?

- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This change is part of the type widening table feature.
Type widening feature request:
#2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in INSERT.
During resolution, when schema evolution and type widening are enabled,
type differences between the input query and the target table are
handled as follows:
- If the type difference qualifies for automatic type evolution: the
input type is left as is, the data will be inserted with the new type
and the table schema will be updated in `ImplicitMetadataOperation`
(already implemented as part of MERGE support)
- If the type difference doesn't qualify for automatic type evolution:
the current behavior is preserved: a cast is added from the input type
to the existing target type.

## How was this patch tested?
- Tests are added to `DeltaTypeWideningAutomaticSuite` to cover type
evolution in INSERT

## This PR introduces the following *user-facing* changes
The table feature is available in testing only; there are no user-facing
changes as of now.

When automatic schema evolution is enabled in INSERT and the source
schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not
changed. A cast is added to the input so that the inserted data matches
the expected target type.

With type widening enabled: the type in the target schema is updated to
the wider source type.
```
-- target: key int, value short
-- source: key int, value int
INSERT INTO target SELECT * FROM source
```
After the INSERT operation, the target schema is `key int, value int`.