Type Widening in ALTER TABLE CHANGE COLUMN #2645

Closed

Conversation

johanl-db
Collaborator

Which Delta project/connector is this regarding?

  • [x] Spark
  • [ ] Standalone
  • [ ] Flink
  • [ ] Kernel
  • [ ] Other (fill in here)

Description

This change introduces the typeWidening Delta table feature, making it possible to widen the type of existing columns and fields in a Delta table using the ALTER TABLE CHANGE COLUMN TYPE or ALTER TABLE REPLACE COLUMNS commands.

The table feature is introduced as typeWidening-dev during implementation and is available in testing only.

For now, only the byte -> short -> int widening chain is supported. Other changes will require support in the Spark Parquet reader that will be introduced in Spark 4.0.

Type widening feature request: #2622
Type Widening protocol RFC: #2624

How was this patch tested?

A new test suite DeltaTypeWideningSuite is created, containing:

  • DeltaTypeWideningAlterTableTests: Covers applying supported and unsupported type changes on partitioned columns, non-partitioned columns, and nested fields (a sketch of one such case follows after this list)
  • DeltaTypeWideningTableFeatureTests: Covers adding the typeWidening table feature
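
For illustration, here is a hypothetical test of roughly this shape. The table name, values, and the exact error type are illustrative, and sql, withTable, and intercept are assumed to come from the usual QueryTest-style Spark test harness; this is not the actual suite code.

test("unsupported type change int -> long is rejected") {
  withTable("t") {
    sql("CREATE TABLE t (a INT) USING delta " +
      "TBLPROPERTIES ('delta.enableTypeWidening' = 'true')")
    // Only byte -> short -> int is supported by this PR, so int -> long must fail;
    // the exact error class is deliberately not asserted in this sketch.
    intercept[Exception] {
      sql("ALTER TABLE t CHANGE COLUMN a TYPE LONG")
    }
  }
}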

Does this PR introduce any user-facing changes?

The table feature is available in testing only; there are no user-facing changes as of now.

The type widening table feature will introduce the following changes:

  • Enabling type widening via a table property:
ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = true)
  • Applying a widening type change:
ALTER TABLE t CHANGE COLUMN int_col TYPE long

Note: both ALTER TABLE commands reuse the existing syntax for setting a table property and applying a type change; no new SQL syntax is introduced by this feature.
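
As an end-to-end illustration on a Spark session with the Delta extensions configured (the table and column names are illustrative, and the feature is currently available in testing only):

// Create a Delta table with a narrow column type.
spark.sql("CREATE TABLE t (col BYTE) USING delta")
spark.sql("INSERT INTO t VALUES (1)")

// Enable the type widening table feature via the table property.
spark.sql("ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = true)")

// Apply a widening type change (byte -> int is within the chain supported by this PR).
// Existing Parquet files keep the narrow physical type and are read back as int.
spark.sql("ALTER TABLE t CHANGE COLUMN col TYPE INT")
spark.sql("INSERT INTO t VALUES (2147483647)")
spark.sql("SELECT col FROM t").show()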

Contributor

@sabir-akhadov sabir-akhadov left a comment

lgtm

Collaborator

@bart-samwel bart-samwel left a comment

Nice! I left a bunch of test feedback.

val isEnabled = DeltaConfigs.ENABLE_TYPE_WIDENING.fromMetaData(metadata)
if (isEnabled && !isSupported(protocol)) {
  throw new IllegalStateException(
    s"Table property '${DeltaConfigs.ENABLE_TYPE_WIDENING.key}' is " +
Collaborator

I guess this should be using the error framework?

Collaborator Author

This should never happen unless there's a bug in the implementation, so I wouldn't give it an error class. We typically wouldn't want to document that error as a user-facing error.
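
As a rough illustration of the distinction being made here (the message text and the DeltaErrors helper below are hypothetical, not code from this PR):

// Internal invariant: only reachable if the implementation itself is buggy,
// so a plain exception without an error class is acceptable.
if (isEnabled && !isSupported(protocol)) {
  throw new IllegalStateException(
    s"Table property '${DeltaConfigs.ENABLE_TYPE_WIDENING.key}' is enabled " +
      "but the protocol does not support the typeWidening table feature.")
}

// A user-facing condition would instead go through the error framework,
// e.g. a hypothetical DeltaErrors helper carrying a documented error class:
// throw DeltaErrors.typeWideningNotSupported(fromType, toType)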


test("row group skipping Short -> Int") {
withSQLConf(
SQLConf.FILES_MAX_PARTITION_BYTES.key -> 1024.toString) {
Collaborator

How do we verify that this actually leads to row group skipping? FWIW, it looks like this should lead to regular min/max data skipping on the Delta log. The max partition bytes setting only affects how we read chunks, i.e., it'll split a Parquet file of >1024 bytes over multiple tasks (1024 bytes each), but if the file is still a single row group, then only one of the tasks will actually read it (the task that gets the "middle" byte of the row group), and the other tasks will simply ignore the row group. And with the current way of writing, each file will have only one row group, so the row group skipping will be equal to the Delta-level file skipping, so it should never skip, really.

Collaborator Author

This test doesn't make much sense here anymore. I added it back when I found the overflow issue in Parquet row group skipping and kept it around, but it's redundant with the test I added when I fixed the issue in Spark:
https://github.com/databricks/runtime/blob/a9d6a9d6191264ced1f0658ff6675b0f30e8e77f/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala#L1216

I'm removing it; I don't think it's worth the effort to get it to accurately cover an issue that the Spark test already covers much better.
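
For reference, a sketch of roughly what a dedicated row-group-skipping setup would need, i.e. multiple row groups per Parquet file so that skipping happens below the Delta file-skipping level. The path, sizes, and the assumption that the Hadoop settings below are honored by the Parquet writer are all illustrative.

// Write with a very small Parquet row group size so a single file ends up with
// several row groups; the exact split still depends on Parquet's flush heuristics.
spark.sparkContext.hadoopConfiguration.set("parquet.block.size", "4096")
spark.sparkContext.hadoopConfiguration.set("parquet.page.size", "1024")
spark.range(0, 30000).coalesce(1)
  .selectExpr("CAST(id AS SHORT) AS a")
  .write.format("delta").save("/tmp/type_widening_rg")

// Enable the feature, widen the column, and run a selective filter; ideally only the
// row groups whose min/max statistics can contain the value are actually read.
spark.sql("ALTER TABLE delta.`/tmp/type_widening_rg` SET TBLPROPERTIES ('delta.enableTypeWidening' = true)")
spark.sql("ALTER TABLE delta.`/tmp/type_widening_rg` CHANGE COLUMN a TYPE INT")
spark.read.format("delta").load("/tmp/type_widening_rg").where("a = 5").collect()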

@@ -625,6 +626,18 @@ object ManagedCommitTableFeature
}
}

object TypeWideningTableFeature extends ReaderWriterFeature(name = "typeWidening-dev")
Contributor

Since we are using -dev right now and it's not ready for users, is it behind the isTesting flag?

Contributor

@tdas tdas left a comment

LGTM.
