Allow users to define the data type and nullability of a partition column when converting a Parquet table to Delta table #1718

junjunjd · 2023-10-13T01:52:27Z

Description

In the PySpark API for the convertToDelta command, users can specify the data type of a partition column:

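from delta.tables import DeltaTable
# Assumes an active SparkSession `spark` configured for Delta Lake.
# Convert partitioned parquet table at path 'path/to/table' and partitioned by integer column named 'part'
partitionedDeltaTable = DeltaTable.convertToDelta(spark, "parquet.`path/to/table`", "part int")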
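The third argument, "part int", is a partition schema string: a column name followed by its type. As a rough illustration of what a user-defined partition schema carries (name, data type, nullability), here is a minimal sketch that parses a simplified form of such a string. `PartitionField` and `parse_partition_schema` are hypothetical names, not part of Delta Lake or delta-rs, and the parser assumes comma-separated `<name> <type>` pairs with an optional `NOT NULL` and no nested or parameterized types.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for a partition column definition."""
    name: str
    data_type: str
    nullable: bool = True


def parse_partition_schema(ddl: str) -> list[PartitionField]:
    """Parse a simplified partition schema string such as "part int".

    Assumptions: comma-separated '<name> <type>' pairs, an optional trailing
    NOT NULL, and no nested or parameterized types (e.g. no decimal(10,2)),
    so splitting on commas is safe.
    """
    fields = []
    for column in ddl.split(","):
        tokens = column.split()
        if len(tokens) < 2:
            raise ValueError(f"expected '<name> <type>', got {column.strip()!r}")
        name, data_type = tokens[0], tokens[1].lower()
        modifiers = [t.upper() for t in tokens[2:]]
        if modifiers not in ([], ["NOT", "NULL"]):
            raise ValueError(f"unsupported modifiers in {column.strip()!r}")
        # A column is nullable unless it is explicitly declared NOT NULL.
        fields.append(PartitionField(name, data_type, nullable=not modifiers))
    return fields
```

For example, `parse_partition_schema("year INT NOT NULL, region string")` yields a non-nullable `int` field `year` and a nullable `string` field `region`.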

It would make sense to support user-defined data types and nullability for partition columns in convert_to_delta as well. Currently, convert_to_delta sets the data type of every partition column to string.
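Partition values of a Hive-style Parquet table live only in directory names such as `year=2023`, so they are plain strings until something casts them. The sketch below shows how a user-supplied type and nullability per column could drive that cast; columns with no declared type fall back to string, mirroring the current behavior. All names here (`CASTS`, `partition_values`) are hypothetical, the type names follow Spark's simple type names as an assumption, and URL-escaped directory values are not handled.

```python
from __future__ import annotations

from datetime import date
from typing import Callable, Dict, Optional, Tuple

# Hypothetical converters from a declared type name to a typed Python value.
CASTS: Dict[str, Callable[[str], object]] = {
    "string": str,
    "int": int,
    "long": int,
    "double": float,
    "boolean": lambda v: {"true": True, "false": False}[v.lower()],
    "date": date.fromisoformat,
}

# Directory value that Hive-style writers use for a null partition value.
HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def partition_values(
    path: str, schema: Dict[str, Tuple[str, bool]]
) -> Dict[str, Optional[object]]:
    """Extract typed partition values from a path like 'year=2023/region=eu/f.parquet'.

    `schema` maps a column name to (type name, nullable). Columns missing from
    it default to ("string", True), i.e. the all-string behavior.
    """
    values: Dict[str, Optional[object]] = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        if "=" not in segment:
            continue
        name, raw = segment.split("=", 1)
        data_type, nullable = schema.get(name, ("string", True))
        if raw == HIVE_NULL:
            if not nullable:
                raise ValueError(f"{name!r} is NOT NULL but {path!r} holds a null value")
            values[name] = None
        else:
            values[name] = CASTS[data_type](raw)
    return values
```

With an empty schema, `year=2023` stays the string `"2023"`; declaring `{"year": ("int", False)}` turns it into the integer `2023` and rejects null partitions for that column.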

Use Case

Related Issue(s)
#1041, #1682, #1686


junjunjd · 2023-10-14T08:38:36Z

The convert_to_delta API now takes a vector of SchemaField values describing the user-defined partition columns:

pub async fn convert_to_delta(
    storage: ObjectStoreRef,
    partition_schema: Vec<SchemaField>,
) -> Result<DeltaTable, Error>
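Because `partition_schema` is supplied by the user rather than inferred, it can disagree with the table's actual directory layout. The following is a minimal sketch, in Python rather than Rust and using hypothetical names (`PartitionField` stands in for `SchemaField`; `check_partition_schema` is not a delta-rs function), of one sanity check a caller might run: every data file's `key=value` directories should name exactly the declared partition columns. It compares sets, so it does not check directory nesting order.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for SchemaField: name, data type, nullability."""
    name: str
    data_type: str
    nullable: bool = True


def check_partition_schema(partition_schema: List[PartitionField], paths: Iterable[str]) -> None:
    """Raise ValueError unless each file's key=value directories match the declared columns."""
    declared = [field.name for field in partition_schema]
    if len(set(declared)) != len(declared):
        raise ValueError(f"duplicate partition column in {declared}")
    for path in paths:
        # Every directory segment of the form key=value names a partition column.
        found = {seg.split("=", 1)[0] for seg in path.split("/")[:-1] if "=" in seg}
        if found != set(declared):
            raise ValueError(f"{path!r} is partitioned by {sorted(found)}, expected {sorted(declared)}")
```

A check like this fails fast before any Delta log is written, instead of producing a table whose metadata disagrees with its files.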

The changes have been pushed to #1686.

junjunjd added the enhancement label on Oct 13, 2023

junjunjd mentioned this issue on Oct 13, 2023 in feat: add convert_to_delta #1686 (merged)

junjunjd closed this as completed on Nov 14, 2023
