Iceberg `$partitions` metadata table only uses the current Spec #12323

alexjo2144 · 2022-05-10T16:16:48Z

The Iceberg `$partitions` metadata table only shows partition values in the `partition` column based on the current PartitionSpec. Instead, it could use the set of all columns that were used in any spec. This would make it easier to tell old partitions apart.

For example:

trino:default> create table test_table (a int, b int, c int) with (partitioning = array['a']);
trino:default> insert into test_table values (1, 10, 100), (2, 20, 200), (3, 30, 300), (2, 20, 200);

trino:default> select * from "test_table$partitions";
 partition | record_count | file_count | total_size |                                               data                                               
-----------+--------------+------------+------------+--------------------------------------------------------------------------------------------------
 {a=1}     |            1 |          1 |        433 | {b={min=10, max=10, null_count=0, nan_count=NULL}, c={min=100, max=100, null_count=0, nan_count=NULL}} 
 {a=2}     |            2 |          1 |        434 | {b={min=20, max=20, null_count=0, nan_count=NULL}, c={min=200, max=200, null_count=0, nan_count=NULL}} 
 {a=3}     |            1 |          1 |        434 | {b={min=30, max=30, null_count=0, nan_count=NULL}, c={min=300, max=300, null_count=0, nan_count=NULL}} 

spark-sql> alter table default.test_table drop partition field a;
spark-sql> alter table default.test_table add partition field b;

# This output shows the issue. No rows have the value 1, 2, or 3 for column b; those are the values of a, the old partition column.
trino:default> select * from "test_table$partitions";
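 partition | record_count | file_count | total_size |                                               data                                               
-----------+--------------+------------+------------+--------------------------------------------------------------------------------------------------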
 {b=1}     |            1 |          1 |        470 | {a={min=1, max=1, null_count=0, nan_count=NULL}, c={min=100, max=100, null_count=0, nan_count=NULL}} 
 {b=2}     |            2 |          1 |        474 | {a={min=2, max=2, null_count=0, nan_count=NULL}, c={min=200, max=200, null_count=0, nan_count=NULL}} 
 {b=3}     |            1 |          1 |        474 | {a={min=3, max=3, null_count=0, nan_count=NULL}, c={min=300, max=300, null_count=0, nan_count=NULL}} 


spark-sql> select partition, record_count from default.test_table.files;
{"a":1,"b":null}	1
{"a":2,"b":null}	2
{"a":3,"b":null}	1
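To illustrate the symptom, here is a minimal Python sketch under a simplified, hypothetical model: each data file carries the id of the spec it was written with, plus a tuple of partition values ordered by that spec's fields. The names `specs`, `files`, `label_with_current_spec`, and `label_with_own_spec` are illustrative only, not Trino or Iceberg APIs. Pairing the stored values with the current spec's field names reproduces the mislabeled {b=1} rows above, while pairing them with the file's own spec recovers {a=1}.

```python
# Hypothetical, simplified model (not a Trino or Iceberg API):
# spec id -> ordered partition field names.
specs = {
    0: ["a"],  # original spec: partitioning = array['a']
    1: ["b"],  # after: drop partition field a; add partition field b
}
current_spec_id = 1

# Files written before the partition change, so they use spec 0.
files = [
    {"spec_id": 0, "partition": (1,), "record_count": 1},
    {"spec_id": 0, "partition": (2,), "record_count": 2},
    {"spec_id": 0, "partition": (3,), "record_count": 1},
]


def label_with_current_spec(file):
    """Buggy behavior: pair stored values with the CURRENT spec's field names."""
    return dict(zip(specs[current_spec_id], file["partition"]))


def label_with_own_spec(file):
    """Fixed behavior: pair stored values with the spec the file was written with."""
    return dict(zip(specs[file["spec_id"]], file["partition"]))


for f in files:
    # Left column mimics the {b=1} rows above; right column shows the real {a=1}.
    print(label_with_current_spec(f), label_with_own_spec(f))
```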

Possibly relevant: apache/iceberg#2936


findepi · 2022-05-11T13:15:06Z

> the set of all columns which were used in any Spec could be used. This would make it easier to tell old partitions apart.

We should include only the specs in use, not all defined specs, so that the output isn't unnecessarily cluttered.
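One possible way to follow this suggestion, sketched in Python under the same kind of hypothetical model (with an extra spec 2 that no file uses and one file assumed to have been written after the partition change): collect only the spec ids referenced by at least one data file, take the union of their partition field names, and render each file's partition over those columns, with None for fields its spec lacks. The helper names are made up for illustration; this is not Trino's implementation.

```python
# Hypothetical model: spec id -> ordered partition field names.
specs = {
    0: ["a"],
    1: ["b"],
    2: ["c"],  # defined, but referenced by no data file
}

# Hypothetical files: two written under spec 0, one under spec 1.
files = [
    {"spec_id": 0, "partition": (1,)},
    {"spec_id": 0, "partition": (2,)},
    {"spec_id": 1, "partition": (20,)},
]


def partition_columns_in_use(specs, files):
    """Union of field names from specs referenced by at least one file,
    visiting specs by ascending id and keeping first-seen order."""
    used_ids = sorted({f["spec_id"] for f in files})
    columns = []
    for spec_id in used_ids:
        for name in specs[spec_id]:
            if name not in columns:
                columns.append(name)
    return columns


def render_partition(file, columns, specs):
    """Map a file's partition tuple onto the unified columns;
    fields missing from the file's own spec become None."""
    values = dict(zip(specs[file["spec_id"]], file["partition"]))
    return {name: values.get(name) for name in columns}


columns = partition_columns_in_use(specs, files)  # unused spec 2 is skipped
for f in files:
    print(render_partition(f, columns, specs))
```

Keying the union by field name is a simplification. Iceberg assigns ids to partition fields, so a real implementation would more likely match fields by id, for example to tell apart two different transforms of the same source column.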

alexjo2144 · 2022-05-11T14:48:32Z

@findepi I've updated the example; hopefully it's now clearer that this is a bug. Note specifically the b=1, b=2, b=3 partition column values.

alexjo2144 added the "bug" (Something isn't working) label May 10, 2022

alexjo2144 mentioned this issue May 10, 2022

Support updating Iceberg table partitioning #12259

Merged

ebyhr assigned homar May 16, 2022

homar mentioned this issue May 16, 2022

Handle partition schema evolution in partitions metadata #12416

Merged

findepi closed this as completed in #12416 May 25, 2022
