Iceberg $partitions metadata table only uses the current Spec #12323

Closed
alexjo2144 opened this issue May 10, 2022 · 2 comments · Fixed by #12416
Labels
bug Something isn't working

Comments

@alexjo2144
Member

alexjo2144 commented May 10, 2022

The Iceberg $partitions metadata table only shows partition values in the partition column based on the current PartitionSpec, so files written under an older spec are reported with the wrong partition columns. Instead, the partition column could contain the set of all columns used in any spec, which would make it easier to tell old partitions apart.

For example:

trino:default> create table test_table (a int, b int, c int) with (partitioning = array['a']);
trino:default> insert into test_table values (1, 10, 100), (2, 20, 200), (3, 30, 300), (2, 20, 200);

trino:default> select * from "test_table$partitions";
 partition | record_count | file_count | total_size |                                               data                                               
-----------+--------------+------------+------------+--------------------------------------------------------------------------------------------------
 {a=1}     |            1 |          1 |        433 | {b={min=10, max=10, null_count=0, nan_count=NULL}, c={min=100, max=100, null_count=0, nan_count=NULL}} 
 {a=2}     |            2 |          1 |        434 | {b={min=20, max=20, null_count=0, nan_count=NULL}, c={min=200, max=200, null_count=0, nan_count=NULL}} 
 {a=3}     |            1 |          1 |        434 | {b={min=30, max=30, null_count=0, nan_count=NULL}, c={min=300, max=300, null_count=0, nan_count=NULL}} 

spark-sql> alter table default.test_table drop partition field a;
spark-sql> alter table default.test_table add partition field b;

# This query shows the issue: no rows have the value 1, 2, or 3 for column b
trino:default> select * from "test_table$partitions";
 partition | record_count | file_count | total_size |                                               data                                               
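-----------+--------------+------------+------------+--------------------------------------------------------------------------------------------------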
 {b=1}     |            1 |          1 |        470 | {a={min=1, max=1, null_count=0, nan_count=NULL}, c={min=100, max=100, null_count=0, nan_count=NULL}} 
 {b=2}     |            2 |          1 |        474 | {a={min=2, max=2, null_count=0, nan_count=NULL}, c={min=200, max=200, null_count=0, nan_count=NULL}} 
 {b=3}     |            1 |          1 |        474 | {a={min=3, max=3, null_count=0, nan_count=NULL}, c={min=300, max=300, null_count=0, nan_count=NULL}} 


spark-sql> select partition, record_count from default.test_table.files;
{"a":1,"b":null}	1
{"a":2,"b":null}	2
{"a":3,"b":null}	1

Possibly relevant: apache/iceberg#2936
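A minimal Python sketch of the mislabeling, assuming (as the output above suggests) that each file's partition values are paired with the current spec's fields instead of the fields of the spec the file was written with. The PartitionField, PartitionSpec and DataFile classes, the helper names and the field IDs 1000/1001 are hypothetical stand-ins, not Trino or Iceberg APIs. The second helper models the proposed alternative: a union of partition fields keyed by field ID, with null for fields the file's own spec lacks, which is the shape Spark's files table prints above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionField:
    field_id: int  # hypothetical partition field id, unique across a table's specs
    name: str      # identity-transform partition name, e.g. "a" or "b"


@dataclass(frozen=True)
class PartitionSpec:
    spec_id: int
    fields: tuple  # PartitionField entries, in spec order


@dataclass(frozen=True)
class DataFile:
    spec_id: int      # spec the file was written with
    partition: tuple  # values ordered like that spec's fields
    record_count: int


def label_with_current_spec(f, current):
    # Assumed buggy behavior: pair the file's values with the CURRENT spec's names
    return {fld.name: v for fld, v in zip(current.fields, f.partition)}


def label_with_union(f, specs):
    # Proposed behavior: union of fields across specs, keyed by field id
    union = {}
    for spec in specs.values():
        for fld in spec.fields:
            union.setdefault(fld.field_id, fld.name)
    # Resolve the file's values against the spec it was actually written with
    own = {fld.field_id: v for fld, v in zip(specs[f.spec_id].fields, f.partition)}
    # Fields missing from the file's own spec come out as None (null)
    return {name: own.get(fid) for fid, name in sorted(union.items())}


# Scenario from the issue: spec 0 partitions by a, spec 1 (current) by b
spec0 = PartitionSpec(0, (PartitionField(1000, "a"),))
spec1 = PartitionSpec(1, (PartitionField(1001, "b"),))
specs = {0: spec0, 1: spec1}
files = [DataFile(0, (1,), 1), DataFile(0, (2,), 2), DataFile(0, (3,), 1)]

for f in files:
    print(label_with_current_spec(f, spec1), label_with_union(f, specs))
# {'b': 1} {'a': 1, 'b': None}
# {'b': 2} {'a': 2, 'b': None}
# {'b': 3} {'a': 3, 'b': None}
```

The left column reproduces the b=1, b=2, b=3 values from the second $partitions query; the right column matches the {"a":1,"b":null} style rows from Spark.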

alexjo2144 added the bug (Something isn't working) label on May 10, 2022
@findepi
Member

findepi commented May 11, 2022

> the set of all columns which were used in any Spec could be used. This would make it easier to tell old partitions apart.

We should include only the specs in use, not all defined specs, so that the output isn't cluttered unnecessarily.
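One way to read this suggestion, as a Python sketch: collect partition fields only from the specs referenced by data files in the current snapshot, deduplicated by field ID. The function name and the plain (field_id, name) pairs are hypothetical, and the spec and field IDs mirror the example in the issue rather than real table metadata.

```python
def partition_columns(specs, live_file_spec_ids):
    """Fields for the $partitions 'partition' column, limited to specs in use.

    specs: hypothetical mapping spec_id -> list of (partition field id, name)
    live_file_spec_ids: spec id of each data file in the current snapshot
    """
    columns = {}
    for spec_id in sorted(set(live_file_spec_ids)):
        for field_id, name in specs[spec_id]:
            # A field kept across several specs appears only once
            columns.setdefault(field_id, name)
    return sorted(columns.items())


specs = {0: [(1000, "a")], 1: [(1001, "b")]}

# Right after the spec change, every live file still uses spec 0, so b is omitted
print(partition_columns(specs, [0, 0, 0]))     # [(1000, 'a')]
# Once a file is written with spec 1, both fields are needed
print(partition_columns(specs, [0, 0, 0, 1]))  # [(1000, 'a'), (1001, 'b')]
```

Keying by field ID rather than by display name is a choice of this sketch; the thread does not say how the fix identifies fields.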

@alexjo2144
Member Author

@findepi I've updated the example; hopefully it's now clearer that this is a bug, specifically the b=1, b=2, b=3 partition column values.


3 participants