-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix partition columns when they they are not at the end of the schema (fix Removing Partition Columns from RecordBatch) #9276
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @devinjdangelo -- I don't fully understand what problem this PR is fixing. Is this fixing wrong results? Or a panic? Or something else 🤔
Could you possibly add a new test (rather than modify the existing one) so it is clearer?
@@ -53,32 +53,32 @@ select * from validate_partitioned_parquet_bar order by col1; | |||
2 | |||
|
|||
# Copy to directory as partitioned files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to add a new test (not modify an existing test) to show that the existing behavior is not changed
Apologies for not properly documenting the issue up front. I created #9290 to document the incorrect behavior that this PR is fixing. Essentially, ListingTables enforced a particular ordering on the columns which Copy alone does not. The logic for eliminating the partition columns from the I also restored the original test and added a new one instead of modifying. |
Thank you so much -- I am sorry for not being able to see it immediately. I struggle these days to keep up with everthing going on and don't seem to have the ability to research things as much as I would like to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @devinjdangelo -- this looks great
Which issue does this PR close?
closes #9290
Rationale for this change
The current logic for removing partition_by columns assumes that the columns are at the end of the record batch. This is not always true when using COPY with the partition_by option.
What changes are included in this PR?
Removes columns from record batch based on name rather than position.
Are these changes tested?
Yes, modified copy.slt to cover
Are there any user-facing changes?
Yes, partitioned COPY statements will work correctly even if specifying columns that do not come at the end of the schema