-
Notifications
You must be signed in to change notification settings - Fork 433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot merge to a table with a timestamp column after upgrading delta-rs #2478
Comments
@ion-elgreco would be interested to hear your thoughts on this, seems related to this #2341 |
@echai58 it was quite a breaking change. Old tables were never written correctly. Ideally we cast the table scan to use the delta schema, but we have never added this. You could look into that or just rewrite the tables |
@ion-elgreco Okay, it would be really difficult for us to rewrite all of our tables. Can you provide some pointers to code locations that we'd need to change to get the table scan to use the delta schema? I can look into that. |
Somewhere in the scan builder I guess, and then on the parquet scan |
# Description By casting the read record batch to the delta schema datafusion can read tables where the underlying parquet files can be cast to the desired schema. Fixes: - Errors querying data where some of the parquet files may not have columns that were added later because of schema migration. This includes nested columns for structs that are in Maps, Lists, or children of other structs - maps and lists written with different different element names - timestamps of different units. - Any other cast supported by arrow-cast. This can be done now since data-fusion exposes a SchemaAdapter which can be overwritten. We should note that this makes all times being read by delta-rs as having microsecond precision to match the Delta protocol. # Related Issue(s) - This makes solving #2478 and #2341 just a matter of adding code to delta-rs cast. --------- Co-authored-by: Alex Wilcoxson <alex.wilcoxson@relativity.com>
Environment
Delta-rs version: 0.15.3 => 0.17.2
Binding: python
Bug
What happened:
I have a lot of tables created via delta-rs from an older version, around 0.15.3. I'm looking to upgrade, but I'm running into breaking errors, I think from the 0.16.0 breaking change to how timestamps work.
I have tables that have a
timestamp
column, and due to older versions of delta-rs, they are written without a timezone. When I upgraded to 0.17.2, and I read in the table, the schema saystimestamp[us, tz=UTC]
- okay, that's reasonable, it's attaching a UTC timezone.But, when I try to write to the table via a merge and the predicate compares the timestamp columns, it fails with
My theory is that delta-rs is interpreting the schema of the table as having a UTC timezone, whereas the physical data actually reflects the table having no timezone. Thus, when it casts the source table I pass in, it applies the UTC timezone, which Arrow then cannot compare against the physical table, which has no timezone.
What you expected to happen:
Is there any way to upgrade delta-rs such that this doesn't break? Would be really unfortunate to have to rewrite all of these tables because of this breaking change.
How to reproduce it:
Run this with an older (< 0.16) delta-rs:
Then, run this with a new delta-rs:
The text was updated successfully, but these errors were encountered: