Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reading tables with type widening in default engine #335

Conversation

johanl-db
Copy link
Contributor

@johanl-db johanl-db commented Sep 13, 2024

Type widening allows changing the type of an existing column in a Delta table to a wider type.
See https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md

This change improves the default engine to enable upcasting conversions on read that are required to read tables that have a type change applied: data files written before the widening type change will contain data stored using a narrower type that must be upcast to the wider type in the current table schema.

Changes:

  • Allow conversions:
    • int8 -> int16 -> int32 -> int64
    • float32 -> float64
    • date -> timestamp without timezone
    • integers -> float64
    • integers and decimals -> decimals, as long as the target precision and scale are sufficient to store all values without losing precision

This doesn't cover casting inside nested struct/array/map, which doesn't seem to be properly supported yet.

Testing:

  • Added tests reading a table with various type changes applied, generated using delta spark 4.0 preview.
  • Added unit tests for can_upcast_to_decimal, returning whether a given type can be upcast to a decimal.

@johanl-db johanl-db force-pushed the type-widening-default-parquet-reader branch from 0ee9940 to ec8600e Compare September 13, 2024 14:26
Copy link

codecov bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 98.55072% with 1 line in your changes missing coverage. Please review.

Project coverage is 74.09%. Comparing base (ed1919a) to head (9230d4b).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/engine/arrow_utils.rs 98.46% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #335      +/-   ##
==========================================
+ Coverage   73.86%   74.09%   +0.23%     
==========================================
  Files          43       43              
  Lines        8078     8154      +76     
  Branches     8078     8154      +76     
==========================================
+ Hits         5967     6042      +75     
- Misses       1732     1733       +1     
  Partials      379      379              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely surgical -- just extend the cast compat check and let the existing casting machinery do its job.

kernel/src/engine/arrow_utils.rs Outdated Show resolved Hide resolved
kernel/src/features/mod.rs Show resolved Hide resolved
Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with one nit. thanks!

kernel/src/engine/arrow_utils.rs Outdated Show resolved Hide resolved
@zachschuermann zachschuermann merged commit 1a66fb9 into delta-incubator:main Sep 19, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants