-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Various Schema
improvements (equality/init dtype checks)
#19379
feat(python): Various Schema
improvements (equality/init dtype checks)
#19379
Conversation
8a5605b
to
da188e1
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19379 +/- ##
==========================================
+ Coverage 80.18% 80.19% +0.01%
==========================================
Files 1523 1527 +4
Lines 209897 210268 +371
Branches 2434 2440 +6
==========================================
+ Hits 168314 168633 +319
- Misses 41028 41079 +51
- Partials 555 556 +1 ☔ View full report in Codecov by Sentry. |
4843133
to
873d39b
Compare
… improved equality/init dtype checks)
873d39b
to
ea52dce
Compare
I do agree this is valuable, but I am afraid we will add features we cannot support when we move to Rust. I want our schema to be a python wrapper around |
Yup, can remove the However, I think we still need to do the dtype-checks here, as we will always want to pass correct (and instantiated) dtypes into the |
b4895ca
to
db59be1
Compare
Schema
improvements (new base_types
method, improved equality/init dtype checks)Schema
improvements (improved equality/init dtype checks)
Schema
improvements (improved equality/init dtype checks)Schema
improvements (equality/init dtype checks)
A few improvements (arguably fixes) for equality/inequality checks, and dtype init:
We require that a
Schema
is initialised with instantiated dtypes (and make a point of it in the docstring), but don't actually check that we are given such; this PR validates that nested dtypes are given in instantiated form (eg:pl.List(Float64)
, not justpl.List
), as without the inner dtype we don't actually have a validSchema
. I haven't done a full recursive check (eg: walking intoStruct
fields), but can look at it in a second pass.Uninstantiated non-nested types (such as
Int64
that do not take modifying parameters) are allowed, and are automatically instantiated on the way in to guarantee that aSchema
instance is always composed of instantiated dtypes. Those that take units/specifiers (such asDatetime
) require specific user-side instantiation, as above.Fixed equality check between
Schema
and dict objects; similar to the above, we could compare aSchema
with a{name: dtype, …}
dict and get potentially invalid results if the dictionary had uninstantiated dtypes.Added a convenience(update: removed the new method)base_types
method, returning a{name: dtype_class, …}
dict (as returned by the individual dtype equivalentbase_type
method).Optimised internal
Schema
init where the dtypes are already known to be valid (eg: we are initialising from internal state, so we know that we don't have to check them).Drive-by; The "schema_overrides" param in
read_excel
should handle python dtypes (as this param does elsewhere), eg:schema_overrides={"col": float}
.